Vertica - Create Table Examples and Data Distribution
Updated: Sep 18, 2022
Examples from the blog post come from the Vertica Architecture and SQL book by Tom Coffing and David Cook.
The blog is below this first video, but did you know you can migrate directly to Vertica in a heartbeat with Nexus? This video shows the many features and options for migrating 19 different databases to Vertica with the click of a button. We also show how we can join tables from 19 different systems with Vertica in a single query (federated query). We also show how to search for column names (i.e., customer_number) across all systems in your enterprise and then join them. Finally, you will see that users can work from home to migrate and join data, but they have the option to execute the migrations from a Nexus Server (located on-premises or on any cloud) for high-speed transfers.
In the picture below, you can see that Vertica was born to be parallel. With each query, each Segment performs a single step in parallel. A Vertica system consists of nodes that will work in parallel to store and process your data. This design allows you to start small and grow infinitely. If your Vertica system provides an excellent Return On Investment (ROI), continue investing by purchasing more nodes (adds additional Segments). Most companies start small, but after seeing what Vertica can do, they continue to grow their ROI from the single step of implementing a Vertica system to millions of dollars in profits. Double your Segments and double your speeds Forever. Vertica provides a journey of a thousand smiles!
In the next example below, you have a distribution key, which Vertica refers to as SEGMENT BY HASH. Each row hash uses mathematics to place the entire row on the proper Segment. All parallel processing systems use a hash formula to place the data intelligently. The entire table row is on a segment, but each column in the row is in a separate block. Vertica spreads the rows of a table evenly across the nodes when the distribution key value is unique, such as employee_no. A good Distribution Key is a key to even distribution!
When Unsegmented is the distribution choice, the entire table copies and stores to each Segment, other systems refer to this choice as a replicated table. The general idea is to Segregate all large tables and use Unsegmented on smaller tables. You only use Unsegmented tables to enhance performance for joins. When Segmented humongous tables join with Unsegmented smaller tables, the joining data resides “Segment local,” which provides the fastest tuning for joins.
Vertica cleverly allows you to sort the data on each Segment. For example, in the picture below, we choose Order_No to distribute the rows evenly amongst the Segments, but we then sort the data on each Segment by Order_Date. When data is distributed evenly over the Segments and an important date column orders the data on each Segment, the performance can be incredible.
Imagine joining two giant tables that both contain hundreds of millions of rows. A brilliant tuning technique is to give both tables the same hash key as the join column key. In our picture below, we have done that because whenever two rows join, both rows must reside physically on the same Segment; otherwise, Vertica redistributes the data, so both rows reside on the same Segment. For example, if you SEGMENT BY HASH (DEPT_NO) on both tables because DEPT_NO is the join condition, matching rows will naturally hash together to the same Segment, and the join is much faster. Experts refer to this as Segment local joins, one of the best techniques for performance tuning for large table joins.
If you want Tom Coffing to teach your team a class on the Vertica Architecture and SQL with performance tuning in mind, check out the book table of contents and the course outline here. https://www.nexusdataserver.com/services-9. Individuals can purchase the PDF from Tom.Coffing@CoffingDW.com for $199.00.
The Nexus is often described as a Query tool on steroids because it queries all systems in every enterprise. Above, the Nexus is connected to 20 different systems! For a free trial of Nexus, please contact Tom Coffing at Tom.Coffing@CoffingDW.com or phone 513 300-0341.