This blog comes from the Snowflake Architecture and SQL book by Tom Coffing and David Cook.
Are you ready to learn SQL from the best technical trainer the world has ever seen? Tom Coffing, AKA Tera-Tom, makes learning fun, exciting, and easy. Learning SQL is one of the best moves you can make in your career because you can query every database with SQL.
Tom became famous as a teacher because he memorized every student's name when he greeted them. Tom has taught over 1,000 classes and never missed a single name. Ask anyone who attended a class taught by Tom Coffing.
Tom has also led a team of developers for almost 20 years to create the Nexus, which queries, migrates, and joins data across all systems. Amazingly, Nexus migrates between all the databases below with the click of the mouse.
• MySQL
• Excel
• Microsoft Access
• Snowflake
• Redshift
• Synapse
• BigQuery
• Teradata
• Oracle
• SQL Server
• DB2
• Hadoop
• MapR
• Greenplum
• Postgres
• Vertica
• Netezza
• SAP HANA
• Yellowbrick
• SQLite
You can download a free trial of Nexus at CoffingDW.com. You can also see some of the great features of Nexus right here.
What is a Recursive Derived Table?
Derived tables and Recursive Derived tables have one thing in common: they are temporary tables any user can create that are automatically deleted at the end of the query.
A Recursive Derived table is used when there is a natural hierarchy in the data. Recursive derived tables place one or two rows inside the derived table but then loop and add additional rows connecting the hierarchy. Consider the company hierarchy in the picture below. TeraTom Coffing is at the top of the hierarchy as the CEO. Jane Stevens and Ricardo Gonzales report directly to TeraTom, so they are one step below, and the hierarchy continues down to the analysts.
The next example below is the entire recursive query, but don't focus on understanding the entire query because I will break it down into sections and explain it.
Focus on the top two lines of SQL and notice the WITH RECURSIVE beginning, which establishes we are creating a recursive derived table. We are naming the recursive derived table TERATOM and notifying the database that the recursive derived table will contain five columns named EMP, MGR, LASTN, POS_NAME, and the keyword DEPTH.
The next example will explain more.
Although the examples above and below are the same SQL, there is different color highlighting to break down the example into sections. Notice the blue and red colors because, in the top section below, we are populating the recursive derived table with a single row.
The only employee with a NULL value for their manager in the column MGR_EMPLOYEE_NO is the CEO, who reports to nobody. Everything above the keywords UNION ALL is merely establishing the recursive derived table, its name, the column names, and the entry of a single row, who happens to be the CEO, Tom Coffing.
Notice that we have set the depth to zero and that the Mgr is a NULL value represented by the question mark. We now have one row in the recursive derived table, a term called seeding the table. Much like you throw grass seed down to grow your lawn, we are seeing the table with a single row to grow the table.
The next section after the UNION ALL keyword joins the recursive derived table row of the CEO with the employees one level down. Notice we are joining on the recursive derived table column EMP representing Mr. Coffing's employee number and the HIERARCHY_TABLE's column MGR_EMPLOYEE_NO.
Check out the next example below after this one to see a clearer picture.
Notice that Stevens and Gonzales have an MGR_EMPLOYEE_NO of one, which joins to EMP one because both report to their manager TeraTom Coffing. Now, after the first loop (join) of the recursive derived table with the HIERARCHY_TABLE, there are three rows in the recursive derived table.
A recursive derived table is designed to loop until it loops and doesn't add a row. However, in the example below, the second loop adds Patel and Mumba because their managers are Stevens and Gonzales.
The next loop adds all analysts reporting to Patel and Mumba.
When the fourth loop fails to add a row, the recursive derived table knows to end the loop. The last section below in blue does a SELECT on the recursive derived table and orders the rows by DEPTH (column 5), MGR (column 2), and employee number (column 1).
Below is the SQL again with the final answer set. Once the answer set is received, the recursive derived table is deleted automatically. Now you know exactly how a recursive derived table works.
It is never too early to teach your children about data. Do what this family does and read them a story of the greatest data scientist ever!
Kommentare