The Redstounding Redshift Correlated Subquery
Below is a video of Nexus in action; that is all you will need to see to realize why 20 years of dedication to corporate perfection pays dividends.
I will first show you the fundamental concepts behind a subquery in the examples. I will then show you a basic subquery. I will show you a more complex subquery, and finally, I will show you the correlated subquery.
The example below is not a subquery but a simple in-list containing 100, 200, 300, 400, and 500. Notice the parenthesis. The result returns all employees in DEPT_NO 100, 200, 300, 400, or 500.
In the next example, I show you the first query above on the left but notice the equivalent subquery on the right. Instead of having the department numbers in parenthesis, I have run a subquery to return the department numbers from the department_table. The bottom query runs first, passing up to the top query department numbers 100, 200, 300, 400, and 500.
The example below shows a more complex subquery because it uses aggregation in the subquery. The bottom query runs first to produce the average salary, which is passed to the top query to retrieve employees who have a salary greater than the average salary in the company. The query on the right shows the average salary, which is not in the answer set of our subquery example because the subquery results are never in the answer set.
In our next example, we see the mind-boggling correlated subquery in the left panel. In the previous example above, we return people making a salary greater than the average salary in the company. In the correlated subquery below, we return employees making a salary greater than the average salary in their department.
If you can wrap your head around what seems impossible, the correlated subquery is simple. There are three minor differences between the normal subquery example above and the correlated subquery below.
The first two differences are easy because all we do is alias the table names of both tables. However, the key to the correlated subquery is to use a WHERE clause that co-relates or correlates the top to the bottom as WHERE E.DEPT_NO = EE.DEPT_NO, so now we see all employees making a salary greater than the average salary within their DEPT_NO.
In a correlated subquery, the top query runs first and then runs the bottom query once for each distinct department. Correlated subqueries are fast and efficient on massively parallel processing systems.
The query on the right panel shows the department numbers and their average salaries so you can validate that John, Herbert, and Cletus have a salary greater than the average salary in their department.
Tom Coffing has also led a team of developers for almost 20 years to create the Nexus, which queries, migrates, and joins data across all systems. Amazingly, Nexus migrates between all the databases below with the click of the mouse.
• Microsoft Access
• SQL Server
• SAP HANA
The picture below shows Nexus in dark mode migrating Teradata to Redshift. Some customers migrate thousands of tables to Redshift in a single job.
If you were a dog data scientist, wouldn't you be a retriever of data?