Snowflake – Data Loading Stages Part 1
This blog comes from the Snowflake Architecture and SQL book by Tom Coffing and David Cook.
This blog is a multi-part series that will arm you with extensive knowledge about data loading.
You will first learn about staging, like a staging area in a factory or physical warehouse, where trucks unload goods to forklifts and place the goods from the staging area to a physical location. Snowflake has four stages providing enormous flexibility to load data inside Snowflake.
I will give step-by-step examples of these stages in the multi-part series to load actual data. I will even give you a table you can load and an entire script that produces a database, schema, tables, and data. This data coincides with the Snowflake Architecture and SQL book I use to teach my Snowflake boot camp (available only to corporate customers).
But before I begin the blog, let me brag for a moment about me putting my money where my mouth is about data migration. For the past 18 years, I didn't just write books or teach about data migration; instead, I built brilliant and beautiful software products to query, migrate, and join data across all systems. Trust me; if I knew it would be this difficult and take this long, I might have thought twice!
What do I know about data loading into Snowflake? I have the largest customers in the world using my product (Nexus Server) to automate loading thousands of tables to Snowflake. We have extensive multi-year experience loading to Snowflake from Teradata, Oracle, SQL Server, DB2, Netezza, Greenplum, SAP HANA, MySQL, Postgres, Amazon Redshift, Azure Synapse, and Google BigData.
I built the Nexus Information Highway, which automates migrations between all databases and allows users to run federated queries that join data across all databases. Although most people know me for my books or training classes, my life's work has been to make databases communicate across all platforms.
Nexus automates the conversion of table structures and DDL automatically in seconds from any database to Snowflake. Nexus then builds the data load scripts that export from the source database to Snowflake. ETL teams choose the source systems, the target system of Snowflake, the database, and the schema where they want to load and then schedule the job.
Below is a picture of the architecture of Nexus, which took us 18 years to build, and uses desktop or laptop software (client software) coordinating with a Nexus Server. The Nexus client allows any user to load one or thousands of tables from anywhere that coordinates with Nexus Server software on a Windows Server or Windows VM on any cloud or on-premises.
You need the desktop or laptop to put the power of migration into the hands of users working in the office or from any remote location, but you need the Server to ensure secure data loading on a high-speed network.
Here is a video showing exactly how we automate data migration to Snowflake from any data source. https://www.youtube.com/watch?v=_9ABivdwJPk.
Now back to the blog! In the picture below, you can see that Snowflake uses Bulk Loading, Continuous Loading, and loading from an External and an Internal Stage. I have listed fundamental details you need to know depending on who is loading data, the data size, and where the data source resides.
In the picture below, you will see the philosophy of the Snowflake stages, but I was hoping you could notice the COPY INTO <table> command because that is the most popular way to load data into Snowflake. Once your data is in a stage, the COPY INTO command loads from the stage into Snowflake.
We are at the most important part of the Snowflake data loading blog post because you will want to know your staging options. Staging areas for Snowflake data ingestion are comparable to staging areas in a factory warehouse because staging areas are temporary storage until the data can store in a permanent location.
In the most primitive form, every Snowflake data warehouse user is automatically given a USER staging area. So, if a user needs to place data into a table, they have the ability.
Secondly, every table created in Snowflake automatically has an implied TABLE staging area. In other words, if you have a table named TERATOM, there will be a TERATOM staging area that you can use to load the table TERATOM. The staging area and the table will have the same name.
Next, you will have a NAMED stage where your ETL team can store many tables prepared for loading. Named stages allow an entire team of people to have access rights, so security among the group can be established. Named stages are how professional loading is done.
And finally, if you are loading data from AWS S3, Microsoft Azure Blog Storage, or Google Cloud Platform (GCP), you have a NAMED EXTERNAL STAGE, which you can automate to load data directly into Snowflake.
The user and table stages are automatic for each user and each table, but the Named Stage is created and has access rights. The key takeaway from the example below refers to tables at each stage.
The example below looks complicated, but it is easy. If you are loading using a User Stage, the tables in the stage will begin with an @~, but if you use a Table Stage, an @%, and a Name Stage, an @ sign only.
The information below describes the general details for the files used to load data. In addition, you will need to know the location of files, file format options, and file encoding techniques.
The picture below talks about the User Stage. We will attempt to use your user stage to load a table. I have a table you can load from my website (NexusDataServer.com), or you can follow along and load a table you want to create yourself. In the upcoming example, I am using a CSV flat file on my PC to load into Snowflake.
Download a CSV from my Website to Practice Loading
Go to my NexusDataServer.com website (not my CoffingDW.com website), and press on CLASSES in the top menu.
Locate the Snowflake Architecture and SQL class schedule and book outline, and press the COURSE OUTLINE button.
Like the picture below, you can press the DOWNLOAD FLAT FILE for the DATA LOAD Exercise button. The flat file you are downloading is a CSV file containing approximately 6,000 rows of sales information from a company selling boots, jackets, etc. You want to save this file on your PC or laptop so we can use it to load it into Snowflake. Please place the file in an easy place to remember because you will be selecting it later in this exercise to load into Snowflake.
Here is how you can load the table into Snowflake. You can load any table you desire by following the steps in the next few pictures. Next, go to the Snowflake browser and choose WORKSHEETS. I have a Nexus database, but you can use any database in Snowflake. I also have a schema named SQL_CLASS, but you can use any schema inside any database. Finally, use the exact CREATE TABLE statement below, create the PIVOT_TABLE_LOAD table, and press RUN (in green arrow). Congratulations, you have created an empty table residing on Snowflake. Now, let's get it loaded.
Like the picture below, change from WORKSHEETS at the top to DATABASES. You will see multiple databases. Click on the NEXUS database or the database in which you created the PIVOT_TABLE_LOAD table.
Once you click on your database (Nexus in this example), you will see the PIVOT_TABLE_LOAD table you previously created. Next, click on the PIVOT_TABLE_LOAD table name.
You can now press on the LOAD TABLE icon (up arrow). Notice the columns from the table are listed.
From the drop-down area in the picture below, you will choose which warehouse you want to load into, which is COMPUTE_WH for me. You might have a different data warehouse name, so choose from your list. You want to choose the same warehouse where your empty PIVOT_TABLE_LOAD was created.
Now that you have chosen the warehouse, you will choose the file from your computer to load. You will be loading the CSV file you saved for those downloaded from my website.
Choose SELECT FILES and navigate to where you put your CSV file. Then, after choosing the CSV file, hit NEXT on the bottom right.
Also, notice that you could have chosen to load from an external stage, which will be from AWS S3, Azure, or GCP.
Now that you have chosen your Warehouse and Source file, you will choose the FILE FORMAT. Hit the plus sign shown in the picture below.
You will need to give the file format a name, like TOMSFILEFORMAT, but any name will do. I also filled out an example that will work. When you are done, hit FINISH in the bottom right.
If you are ready, it is time to load the table. Notice I have TOMSFILEFORMAT in the drop-down. Press the LOAD button in the bottom right and breathe.
You will get a confirmation that the table has loaded 6112 rows! Great job! Now, go query the table on Snowflake.