Snowflake Unstructured Data Part 1
Contents from this blog come from the Snowflake Architecture and SQL book by Tom Coffing and David Cook.
We should start with JSON and the Internet of Things for this discussion.
JSON is available from many interfaces and many devices. JSON is fast becoming the preferred alternative to XML. Consequently, dozens of JSON data sources are available inside nearly all major corporations, including cell phones, all newer applications written in 50 languages, and all JSON emitting browsers. In addition, many sensors emit JSON, and the ones that do not can easily place a small program attached to the incoming data stream to convert the raw format to JSON (GitHub.com has plenty of sensor transformers to JSON for free).
Why use JSON? JSON is a lightweight, easy-to-understand standard quickly moved around the internet to represent data.
There are six objects listed below to represent JSON types.
I want to explain JSON syntax. JSON is easy to understand once you break it down. Below, we have an example of an object containing three value pairs. Think of each string as a relational column and each column value.
I want to build a complex JSON object, but let’s start simple. Below is an example of an array with four objects separated by a comma, consisting of three value pairs.
Let’s continue to add and build upon our complex object. Below, we have an example of a more complex object. Remember, a value pair has two parts: a string and a value. The string portion is always literal in double-quotes. In the example below, the value is an array.
Here is how to create a table in Snowflake Below for unstructured data, represented with a variant data type. We named the column SRC, but we could have named it anything. First, we create a table with a single column (src) with a variant data type. We then do an INSERT INTO using the SELECT PARSE_JSON($1) syntax. PARSE_JSON($1) is a function, and the $1 represents the first column (src). As you can see, the table creation and the table inserts are successful.
Below, we query the JSON table WEB_VISITORS. Again, you might need to hover over the answer set to see the results.
You can perform a flatten function to display unstructured rows like a typical table. FLATTEN is a table function that produces a lateral view of a VARIANT, OBJECT, or ARRAY column. FLATTEN returns a row for each object, and the LATERAL modifier joins the data with any information outside the object. Below, we are flattening and bringing back the first and last names, plus the age of the web visitors. Notice that we cast the VARIANT output to string or integer values.
A wonderful fundamental is creating a view on a flattened table so that business users can see the data more relationally. In addition, users can use it as a typical table once you create a view with the flattened JSON data. For example, above, we are running two different queries, and the data return as if this was a typical table, including an ORDER BY statement.
Part two of our Snowflake unstructured data will come tomorrow. Understanding unstructured data gives you a big advantage because you are witnessing the future of data. At first, it is difficult to understand, but soon you will have it down easily, and your co-workers will be amazed at your skill.
The Nexus Information Highway above represents the incredible achievement of automated data migration between all systems listed above. Yes, every system can migrate to Snowflake using Nexus.
If you like how Tom Coffing presents and teaches, you will be amazed at his Nexus Server software for migrations to Snowflake. Check out the best way to automate the movement from all databases to Snowflake here.
Your company can hire Tom Coffing to teach classes on Snowflake. All classes are customized, but Tom is currently doing a 5-day Snowflake boot camp that teaches the Snowflake architecture, tricks and tips, and every SQL command. Check out the agenda here. https://www.nexusdataserver.com/snowflakeclass.
Above, the Nexus performs a federated query joining a Yellowbrick, Snowflake, Teradata, and Oracle table in a single query.
The Snowflake query browser is light years behind the 18 years of development of the Nexus Query Chameleon. And the best part is that Nexus can query all systems in your enterprise and has a Super Join Builder that builds the SQL automatically as the user's point-and-click. And the most amazing part is that Nexus can easily join Snowflake tables with Excel, Access, and tables and views from all other database platforms. Check out just some of the amazing features of Nexus here.