• tomcoffing

Cloud Showdown – Snowflake vs. Databricks

Updated: Sep 16

Snowflake and Databricks, two brilliant companies with different approaches, have been catalysts for paradigm shifts in cloud data, and many experts believe they will establish what might be the next great rivalry in tech.

Snowflake's vision is to take advantage of the scalable compute and storage capabilities that public cloud providers offer. Databricks' vision is to take advantage of the open-source Apache Spark technology and build on it for monetary gain.

Snowflake had the largest software IPO in history, and its valuation later skyrocketed to 70 billion dollars. Snowflake helps companies move their data into structured virtual warehouses, where customers run granular business analytics.

Databricks has a valuation of $28 billion. Founded in 2013, Databricks is an AI-enabled, open-source data analytics platform company that takes massive amounts of enterprise data and does machine learning and data science to make predictions.

Over 5,000 companies use Databricks' open-source lakehouse architecture to process, engineer, and analyze unstructured and semi-structured data.

Let me first highlight Snowflake, and then I will write about Databricks.


Snowflake understands the eight wonders of the big data world:


1) Advertising in the computer industry pays off year after year.

2) The capabilities of scaling compute and storage on public clouds.

3) Customer retention is gold in the computer industry.

4) High-powered investors are critical.

5) A large IPO cuts legacy data like a knife.

6) Sharing data between customers and providers is revolutionary.

7) Standard SQL that makes migration of applications easy is essential.

8) Hiring a proven CEO can make all of the difference.


Advertising in the computer industry pays off year after year.


If you are an automobile manufacturer, spend one million dollars on advertising, and make two million dollars, you have profited by a million dollars. However, if you pay nothing for advertising and marketing the following year, you will make less than one hundred thousand dollars because you merely earn maintenance fees for the cars you sold last year.


If Snowflake spends one million dollars on advertising and makes two million dollars, it also profits by one million dollars. However, suppose Snowflake spends nothing on marketing and advertising the following year. In that case, it still makes at least 1.5 million dollars because the same customers return and upgrade to larger systems with more data.
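The two scenarios above come down to simple arithmetic. The Python sketch below restates them. The function name and the follow-on percentages (5% and 75%) are illustrative assumptions, chosen only to match the "less than one hundred thousand" and "at least 1.5 million" figures in the text; they are not actual company data.

```python
def two_year_revenue(year_one_revenue, follow_on_percent):
    """Return (year one, year two) revenue when year two has no ad spend.

    follow_on_percent is the assumed share of year-one revenue that
    recurs in year two without any new advertising.
    """
    year_two_revenue = year_one_revenue * follow_on_percent // 100
    return year_one_revenue, year_two_revenue


AD_SPEND = 1_000_000   # spent in year one only
YEAR_ONE = 2_000_000   # revenue in year one for both companies

# Assumed percentages: maintenance fees only for the car maker (upper
# bound of 5%), returning and upgrading customers for Snowflake (75%).
car_maker = two_year_revenue(YEAR_ONE, 5)
snowflake = two_year_revenue(YEAR_ONE, 75)

for name, (y1, y2) in (("Car maker", car_maker), ("Snowflake", snowflake)):
    two_year_profit = y1 + y2 - AD_SPEND
    print(f"{name}: year-two revenue ${y2:,}, two-year profit ${two_year_profit:,}")
```

Under these assumptions, the same one-time advertising spend keeps paying in the second year only for the business whose customers come back on their own.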

The capabilities of scaling compute and storage on public clouds.


The Snowflake architecture allows customers to separate storage and compute, and both are entirely scalable. Snowflake's conclusion is that storage and compute each need to scale from the smallest workloads to the most massive. And there is no better place to make that happen than the public clouds of Amazon, Microsoft, and Google.


Customer retention is gold in the computer industry.


Snowflake understands that customer retention on the cloud is pure gold, so their goal with every customer is to make them a long-time customer and then tell all their friends in the industry about the experience.

It is hard to believe, but the average customer only pays Snowflake $65,000 annually. Still, the number of customers is essential to investors, and customer retention is high because most customers consider it challenging to migrate between systems.


High-powered investors are critical.


Snowflake's investors, Salesforce and Berkshire Hathaway, bring enormous muscle to the equation. For example, Salesforce purchased Tableau for 16 billion dollars. Tableau is eye candy for business users who want to combine analytics with graphs and charts.


The Oracle of Omaha, Warren Buffett, does not normally invest in companies during their IPO. The last company Warren Buffett invested in during its IPO was Ford Motor Company in 1956! Warren Buffett's Berkshire Hathaway made 800 million dollars, more than three-quarters of a billion, on the Snowflake IPO's first day.

A large IPO cuts legacy data like a knife.


Legacy database vendors such as Teradata have spent four decades as industry data warehouse leaders. Teradata argues that its large customers will not move to Snowflake; however, Teradata lost its largest customer to Snowflake. Capital One dropped Teradata and is now Snowflake's largest customer, paying a whopping twenty-nine million dollars annually, which makes up around 11% of Snowflake's revenue.
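As a quick sanity check on those two figures, the short Python sketch below works out the total annual revenue they imply. The function name is hypothetical, and because the 11% is only approximate, the result is a rough estimate rather than a reported number.

```python
def implied_total_revenue(customer_spend, share_of_revenue):
    """Total revenue implied by one customer's spend and its share of the total."""
    if not 0 < share_of_revenue <= 1:
        raise ValueError("share_of_revenue must be in (0, 1]")
    return customer_spend / share_of_revenue


capital_one_spend = 29_000_000  # annual figure quoted above
share = 0.11                    # "around 11%" quoted above

total = implied_total_revenue(capital_one_spend, share)
print(f"Implied annual revenue: roughly ${total / 1e6:,.0f} million")
```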


Sharing data between customers and providers is revolutionary.


Snowflake is introducing the sharing of data among Snowflake customers and providers. Some provide data such as weather, crime, and social media; the list goes on and on. Snowflake customers can purchase and query this data with their internal data. What an advantage. Information sharing is a future trend that will continue to add to the influx of valuable data.


Standard SQL that makes migration of applications easy is essential.

Something that nobody else will mention to you is that Snowflake runs on standard SQL. Better yet, the SQL it accepts is far more robust than that of any other database. In addition, almost every query from every other system works on Snowflake, so the migration of applications is virtually guaranteed.


How do I know about the Snowflake SQL? I have written 30 books on SQL and just wrote a 650-page book on Snowflake SQL.


Hiring a proven CEO can make all of the difference.


Frank Slootman is the CEO and one of the biggest IPO giants in the industry. Frank is an industry superstar with an incredible track record of taking companies public and growing revenue at outstanding paces.

From 2003 to 2009, Frank took the startup Data Domain from no customers or revenue to an IPO in 2007 and over 1 billion dollars in sales, and EMC acquired the company for 2.4 billion dollars in 2009. Frank was also CEO of ServiceNow from 2011 to 2017, took the company through its IPO, and grew its business by over 270%.


Databricks takes a different approach.


Databricks is a venture-backed data and AI company headquartered in San Francisco, with offices around the globe. It was founded by the original creators of Apache Spark, Delta Lake, and MLflow. Databricks also has a genius at the helm in its co-founder Ali Ghodsi. Here is how Ali describes the early days at Databricks.


"There is an incredible computer science professor at Berkeley, Dave Patterson, who just opened up labs and office space to students and said let's brainstorm and collaborate. So we had computer scientists, engineers, mathematicians, and ML experts working together to see what we could create, and out of that came Apache Spark.


The earliest version was built to make it faster to load huge datasets into memory. Spark forms the foundation of much of what we've built at Databricks. When I co-founded Databricks in 2013, I was deeply involved in product creation and engineering because I had worked on core technologies since 2009 at Berkeley."


What I like about Databricks are the infinite ways customers use it to take massive amounts of data and perform machine learning and data science to predict things.


For example, Regeneron uses machine learning algorithms to detect the gene in DNA responsible for chronic liver disease, and then they develop a drug targeting that particular gene.


Comcast uses Databricks to make their voice-activated remote controls work. When you talk to the remote control, that voice data goes into the cloud for Databricks to process using machine learning, and it figures out what you said and directs the TV to the right channel.


Hospitals use Databricks to get a real-time picture of how full their ERs are to redirect patients in ambulances to different hospitals with space.


Financial services firms use Databricks to analyze satellite data to predict which global sectors and companies are worthy of investment.

Shell uses Databricks to monitor sensor data from 200 million valves to predict if any will break. They can replace them ahead of time to keep systems running, save money, and ensure employees stay safe.


Databricks has a unique model called SaaS open source. Databricks is only SaaS, so they continually update the product in the background. Databricks charges customers for the software's development, running, operating, and hosting.


Databricks also contributes constantly to the open-source version of Databricks, which is entirely free. Still, the Databricks SaaS offering has many valuable features for enterprises, such as reliability, availability, and scalability. In addition, Databricks has always been SaaS and only SaaS, with no crutch of on-prem revenue, so it has been excellent at delivering everything in the cloud from day one.


Who will win the cloud showdown? Both opponents have been watching the other. Snowflake invests heavily in machine learning and predictive analytics, and Databricks invests in structured data.


Conclusion


I can't entirely agree with most experts that Snowflake and Databricks will become the next big tech rivalry. Neither company has an on-premises solution, which puts them at a disadvantage. Both companies are worthy of their positioning as cloud giants, but I see things differently. I have spent 15 years developing desktop, server, and cloud software called Nexus, which provides three things:


1) A brilliant query tool that works across all platforms and even writes the SQL for users. Users have one tool where they can simultaneously query every system while sharing and coordinating among themselves.


2) Desktop and server software that coordinate to automatically transform and migrate data between any two systems so easily that anyone can perform data migrations or move tables to their sandbox.


3) A brilliant cross-system virtualization tool that makes joining tables across platforms transparent and seamless. As a result, query federation becomes the solution and motivation for storing data where it makes sense.
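To make the idea of a cross-system join concrete, here is a minimal Python sketch. It does not show how Nexus works internally. It simply uses two separate in-memory SQLite databases, joined with SQLite's ATTACH DATABASE, as stand-ins for two platforms. The table names, the `warehouse_b` alias, and the sample rows are all made up for illustration.

```python
import sqlite3

# Two independent in-memory SQLite databases stand in for two platforms.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse_b")

# "Platform A" (the main database) holds customers.
conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# "Platform B" (the attached database) holds orders.
conn.execute("CREATE TABLE warehouse_b.orders (customer_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO warehouse_b.orders VALUES (?, ?)",
    [(1, 250), (1, 100), (2, 75)],
)

# One standard SQL statement joins tables that live in different databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(o.amount) AS total
    FROM main.customers AS c
    JOIN warehouse_b.orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

for name, total in rows:
    print(name, total)

conn.close()
```

The point of the sketch is that the user writes one query and never moves the data by hand; each table stays where it makes sense to store it.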


So, there will be no significant showdown because customers will want data across many clouds and hundreds of platforms as more data becomes available to customers.


Should you try Snowflake and Databricks? Absolutely, but you should also have data on Yellowbrick, MySQL, Postgres, BigQuery, Azure Synapse, Greenplum, SAP HANA, Amazon Redshift, Aurora, Teradata, Oracle, DB2, Hadoop, Microsoft Access, SQLite, and SQL Server.


The Nexus can instantly migrate data between these systems or join data from any combination of these platforms. Now that is the future of technology!

Databases like Snowflake and Databricks play a vital role, but the future of the cloud resides in desktop and server software like Nexus.


There are more than two fish in the sea, but software allowing companies to take advantage of all the beautiful fish in the database species is the key to good fishing.




Thank you,


Tom



Tom Coffing CEO, Coffing Data Warehousing Direct: 513 300-0341 Website: CoffingDW.com Email: Tom.Coffing@CoffingDW.com




