In this lesson we will:

  • Learn about the Snowflake billing model and credit system;
  • Explain the various cost visibility and cost control features of Snowflake;
  • Share some high-level information about how to minimise and control costs.

Consumption Based Pricing

Snowflake offers a genuinely consumption-based pricing model, where you pay only for the services that you use.

There are three classes of resource which you will use and pay for:

Compute

When you execute queries or updates, you need processing power to carry out the requests. You pay for this compute on a per-second basis, with the rate per second depending on the size of compute you use - e.g. an X-Large warehouse for 180 seconds.
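
For example, an X-Large warehouse is rated at 16 credits per hour (see the table later in this lesson), so running it for 180 seconds consumes roughly 16 × 180 / 3600 = 0.8 credits.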

In practice, the compute is usually the largest cost in running a Snowflake deployment.

Storage

The data in your Snowflake instance needs to be saved on a persistent store which exists even when you are not running any compute resources. You pay for this storage based on the volume of data stored - e.g. 108 GB of data.

Cloud Services

In order to work with your account, you will need to carry out various activities such as logging in, administering users and running scheduled jobs. This usage is billed by Snowflake as a separate line item.

Credit System

As discussed in a previous lesson, Snowflake is billed using a credit system. As you use Snowflake you will consume credits for your compute and cloud services usage, with storage billed separately by volume. This usage is then billed at a per-credit cost at month end.

The per-credit cost is a function of the edition (tier) you are on, the cloud provider, and the cloud provider region.

For instance, at the time of writing, a customer with their Snowflake instance deployed in the AWS us-east region would pay the following per-credit costs:

Tier                Credit Cost
Standard            $2.60
Enterprise          $3.90
Business Critical   $5.20

Whereas the same deployment in the Google Cloud Platform London region would cost:

Tier                Credit Cost
Standard            $2.70
Enterprise          $4.00
Business Critical   $5.40

This makes GCP slightly more expensive per Snowflake credit if you need to host in that environment.

Virtual Warehouse Costs

Your compute costs will likely be the largest part of your Snowflake bill. This is because storage is cheap, and because Snowflake passes its storage costs along without a significant markup.

Snowflake offers a number of virtual warehouse sizes, structured on a T-shirt sizing basis, each of which has an associated credits-per-hour cost.

Warehouse Size   Credits Per Hour
X-Small          1
Small            2
Medium           4
Large            8
X-Large          16
2X-Large         32
3X-Large         64
4X-Large         128
5X-Large         256
6X-Large         512

Combining the two tables above, we can work out our cost per hour of running a 2X-Large warehouse in the AWS us-east region:

Tier                Credit Cost   Warehouse Size   Credits Per Hour   Cost Per Hour
Standard            $2.60         2X-Large         32                 $83.20
Enterprise          $3.90         2X-Large         32                 $124.80
Business Critical   $5.20         2X-Large         32                 $166.40

And the same for GCP:

Tier                Credit Cost   Warehouse Size   Credits Per Hour   Cost Per Hour
Standard            $2.70         2X-Large         32                 $86.40
Enterprise          $4.00         2X-Large         32                 $128.00
Business Critical   $5.40         2X-Large         32                 $172.80

This shows that your actual costs are a function of the choices you make and your actual consumption.

This can be challenging for businesses, as the final monthly bill can be unpredictable, depending on how heavily users actually use Snowflake. This is a common risk of consumption-based pricing models.

Bespoke Arrangements

For larger deployments, it is possible to achieve discounts through direct negotiation with Snowflake. This will likely require pre-payment or committed agreements for compute and storage consumption.

Snowflake Tables For Billing

Snowflake exposes a lot of information about accrued costs through views in the SNOWFLAKE.ACCOUNT_USAGE schema. We will now take a quick tour to highlight the main ones:

WAREHOUSE_METERING_HISTORY

select * from SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY;

The WAREHOUSE_METERING_HISTORY view records, for each hour in which a warehouse ran, how many credits it consumed for compute and for cloud services.

Note that this view can take up to three hours to be updated.
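
As a starting point for cost analysis, a query along the following lines (a sketch, assuming the standard columns of this ACCOUNT_USAGE view) summarises credits per warehouse per day:

-- Sketch: total credits per warehouse per day over the last 30 days
select
    warehouse_name,
    to_date(start_time) as usage_date,
    sum(credits_used) as total_credits,
    sum(credits_used_compute) as compute_credits,
    sum(credits_used_cloud_services) as cloud_services_credits
from SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
where start_time > dateadd(day, -30, current_timestamp())
group by warehouse_name, to_date(start_time)
order by usage_date, total_credits desc;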

QUERY_HISTORY

select * from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY;

The QUERY_HISTORY view can be joined with the WAREHOUSE_METERING_HISTORY view to understand which queries ultimately consume the most compute and therefore generate the highest costs.
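
There is no official per-query bill in these views, but a rough sketch like the following (assuming the standard ACCOUNT_USAGE columns) apportions each hour's warehouse credits across the queries that ran in that hour, weighted by execution time:

-- Sketch: approximate credits per query by weighting each hour's warehouse
-- credits by query execution time (an approximation, not an official figure)
select
    q.query_id,
    q.warehouse_name,
    date_trunc('hour', q.start_time) as usage_hour,
    m.credits_used * q.execution_time
        / sum(q.execution_time) over (partition by q.warehouse_name, date_trunc('hour', q.start_time))
        as approx_credits
from SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY q
join SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY m
    on m.warehouse_name = q.warehouse_name
    and m.start_time = date_trunc('hour', q.start_time)
where q.warehouse_name is not null
    and q.start_time > dateadd(day, -7, current_timestamp())
order by approx_credits desc
limit 20;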

DATABASE_STORAGE_USAGE_HISTORY

select * from SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY;

The DATABASE_STORAGE_USAGE_HISTORY view gives us the average daily storage used by each database, including Fail-safe storage, which we can use to understand storage costs.
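
For example, a sketch like the following (again assuming the standard ACCOUNT_USAGE columns) reports average storage per database over the last 30 days, converted to gigabytes:

-- Sketch: average storage per database over the last 30 days, in GB
select
    database_name,
    avg(average_database_bytes) / power(1024, 3) as avg_database_gb,
    avg(average_failsafe_bytes) / power(1024, 3) as avg_failsafe_gb
from SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
where usage_date > dateadd(day, -30, current_date())
group by database_name
order by avg_database_gb desc;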

Controlling Costs

Snowflake provides several specific features that can help you control costs. These features include:

  • Multi-cluster warehouse: Snowflake allows you to create multiple clusters within a single warehouse, each with its own set of compute resources. This allows you to scale compute out when concurrency is high and back in when demand drops, and to dedicate warehouses to specific workloads or users. By doing so, you can avoid over-provisioning and paying for more compute than you actually need.

  • Automatic scaling and suspension: Multi-cluster warehouses automatically add and remove clusters based on your workload, and warehouses can automatically suspend when idle and resume when queries arrive. This means that you don't need to manually provision or de-provision compute resources to handle spikes or dips in workload, and you avoid paying for idle resources.

  • Resource monitoring: Snowflake provides resource monitors and detailed usage reporting, which allow you to track your resource usage and costs over time, and to set credit quotas that trigger notifications or suspend warehouses when exceeded (see the sketch after this list). You can use this information to identify areas where you can optimize your resource usage and reduce costs.

  • Time travel: Snowflake's time travel feature allows you to access historical data from your data warehouse. This feature can help you avoid the need to create and maintain separate data copies for historical analysis, which can save you storage costs.

  • Automatic query optimization: Snowflake's query optimization engine automatically optimizes queries to minimize resource usage. This can help you reduce query runtime and avoid over-provisioning compute resources.

  • Data sharing: Snowflake's data sharing feature allows you to share data securely with other Snowflake accounts, without the need for data replication. This can help you avoid data duplication and reduce storage costs.
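
To illustrate the warehouse and resource monitor settings mentioned above, here is a minimal sketch (all object names and quota values are illustrative): a resource monitor with a monthly credit quota that notifies at 80% and suspends at 100%, attached to a warehouse configured to suspend itself after 60 seconds of inactivity.

-- Sketch: a monthly credit quota that notifies at 80% and suspends at 100%
create resource monitor monthly_quota_monitor
  with credit_quota = 100
  frequency = monthly
  start_timestamp = immediately
  triggers
    on 80 percent do notify
    on 100 percent do suspend;

-- Sketch: a warehouse that suspends itself after 60 seconds of inactivity
create warehouse reporting_wh
  warehouse_size = 'XSMALL'
  auto_suspend = 60
  auto_resume = true
  initially_suspended = true;

-- Attach the monitor so the credit quota applies to this warehouse
alter warehouse reporting_wh set resource_monitor = monthly_quota_monitor;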


Next Lesson: 06 - Snowflake Tasks (0h 15m)

In the next lesson we will learn about Snowflake tasks, which allow us to run database jobs on a schedule.



