1. Time Travel: users can set up and retrieve historical data, depending on their Snowflake edition and the object- or account-level Time Travel setting (DATA_RETENTION_TIME_IN_DAYS).
2. Fail-safe: users have no control over data retrieval here; Fail-safe applies only after the Time Travel period is over, and only Snowflake Support can help, for up to 7 days. So if you have set Time Travel to 6 days (for example), you can retrieve database objects yourself for up to 6 days after the transaction executed. From day 7 to day 13 after the transaction, Snowflake Support can help retrieve your objects. After day 13, objects cannot be retrieved or restored.
Time Travel is available for 1 to 90 days, depending on the Snowflake edition that you sign up for.
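As a sketch of how the retention window described above is configured (the table name is hypothetical, and the allowed values depend on your Snowflake edition):

```sql
-- Set a 6-day Time Travel window on a single table:
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 6;

-- Or set a default for all objects in the account (requires ACCOUNTADMIN):
ALTER ACCOUNT SET DATA_RETENTION_TIME_IN_DAYS = 6;
```

Object-level settings override the account-level default, so different tables can have different retention periods.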
All data is compressed by default in Snowflake. Snowflake chooses the best compression algorithms, and this is not configurable by end users. The best part is that Snowflake charges customers based on the final size of the data after compression is applied.
Fail-safe is an advanced Snowflake feature that ensures data protection and plays an important role in Snowflake's data protection lifecycle. Fail-safe offers 7 days of extra storage even after the Time Travel period is over.
A materialized view is a pre-computed dataset derived from a query specification. Since the data is pre-computed, querying the materialized view is much faster than running the same query against the view's base table. In plain language, materialized views are designed to improve the performance of common, repetitive query patterns. Materialized views are first-class database objects that accelerate projection, selection, and expensive aggregation operations for queries that run on larger datasets.
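A minimal sketch of the pattern described above, assuming a hypothetical `sales` base table (materialized views require the Enterprise edition or higher):

```sql
-- Pre-compute an expensive aggregation once:
CREATE MATERIALIZED VIEW daily_sales_mv AS
SELECT sale_date, SUM(amount) AS total_amount
FROM sales
GROUP BY sale_date;

-- Queries against the view read the pre-computed results:
SELECT total_amount FROM daily_sales_mv WHERE sale_date = '2023-01-15';
```

Snowflake maintains the view automatically as the base table changes, so the results stay consistent without manual refreshes.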
All queries are executed on this processing layer. Snowflake utilizes "virtual warehouses" for processing queries. Every virtual warehouse is a Massively Parallel Processing compute cluster consisting of several nodes allocated from Snowflake's cloud provider. In the query processing layer, no virtual warehouse shares its computational resources with other virtual warehouses. Each virtual warehouse is thus independent and has no impact on the other virtual warehouses in the event of a failure.
We can access the Snowflake data warehouse through:
Web User Interface
SnowSQL Command-line Client
Snowflake is fundamentally a SQL database. It is a relational database system that stores data in columns and is compatible with tools like Excel, Tableau, etc. Snowflake also provides its own query tool, supports multi-statement transactions, and includes role-based security. These are features typically expected of a SQL database.
Snowflake has different editions that will help customers based on the requirements. They are listed below:
1. Standard edition: this edition is best for beginners and is Snowflake's introductory-level offering. It provides users unlimited access to the standard features.
2. Enterprise edition: the Enterprise edition includes all Standard edition features and services, plus additional features required by large-scale enterprises.
3. Business Critical edition: the Business Critical edition is also called Enterprise for Sensitive Data. It provides high-level data protection to safeguard sensitive data and meet the organization's compliance needs.
4. Virtual Private Snowflake: Virtual Private Snowflake (VPS) provides the highest level of security for organizations that deal with financial activities.
Snowflake Data Sharing allows organizations to securely and immediately share their data. Secure Data Sharing enables data to be shared between accounts through database tables and Snowflake secure views.
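A minimal sketch of how a provider account shares a table, with hypothetical database, schema, and account names:

```sql
-- Create an empty share, grant access to the objects, and add a consumer account:
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;
```

No data is copied: the consumer queries the provider's data in place, which is why sharing is immediate.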
In Snowflake, a virtual warehouse is one or more compute clusters that enable users to carry out operations like queries, data loading, and other DML operations. Virtual warehouses provide users with the necessary resources, such as temporary storage and CPU, for performing various Snowflake operations.
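Creating and using a virtual warehouse looks like this (the name, size, and timeout below are illustrative):

```sql
-- Create a medium warehouse that suspends itself when idle:
CREATE WAREHOUSE analytics_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 300      -- suspend after 300 seconds of inactivity
  AUTO_RESUME = TRUE;     -- wake up automatically when a query arrives

-- Route subsequent queries in this session to that cluster:
USE WAREHOUSE analytics_wh;
```

Since compute is billed per-second while a warehouse runs, AUTO_SUSPEND and AUTO_RESUME are the usual way to keep costs down.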
Fail-safe is a modern feature in Snowflake that assures data security and plays a vital role in Snowflake's data protection lifecycle. Fail-safe provides seven days of additional storage even after the Time Travel period is completed.
Snowflake automatically generates metadata for files in internal stages or external stages. It is stored in virtual columns and can be queried using a standard SELECT statement.
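A sketch of querying those virtual metadata columns, assuming a hypothetical stage and file format:

```sql
-- METADATA$FILENAME and METADATA$FILE_ROW_NUMBER are generated by Snowflake
-- for every staged file; $1, $2 are the first two columns of the data itself:
SELECT METADATA$FILENAME,
       METADATA$FILE_ROW_NUMBER,
       t.$1, t.$2
FROM @my_stage (FILE_FORMAT => 'my_csv_format') t;
```

This is handy for auditing which source file and row each loaded record came from.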
The Snowflake Time Travel tool allows us to access past data at any moment within the specified retention period, including data that has since been changed or deleted. With this tool, we can carry out the following tasks:
Restore data-related objects that may have been lost unintentionally.
Examine data utilization and the changes made to the data over a specific time period.
Duplicate and back up data from key points in the past.
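The tasks above map onto a few SQL constructs; a sketch with hypothetical table names and IDs (the statement ID and timestamp would come from your query history):

```sql
-- See the table as it was one hour ago:
SELECT * FROM orders AT (OFFSET => -3600);

-- See the table as it was just before a given statement ran:
SELECT * FROM orders BEFORE (STATEMENT => '8e5d0ca9-005e-44e6-b858-a8f5b37c5726');

-- Restore an accidentally dropped table:
UNDROP TABLE orders;

-- Back up the table as of a point in the past:
CREATE TABLE orders_backup CLONE orders
  AT (TIMESTAMP => '2023-01-15 08:00:00'::TIMESTAMP_LTZ);
```

All of these work only within the configured DATA_RETENTION_TIME_IN_DAYS window.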
Less disk space is used than in a denormalized model.
Better data quality (data is more structured, so data integrity problems are reduced)
Supported cloud platforms for Snowflake:
* Microsoft Azure (Azure)
* Amazon Web Services (AWS)
* Google Cloud Platform (GCP)
Columnar databases organize data at the column level instead of the conventional row level. Column-level operations are much faster and consume fewer resources than they would in a row-oriented relational database.
Snowflake supports the most common standardized version of SQL, i.e., ANSI for powerful relational database querying.
Star and snowflake schemas are similar, but they differ in their dimensions: in a snowflake schema, at least some dimensions are normalized into multiple related tables, while in a star schema each logical dimension is denormalized into a single table.
Snowflake provides a cloud-based data storage and analytics service. It is termed "data-warehouse-as-a-service," which allows you to store and analyze data using cloud-based software and hardware.
Snowflake Computing provides secure and governed access to the complete data network and a core architecture to allow various types of data workloads, including a single platform to develop modern data applications.
The services layer is the brain of Snowflake. The services layer for Snowflake authenticates user sessions, provides management, enforces security functions, performs query compilation and optimization, and coordinates all transactions
Snowflake cloud data warehouse platform provides instant, secure, and governed access to the entire data network and a core architecture to enable various types of data workloads, including a single platform for developing modern data applications.
Unique features of the Snowflake data warehouse are listed below:
Database and object cloning
Support for XML
Hive meta store integration
Supports geospatial data
Security and data protection
Search optimization service
Table streams on external tables and shared tables
Snowflake provides compute and storage separately, and its storage cost is the same as plain cloud data storage. AWS addresses this by adding Redshift Spectrum, which allows querying data directly on S3, but it is not as seamless as Snowflake.
Top Snowflake competitors are AWS, Google, Microsoft, Cloudera, IBM, SAP, and Teradata.
The three layers of Snowflake architecture include:
Database storage – When data is loaded into Snowflake, it is reorganized into Snowflake's internal optimized, compressed, columnar format. This data is stored in cloud storage.
Query Processing – Queries are executed in the processing layer and are processed using “virtual warehouses.”
Cloud Services – It’s a collection of services that coordinate activities across Snowflake. Services included in this are authentication, metadata management, infrastructure management, access control, and Query parsing & optimization.
To retrieve the task history details for runs in a scheduled or executing state, query the TASK_HISTORY table function in the Information Schema.
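A sketch of that query, assuming a hypothetical task named MY_TASK:

```sql
-- TASK_HISTORY is an Information Schema table function; filter on STATE
-- to see runs that are still scheduled or currently executing:
SELECT name, state, scheduled_time, query_id
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
       TASK_NAME    => 'MY_TASK',
       RESULT_LIMIT => 10))
WHERE state IN ('SCHEDULED', 'EXECUTING')
ORDER BY scheduled_time DESC;
```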
Both Snowflake and Redshift offer on-demand pricing but differ in package features. Snowflake separates compute usage from storage in its pricing structure, while Redshift combines both.
Stored Procedures enable you to create modular code containing complex business logic by including different SQL statements with procedural logic.
<> Within a Snowflake stored procedure, you can:
<> Execute a SQL statement.
<> Retrieve the results of a query.
<> Retrieve result-set metadata.
Snowflake is a cloud-based data warehouse solution offered as SaaS (Software-as-a-Service) with full support for ANSI SQL.
ETL is an acronym for Extract, Transform, and Load. It’s a process used to extract data from one or more sources and load it to a specified data warehouse or database. The sources may include third-party apps, flat files, databases, and more.
Snowflake ETL means applying the ETL process to load data into the Snowflake database/data warehouse. That includes extracting data from data sources, making required transformations, and then loading it into Snowflake.
Yes, Snowflake supports stored procedures. A stored procedure is similar to a function; as such, it’s created once and used many times. Using the CREATE PROCEDURE command, you can make it, and with a CALL command, you can execute it.
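A minimal sketch of that create-then-call pattern using Snowflake Scripting; the procedure name, table, and logic are hypothetical:

```sql
-- Create once:
CREATE OR REPLACE PROCEDURE purge_old_rows(days_to_keep INTEGER)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
  -- delete rows older than the given retention window
  DELETE FROM events
  WHERE event_ts < DATEADD(day, -days_to_keep, CURRENT_TIMESTAMP());
  RETURN 'purge complete';
END;
$$;

-- Use many times:
CALL purge_old_rows(30);
```

Snowflake also supports writing procedure bodies in JavaScript, Python, Java, and Scala; SQL (Snowflake Scripting) is shown here for brevity.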
Snowflake is designed as an OLAP (Online Analytical Processing) database system. Depending on the usage, it can serve some OLTP (Online Transaction Processing) purposes as well.
There are a lot of reasons for Snowflake gaining momentum these days. Let's look at them one by one:
<> Snowflake serves a wide range of technology areas, including business intelligence, data integration, security & governance, and advanced analytics.
<> It provides cloud infrastructure and supports modern design architectures suitable for agile and dynamic usage trends.
<> Supports out-of-the-box features like data sharing, on-the-fly scalable compute, data cloning, separation of storage and compute, third-party tools access, etc.
<> Snowflake simplifies data processing.
<> Fits for many use cases – ODS with staged data, data lakes with raw data, and data marts/data warehouse with presentable and modeled data.
<> Snowflake offers scalable computing power.
Data security is the highest priority for all organizations. Snowflake follows industry-leading security standards to encrypt and secure the customer accounts and data stored in Snowflake.
It offers best-in-class key management features at no additional cost.
Following are the security measures used by Snowflake to protect customer data:
* Snowflake uses a managed key to automatically encrypt the data entered into it.
* Snowflake uses TLS to protect communication between clients and servers.
* It allows you to select a geographical location to store your data based on your cloud region.
The data partitioning that happens in Snowflake is called clustering. It is usually controlled by specifying clustering keys on the table. The practice of managing how the data in a table stays clustered is called re-clustering.
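A sketch of defining and inspecting clustering keys; the table and columns are hypothetical:

```sql
-- Define a clustering key at table creation:
CREATE TABLE events (event_ts TIMESTAMP, region VARCHAR, payload VARIANT)
  CLUSTER BY (region, event_ts);

-- Or add/change the key on an existing table:
ALTER TABLE events CLUSTER BY (region, event_ts);

-- Check how well the micro-partitions are clustered on those keys:
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(region, event_ts)');
```

Snowflake's automatic clustering service then re-clusters the table in the background as new data arrives.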
Following are the typical ways one can use to access:
* Web User Interface
* JDBC Drivers
* ODBC Drivers
* Python Libraries
* SnowSQL Command-line Client
There are many ways to access the Snowflake data warehouse. They are:
1. JDBC drivers
2. ODBC drivers
3. Python libraries
4. Web user interface
5. SnowSQL command-line client
Snowflake has a unique, patented, multi-cluster, shared-data architecture created specifically for the cloud. Snowflake's architecture includes storage, compute, and services layers. These are logically integrated with each other, yet each scales independently of the others.
Definitely. AWS Glue provides a comprehensive managed environment that connects easily with Snowflake as a data warehouse service. Together, these two solutions enable you to handle data ingestion and transformation with more ease and flexibility.
Snowflake offers storage and compute independently, and its storage cost is comparable to plain cloud data storage. AWS handles this aspect with Redshift Spectrum, which enables querying data directly on S3, yet it is not as seamless as Snowflake.
Snowflake stores data in multiple micro-partitions, which are internally optimized and compressed. The data is stored in a columnar format in Snowflake's cloud storage. The stored data objects are not directly visible or accessible to users; they can be accessed only by running SQL queries in Snowflake.
The services layer is the brain of Snowflake. The services layer for Snowflake authenticates user sessions, provides management, enforces security functions, performs query compilation and optimization, and coordinates all transactions.
Yes, Snowflake is an ETL tool. It’s a three-step process, which includes:
<> Extracts data from the source and creates data files. Data files support multiple data formats like JSON, CSV, XML, and more.
<> Loads data to an internal or external stage. Data can be staged in an internal, Microsoft Azure blob, Amazon S3 bucket, or Snowflake managed location.
<> Data is copied into a Snowflake database table using the COPY INTO command.
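The staging and loading steps above can be sketched as follows; the stage, file, table, and format options are hypothetical:

```sql
-- Create an internal stage and upload a local file to it
-- (PUT runs from a client such as SnowSQL, not the web UI):
CREATE STAGE my_stage;
PUT file:///tmp/orders.csv @my_stage;

-- Copy the staged file into a target table:
COPY INTO orders
  FROM @my_stage/orders.csv
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```

For an external stage, the FROM clause would instead reference a stage pointing at an S3 bucket or Azure Blob container.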
All data processing tasks within Snowflake are performed by virtual warehouses, which are one or more clusters of compute resources. When performing a query, virtual warehouses retrieve the minimum data required from the storage layer to fulfill the query requests.
The storage layer stores all the diverse data, tables and query results in Snowflake. The Storage Layer is built on scalable cloud blob storage (uses the storage system of AWS, GCP or Azure). Maximum scalability, elasticity, and performance capacity for data warehousing and analytics are assured since the storage layer is engineered to scale completely independent of compute resources
Snowflake is built on a patented, multi-cluster, shared-data architecture created for the cloud. Snowflake's architecture comprises storage, compute, and services layers that are logically integrated but scale infinitely and independently from one another.
Snowflake is referred to as an ETL tool that follows three steps, hence called a three-step process. It includes the following:
1. Extract: the first step extracts data from the source and creates data files. The data files created can support multiple data formats such as XML, CSV, JSON, etc.
2. Load: this step loads the data to an external or internal stage. The data can be staged in an internal stage, an Amazon S3 bucket, Microsoft Azure Blob storage, or any Snowflake-managed location.
3. Copy: in this step, the data is copied into the Snowflake database table using the COPY INTO command.
Snowflake has come up with a unique architecture, a cloud data warehouse originally built on Amazon Web Services. Snowflake does not require any extra hardware, software, or maintenance, unlike other platforms. Snowflake's architecture consists of three different layers: data storage, query processing, and cloud services. Each layer has its own functionality. Let me give you a brief explanation of each layer in the Snowflake architecture.
1. Data storage: in this layer, the stored data is organized into an internal, optimized, columnar format.
2. Query processing: in this layer, virtual warehouses process the queries submitted to Snowflake.
3. Cloud services: the cloud services layer is responsible for coordinating and handling all related activities across Snowflake, including infrastructure management, metadata management, query parsing and optimization, authentication, and access control.
A Snowflake cloud data warehouse is an analytical data warehouse built on a new SQL database engine. Snowflake utilizes a unique architecture built specifically for the cloud. It is delivered as software-as-a-service and was first available on Amazon Web Services for loading and analyzing large volumes of data. Snowflake's most dynamic and remarkable feature is the ability to spin up any number of virtual warehouses, which means users can run any number of independent workloads against the same data without risk or contention.