Following are the different types of tables available in Snowflake:
<> Permanent: It is the regular database table. It consumes storage, and it supports both Time Travel and Fail-safe periods. Permanent tables are useful for data that requires a higher level of data protection and recovery.
<> Transient: In Snowflake, we can create transient tables that exist until explicitly dropped and are accessible to all users with the relevant privileges. Unlike permanent tables, they have no Fail-safe period.
<> Temporary: We use a temporary table to store transitory, non-permanent data. Temporary tables exist only in the session in which they were created.
<> External: External tables are read-only, and we cannot perform DML operations on them. We can use external tables for join and query operations.
Snowflake endorses the following ETL tools:
To retrieve the history of task runs, including tasks that are currently executing or scheduled, query the TASK_HISTORY table function in the Information Schema.
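As a rough sketch of such a query, the statement below lists task runs from the last hour that are still scheduled or executing. It assumes a current database is set in the session; the one-hour window and the limit of 100 rows are arbitrary choices for illustration.

```sql
-- List recent task runs that have not finished yet.
-- The time window and RESULT_LIMIT are illustrative values, not recommendations.
SELECT name, state, scheduled_time, query_start_time
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
       SCHEDULED_TIME_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP()),
       RESULT_LIMIT => 100))
WHERE state IN ('SCHEDULED', 'EXECUTING')
ORDER BY scheduled_time;
```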
The storage layer holds all table data and query results. It is built on scalable cloud blob storage. Because storage is engineered to scale independently of the compute resources, it provides the elasticity, scalability, and capacity needed for data warehousing and analytics.
A Large Snowflake warehouse contains eight nodes. A query run on the warehouse is executed in parallel across those nodes.
Following are the advantages of the Snowflake Compression:
* Storage costs are lower than for native cloud storage because of compression.
* No storage cost for on-disk caches.
* Close to zero storage cost for data sharing or data cloning.
Horizontal scaling increases concurrency when we have to support additional users. We can use auto-scaling to raise the number of clusters in a virtual warehouse so that user queries are served without queuing.
Vertical scaling reduces processing time. When a large workload needs to run faster, we can choose a larger virtual warehouse size.
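Both kinds of scaling can be sketched with ALTER WAREHOUSE. The warehouse name `my_wh` is hypothetical, the sizes and cluster counts are arbitrary, and the multi-cluster settings assume an edition that supports multi-cluster warehouses.

```sql
-- Vertical scaling: give each query more compute by resizing the warehouse.
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Horizontal scaling: allow the warehouse to add clusters as concurrency grows.
-- Snowflake starts and stops clusters between the minimum and maximum.
ALTER WAREHOUSE my_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4
  SCALING_POLICY = 'STANDARD';
```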
There are four types of tables that can be created in Snowflake:
1. Permanent:- It is the regular database table. It consumes storage, and Time Travel and Fail-safe periods can be enabled. All tables in Snowflake are by default micro-partitioned, compressed, encrypted, and stored in columnar format. Permanent tables are designed for data that requires the highest level of data protection and recovery. The tables persist until dropped.
2. Temporary:- A temporary table is used for storing non-permanent, transitory data (e.g. ETL data, session-specific data). Temporary tables only exist within the session in which they were created and persist only for the remainder of the session. As such, they are not visible to other users or sessions. Once the session ends, data stored in the table is purged completely from the system and, therefore, is not recoverable, either by the user who created the table or Snowflake.
3. Transient:- Snowflake supports creating transient tables that persist until explicitly dropped and are available to all users with the appropriate privileges. Transient tables are similar to permanent tables, with the key difference that they do not have a Fail-safe period. As a result, transient tables are specifically designed for transitory data that needs to be maintained beyond each session (in contrast to temporary tables) but does not need the same level of data protection and recovery provided by permanent tables.
4. External:- External tables are read-only, therefore no DML operations can be performed on them; however, external tables can be used for query and join operations. Views can be created against external tables.
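The four table types above map onto four variants of CREATE TABLE. The following is a minimal sketch; the table names, columns, and the stage `@my_stage` are hypothetical, and the external table assumes Parquet files already exist under that stage path.

```sql
-- Permanent table (the default): Time Travel and Fail-safe apply.
CREATE TABLE orders (id NUMBER, amount NUMBER(10,2));

-- Transient table: persists until dropped, but has no Fail-safe period.
CREATE TRANSIENT TABLE orders_staging (id NUMBER, amount NUMBER(10,2));

-- Temporary table: visible only in this session and dropped when it ends.
CREATE TEMPORARY TABLE orders_session (id NUMBER, amount NUMBER(10,2));

-- External table: read-only view over files in an external stage.
CREATE EXTERNAL TABLE orders_ext
  LOCATION = @my_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET);
```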
MPP stands for Massively Parallel Processing, and is a database architecture successfully deployed by Teradata and Netezza. Unlike traditional Symmetric Multi-Processing (SMP) hardware which runs a number of CPUs in a single machine, the MPP architecture deploys a cluster of independently running machines, with data distributed across the system. In addition to the ability to handle massive data volumes, this means it supports a scale out architecture, as additional nodes can be added to the cluster, although this can take from hours to days to deploy.
EPP stands for Elastic Parallel Processing, and was pioneered by Snowflake Computing. This uses a number of independently running MPP clusters connected to a shared data pool. This architecture has the advantage that new clusters can be started within seconds, to elastically grow or shrink resources as needed.
Whenever we load data into Snowflake, it reorganizes the data into its compressed, optimized, columnar format. Snowflake manages every aspect of how the data is stored, including compression, organization, statistics, file size, and other storage properties. The data objects stored by Snowflake are not directly visible or accessible to customers; we can access them only by running SQL queries through Snowflake.
All the data we load into Snowflake is compressed automatically. Snowflake uses modern compression algorithms to compress and store the data. Customers pay for the compressed size of the data, not its original size.
In Snowflake, zero-copy cloning is a feature that lets us create a copy of our tables, schemas, and databases without replicating the actual data. To perform a zero-copy clone, we use the CLONE keyword. This way we can work with a current copy of production data and run multiple actions against it without affecting production.
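A minimal sketch of the CLONE keyword at each level follows. All object names are hypothetical.

```sql
-- Clone a single table; the clone initially shares storage with the source.
CREATE TABLE orders_dev CLONE orders;

-- Clone a schema together with the objects it contains.
CREATE SCHEMA analytics_dev CLONE analytics;

-- Clone an entire database, for example to test against production data.
CREATE DATABASE prod_db_dev CLONE prod_db;
```

Changes made to a clone afterwards are stored separately, so they do not affect the source object.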
Snowflake and star schemas are similar; the difference lies in the dimensions. In a snowflake schema, some or all dimensions are normalized into multiple related tables, whereas in a star schema each logical dimension is denormalized into a single table.
Snowflake is a cloud data warehouse built on AWS and is a true SaaS offering. No hardware, software, ongoing maintenance, or tuning is needed to work with Snowflake.
Three main layers make up the Snowflake architecture: database storage, query processing, and cloud services.
<> Data storage - In Snowflake, the stored data is reorganized into an internal optimized, compressed, columnar format.
<> Query processing - Virtual warehouses process the queries in Snowflake.
<> Cloud services - This layer coordinates and handles all activities across Snowflake. It is responsible for authentication, metadata management, infrastructure management, access control, and query parsing.
Time Travel is available for between 1 and 90 days, depending on the Snowflake edition you are using or signing up for. There is a cost associated with Time Travel in Snowflake: storage charges are incurred for maintaining the historical data during the Fail-safe and Time Travel periods.
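As an illustration of what Time Travel makes possible within the retention period, the statements below query and restore past data. The table name `orders` and the one-hour offset are hypothetical.

```sql
-- Query the table as it was one hour ago (the offset is in seconds).
SELECT * FROM orders AT(OFFSET => -60*60);

-- Materialize that earlier state as a new table using a clone.
CREATE TABLE orders_restored CLONE orders AT(OFFSET => -60*60);

-- Recover a dropped table while it is still within its retention period.
DROP TABLE orders;
UNDROP TABLE orders;
```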
There are three different types of caching in Snowflake. They are listed below.
1. Query results caching
2. Metadata cache
3. Virtual warehouse local disk caching
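One way to observe these caches, sketched under the assumption of a hypothetical `orders` table and warehouse `my_wh`, is to switch the query result cache off for the session and compare timings in the query history.

```sql
-- Turn off the query result cache for this session only.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- This query now runs on the warehouse; repeated runs may still benefit
-- from data held in the warehouse's local disk cache.
SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id;

-- Suspending the warehouse typically discards its local disk cache.
ALTER WAREHOUSE my_wh SUSPEND;

-- Re-enable the result cache so identical queries can be answered from it.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```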
Snowflake compression has the following advantages:
1. Because of compression, storage costs are lower than for native cloud storage.
2. There is no storage cost for on-disk caches.
3. There is close to zero storage overhead for data cloning and data sharing.
A columnar database organizes data by column instead of by row, as conventional databases do. Column-level operations are generally faster than row-level operations and use fewer resources than they would in a row-oriented database.
Cloning, also referred to as “zero-copy cloning”, creates a copy of a database, schema, or table without duplicating the associated storage files on disk.
The compute layer is responsible for performing data processing within Snowflake, usually with one or more clusters of compute resources. Virtual warehouses retrieve the data from the storage layer to satisfy query requests.
Autoscaling is an advanced feature in Snowflake that starts and stops clusters as needed to support the workload on a warehouse.
SQL stands for Structured Query Language and is the common language used for working with data in databases. SQL statements are commonly grouped into DML (Data Manipulation Language) and DDL (Data Definition Language), which cover statements such as SELECT, UPDATE, INSERT, CREATE, ALTER, DROP, etc.
Snowflake is a data warehouse platform and supports standard SQL. Using SQL in Snowflake, we can perform the typical data warehousing operations like create, insert, alter, update, delete, etc.
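The operations named above look like this in practice. The `customers` table and its rows are made up purely for illustration.

```sql
-- DDL: define and later change the table structure.
CREATE TABLE customers (id NUMBER, name STRING, city STRING);
ALTER TABLE customers ADD COLUMN email STRING;

-- DML: add, change, and remove rows.
INSERT INTO customers (id, name, city)
  VALUES (1, 'Ada', 'London'), (2, 'Linus', 'Helsinki');
UPDATE customers SET city = 'Cambridge' WHERE id = 1;
DELETE FROM customers WHERE id = 2;

-- Query the remaining data, then clean up.
SELECT id, name, city, email FROM customers;
DROP TABLE customers;
```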
A materialized view in Snowflake is a pre-computed data set derived from a query specification. Because the data is pre-computed, querying a materialized view is faster than executing the same query against the view’s base table.
In simple words, materialized views are designed to improve query performance for common and repetitive query patterns. Materialized views are first-class database objects, and they speed up expensive aggregation, projection, and selection operations for queries that run on large data sets.
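A minimal sketch of a materialized view over a hypothetical `sales` table with `sale_date` and `amount` columns follows. It assumes an edition in which materialized views are available.

```sql
-- Pre-compute a daily aggregate that many reports query repeatedly.
CREATE MATERIALIZED VIEW daily_sales_mv AS
  SELECT sale_date,
         SUM(amount) AS total_amount,
         COUNT(*)    AS num_sales
  FROM sales
  GROUP BY sale_date;

-- Reads hit the pre-computed result instead of re-aggregating the base table.
SELECT sale_date, total_amount
FROM daily_sales_mv
WHERE sale_date >= '2024-01-01';
```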
Amazon S3 is a storage service that offers high data availability and security. It provides a streamlined process for organizations of all sizes and industries to store their data.
Snowflake comes with a unique and powerful form of data partitioning called micro-partitioning. Data in all Snowflake tables is automatically divided into micro-partitions; no explicit partitioning has to be defined by the user.
Snowflake supports different programming languages like Go, Java, .NET, Python, C, Node.js, etc.
Following are the major advantages of using Snowpipe:
* Real-time insights
* Ease of use
* Zero Management
Snowpipe is a continuous, cost-effective service used to load data into Snowflake. Snowpipe automatically loads the data from files as soon as they are available in a stage. It simplifies data loading by loading data in micro-batches, which makes the data ready for analysis quickly.
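A pipe wraps a COPY INTO statement, as in the sketch below. The pipe, table, and stage names are hypothetical, and AUTO_INGEST assumes an external stage with cloud event notifications already configured.

```sql
-- Define a pipe that loads new files from the stage into the target table.
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO orders
  FROM @my_stage/orders/
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- Check whether the pipe is running and whether files are pending.
SELECT SYSTEM$PIPE_STATUS('orders_pipe');
```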
The following connectors and drivers are available in Snowflake:
* Snowflake Connector for Python
* Snowflake Connector for Kafka
* Snowflake Connector for Spark
* Go Snowflake Driver
* Node.js Driver
* JDBC Driver
* .NET Driver
* ODBC Driver
* PHP PDO Driver for Snowflake
Zero-copy cloning is a Snowflake feature that allows you to create a copy of your schemas, tables, and databases without copying the actual data. To perform a zero-copy clone in Snowflake, you use the CLONE keyword. With this option, you can get a current copy of production data and perform multiple actions on it.
Following are the three types of data sharing:
* Sharing Data between functional units.
* Sharing data between management units.
* Sharing data between geographically dispersed locations.
Data retention is one of the key components of Snowflake. The default data retention period for all Snowflake accounts is 1 day (24 hours).
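The retention period can be inspected and changed through the DATA_RETENTION_TIME_IN_DAYS parameter. In this sketch the object names and the values are hypothetical, and periods longer than 1 day depend on the Snowflake edition.

```sql
-- Show the account-level default retention period.
SHOW PARAMETERS LIKE 'DATA_RETENTION_TIME_IN_DAYS' IN ACCOUNT;

-- Keep seven days of history for every table in one database.
ALTER DATABASE my_db SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Disable Time Travel for a table that does not need it.
ALTER TABLE scratch_data SET DATA_RETENTION_TIME_IN_DAYS = 0;
```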
Stored procedures enable us to create modular code that includes complex business logic by combining multiple SQL statements with procedural logic. Following are the steps to execute a Snowflake procedure:
* Execute the SQL statement
* Retrieve the results of the query.
* Retrieve the result set metadata.
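The three steps above can be sketched with a JavaScript stored procedure, which is one of the languages Snowflake procedures can be written in. The procedure name and its argument are hypothetical; note that argument names are referenced in uppercase inside the JavaScript body.

```sql
CREATE OR REPLACE PROCEDURE count_rows(table_name STRING)
RETURNS STRING
LANGUAGE JAVASCRIPT
AS
$$
  // Step 1: build and execute the SQL statement.
  var stmt = snowflake.createStatement({
    sqlText: "SELECT COUNT(*) AS row_count FROM IDENTIFIER(?)",
    binds: [TABLE_NAME]
  });
  var rs = stmt.execute();

  // Step 2: retrieve the result of the query.
  rs.next();
  var rowCount = rs.getColumnValue(1);

  // Step 3: retrieve result set metadata (here, the first column's name).
  var colName = stmt.getColumnName(1);

  return colName + " = " + rowCount;
$$;

-- Call the procedure for a hypothetical table.
CALL count_rows('mytable');
```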
A clustering key is a subset of columns in a table that helps us co-locate related data inside the table. It is best suited to very large tables whose ordering was not ideal to begin with or has degraded because of extensive DML.
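Defining a clustering key and checking how well a table is clustered could look like the following sketch. The `events` table and its columns are hypothetical.

```sql
-- Define a clustering key on the columns most often used in filters.
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Inspect clustering quality for those columns (returns a JSON document).
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```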
The following syntax creates a temporary table in Snowflake:
create temporary table mytable (id number, creation_date date);
Snowflake enables the creation of temporary tables for storing data that is not needed for a long period of time, such as transitory, session-specific, or ETL data.
The Snowflake data warehouse supports creating tables as either transient or temporary. These kinds of tables are used to store data that does not need to be stored or analyzed for a long period of time.
Various ways to access the Snowflake Cloud Data Warehouse:
* Web User Interface
* JDBC Drivers
* ODBC Drivers
* Python Libraries
* SnowSQL Command-line Client
Yes, storage charges are incurred to maintain historical data during both the Time Travel and Fail-safe periods.