Tasks: How Scheduling Works Internally
Story Time: "I Don't Want to Run Queries Manually"
Meet Nina, a data engineer.
Every morning she manually runs the same ETL queries to populate reporting tables. That routine is:
- Slow
- Error-prone
- Not scalable
Snowflake solves this with Tasks, its built-in scheduling and automation engine.
What is a Snowflake Task?
A Task in Snowflake:
- Automates SQL statements or stored procedures
- Can be scheduled at fixed intervals or triggered by another Task
- Integrates with Streams for incremental pipelines
Key benefit: Snowflake handles scheduling and execution behind the scenes, so no external cron jobs are needed.
How Tasks Work
- Create a simple Task:
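-- Runs at minute 0 of every hour (UTC) on my_warehouse
CREATE OR REPLACE TASK refresh_orders
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
  -- SELECT * on a stream also returns the METADATA$ columns, so orders_delta
  -- needs matching columns (otherwise list the columns explicitly)
  INSERT INTO orders_delta
  SELECT * FROM orders_stream;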
- Runs at the top of every hour (UTC)
- Consumes new data from orders_stream
- Populates orders_delta automatically
- Start the Task (a newly created Task is suspended until you resume it):
ALTER TASK refresh_orders RESUME;
- Check Task history:
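-- The ACCOUNT_USAGE view identifies the task in the NAME column and can lag behind real time
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
WHERE NAME = 'REFRESH_ORDERS'
ORDER BY SCHEDULED_TIME DESC;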
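While developing a Task, it often helps to trigger it once by hand and look at fresh run results rather than wait for the next scheduled slot. The sketch below assumes the refresh_orders Task from above lives in the current database and schema. The 24-hour window and the limit of 20 rows are arbitrary choices for illustration.

```sql
-- Run the task once right away instead of waiting for the schedule
EXECUTE TASK refresh_orders;

-- Recent runs via the INFORMATION_SCHEMA table function
-- (scoped to the current database, with less lag than ACCOUNT_USAGE)
SELECT name, state, scheduled_time, completed_time, error_message
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
    TASK_NAME => 'REFRESH_ORDERS',
    SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
    RESULT_LIMIT => 20
))
ORDER BY scheduled_time DESC;
```

The table function suits quick checks during development, while the ACCOUNT_USAGE view is the better fit for longer-term reporting across the account.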
Task Scheduling Types
| Type | Description |
|---|---|
| Time-based (interval or cron) | Run Tasks on a fixed interval (for example every 15 minutes) or on a cron expression |
| Task Dependency (Chained Tasks) | Trigger one Task after another finishes |
| Stream-driven | Combine with Streams to process incremental data |
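To make the time-based and stream-driven rows concrete, here is a minimal sketch of both schedule forms plus a stream guard. The Task names and the columns order_id, amount, and order_date are hypothetical, since the source does not show the table definitions. The first Task is an alternative to refresh_orders, not an addition: two Tasks reading the same stream would compete for its rows.

```sql
-- Time-based, interval form: run every 15 minutes.
-- Stream-driven guard: the run is skipped when orders_stream has no new rows.
CREATE OR REPLACE TASK refresh_orders_guarded
  WAREHOUSE = my_warehouse
  SCHEDULE = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT INTO orders_delta (order_id, amount)      -- hypothetical columns
  SELECT order_id, amount
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';                -- keep newly inserted rows only

-- Time-based, cron form: 02:30 every day, evaluated in a named time zone
CREATE OR REPLACE TASK purge_old_orders
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 30 2 * * * America/Los_Angeles'
AS
  DELETE FROM orders_delta
  WHERE order_date < DATEADD('day', -90, CURRENT_DATE());   -- hypothetical column
```

Listing the columns explicitly keeps the stream's METADATA$ columns out of the target table, and the WHEN guard avoids starting the warehouse for empty runs.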
Example of Chained Tasks:
-- Task A
CREATE TASK load_raw ...
-- Task B dependent on Task A
CREATE TASK transform_data
AFTER load_raw
AS ...
A child Task starts only after its predecessor completes successfully, so the pipeline executes in order automatically.
Real-World Use Case
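For a fuller picture of the chained example, the sketch below fills in the elided parts with hypothetical tables (landing_orders, raw_orders, clean_orders) and columns. It is one possible shape, not the original author's pipeline. Only the root Task carries a schedule, and the child names its predecessor with AFTER.

```sql
-- Root task: the only task in the graph with a schedule
CREATE OR REPLACE TASK load_raw
  WAREHOUSE = my_warehouse
  SCHEDULE = '60 MINUTE'
AS
  INSERT OVERWRITE INTO raw_orders (order_id, amount)     -- hypothetical tables
  SELECT order_id, amount FROM landing_orders;

-- Child task: no schedule, runs after load_raw completes successfully
CREATE OR REPLACE TASK transform_data
  WAREHOUSE = my_warehouse
  AFTER load_raw
AS
  INSERT OVERWRITE INTO clean_orders (order_id, amount)
  SELECT order_id, amount
  FROM raw_orders
  WHERE amount IS NOT NULL;

-- Both tasks start out suspended; resume the child first, then the root
ALTER TASK transform_data RESUME;
ALTER TASK load_raw RESUME;
```

Resuming the child before the root means the first scheduled run already sees the complete graph.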
Scenario: E-commerce pipeline
- Stream tracks new orders: orders_stream
- Task 1: Load incremental orders into staging (runs every 15 minutes)
- Task 2: Transform staging and insert into orders_delta (depends on Task 1)
- Task 3: Update dashboards (depends on Task 2)
Result: A fully automated, near real-time ETL pipeline with no manual intervention.
Internal Mechanics of Tasks
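The scenario above could be sketched as a three-Task graph like the one below. The Task names, the tables orders_staging and dashboard_order_totals, and the columns are assumptions made for illustration. Only orders_stream, orders_delta, and the 15-minute cadence come from the scenario.

```sql
-- Task 1 (root): every 15 minutes, only when the stream has new rows
CREATE OR REPLACE TASK load_staging
  WAREHOUSE = my_warehouse
  SCHEDULE = '15 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('orders_stream')
AS
  INSERT OVERWRITE INTO orders_staging (order_id, amount)   -- staging holds the latest batch only
  SELECT order_id, amount
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Task 2: transform the staged batch and append it to orders_delta
CREATE OR REPLACE TASK transform_orders
  WAREHOUSE = my_warehouse
  AFTER load_staging
AS
  INSERT INTO orders_delta (order_id, amount)
  SELECT order_id, amount
  FROM orders_staging
  WHERE amount > 0;

-- Task 3: rebuild the dashboard table from orders_delta
CREATE OR REPLACE TASK update_dashboards
  WAREHOUSE = my_warehouse
  AFTER transform_orders
AS
  INSERT OVERWRITE INTO dashboard_order_totals (order_count, total_amount)
  SELECT COUNT(*), SUM(amount) FROM orders_delta;

-- Resume the root and all of its dependents in one call
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('load_staging');
```

When the root's WHEN condition is false the run is skipped, so the downstream Tasks do not reprocess an old staging batch.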
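- Tasks run on a warehouse you specify or, when no warehouse is given, on Snowflake-managed (serverless) compute
- Failed runs are not retried by default; automatic retries are configured with the TASK_AUTO_RETRY_ATTEMPTS parameter
- Execution logs are available via TASK_HISTORY
- Tasks can be suspended or resumed at any time
- Chained Tasks respect dependencies without external orchestration tools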
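Magic: Snowflake runs at most one instance of a scheduled Task at a time and manages the scheduling state internally. If a run is still in progress when the next scheduled time arrives, that run is skipped rather than started in parallel.
Best Practices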
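The failure-handling and compute settings mentioned above can be sketched as follows. The retry and failure thresholds are arbitrary example values, and refresh_orders_serverless is a hypothetical name. Suspending before altering is the conservative approach. Creating a serverless Task may also require extra privileges in your account.

```sql
-- Suspend first, then change failure handling, then resume
ALTER TASK refresh_orders SUSPEND;

-- Retry a failed run up to 2 times (the default is no retries)
ALTER TASK refresh_orders SET TASK_AUTO_RETRY_ATTEMPTS = 2;

-- Suspend the task automatically after 5 consecutive failed runs
ALTER TASK refresh_orders SET SUSPEND_TASK_AFTER_NUM_FAILURES = 5;

ALTER TASK refresh_orders RESUME;

-- Serverless variant: no WAREHOUSE clause, Snowflake manages the compute
CREATE OR REPLACE TASK refresh_orders_serverless
  USER_TASK_MANAGED_INITIAL_WAREHOUSE_SIZE = 'XSMALL'
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO orders_delta (order_id, amount)      -- hypothetical columns
  SELECT order_id, amount FROM orders_stream;
```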
- Use Tasks + Streams for incremental pipelines
- Monitor Task history and failures regularly
- Keep warehouses for Tasks small but adequate to reduce cost
- Chain Tasks carefully to avoid circular dependencies
- Resume Tasks after maintenance or deployment, since CREATE OR REPLACE TASK leaves the Task suspended
Summary
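For the monitoring and redeployment practices above, a starting point could look like the sketch below. The 7-day window is an arbitrary choice, and the queries assume your role can read the SNOWFLAKE.ACCOUNT_USAGE schema.

```sql
-- Failed runs over the last 7 days, most recent first
SELECT name, state, scheduled_time, error_code, error_message
FROM SNOWFLAKE.ACCOUNT_USAGE.TASK_HISTORY
WHERE state LIKE 'FAILED%'
  AND scheduled_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
ORDER BY scheduled_time DESC;

-- List tasks in the current schema; the "state" column shows started or suspended
SHOW TASKS;

-- After redeploying with CREATE OR REPLACE TASK, the task is suspended again
ALTER TASK refresh_orders RESUME;
```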
- Tasks automate SQL queries and pipelines in Snowflake.
- They support time-based scheduling, chained execution, and stream-driven pipelines.
- Tasks remove the need for external schedulers or cron jobs.
- They integrate seamlessly with Streams and cloning for robust, automated ETL workflows.
- Understanding Tasks allows you to build fully automated, cost-efficient, and reliable pipelines in Snowflake.
Next Topic
Materialized Views: When to Use & When to Avoid