Snowflake

🚧

Prerequisites

You need access to the ACCOUNTADMIN role to create this share.

Setting up tables

Data in Snowflake typically resides in tables. When sharing data with Toplyne, the best practice is to create separate tables for event data and profile data, each partitioned on a time column.

How does Toplyne sync work?

We prefer to ingest your data from Snowflake incrementally. To make this possible, add a monotonically increasing timestamp column to your tables. We use this column to identify the rows added since the last sync and read only that data.

For example, you can add a column called UPDATED_AT with a timestamp data type to your table. This column records when each row was added to or updated in the table.

If you don't share such a timestamp column, we'll do a daily full scan of your table and sync it entirely.
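
As a rough sketch of the setup described above, the statements below add an UPDATED_AT column to an existing table and show the kind of incremental read such a column makes possible. The table name EVENTS and the `<last_synced_at>` placeholder are illustrative assumptions, and the final query is not Toplyne's actual sync query.

```sql
-- Hypothetical table name; replace EVENTS with your own table.
-- Add a UTC timestamp column that records when each row was added or updated.
ALTER TABLE EVENTS ADD COLUMN UPDATED_AT TIMESTAMP_NTZ;

-- One-off backfill for rows that existed before the column was added.
-- SYSDATE() returns the current UTC time as TIMESTAMP_NTZ.
UPDATE EVENTS SET UPDATED_AT = SYSDATE() WHERE UPDATED_AT IS NULL;

-- Illustration only: an incremental read picks up rows newer than the last sync.
SELECT *
FROM EVENTS
WHERE UPDATED_AT > '<last_synced_at>'::TIMESTAMP_NTZ
ORDER BY UPDATED_AT;
```

Your pipeline still has to set UPDATED_AT on every subsequent insert or update. Otherwise new rows will not be picked up by an incremental read.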

Event data table

This table captures a user's actions; it helps answer the question, “What has a user done on your product, and when?”

Rows

Each row in this table represents an event triggered by a user.

Columns

  • USER_ID (Required): Key that identifies the user who performed the event
  • ACCOUNT_ID (Required for account-level analytics): Key that identifies the account the user belongs to
  • EVENT_NAME (Required): Name of the event
  • TIMESTAMP (Required): Timestamp at which the event occurred
  • UPDATED_AT (Required): UTC timestamp at which the row was added or last updated (not the event timestamp)
  • EVENT_PROPERTIES (Optional): Event properties, typically shared as JSON with key/value pairs

Partition

Partition this table on the TIMESTAMP column.
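
One possible way to lay out such a table is sketched below. The table name EVENTS and all column types are assumptions, not requirements stated in this guide. Snowflake splits tables into micro-partitions automatically and has no explicit partition clause, so the sketch uses a clustering key on the event date as the closest equivalent; treat that mapping as an assumption too.

```sql
-- Illustrative DDL: table name and column types are assumptions.
CREATE TABLE IF NOT EXISTS EVENTS (
    USER_ID          VARCHAR       NOT NULL,  -- user who performed the event
    ACCOUNT_ID       VARCHAR,                 -- needed only for account-level analytics
    EVENT_NAME       VARCHAR       NOT NULL,  -- name of the event
    "TIMESTAMP"      TIMESTAMP_NTZ NOT NULL,  -- when the event occurred
    UPDATED_AT       TIMESTAMP_NTZ NOT NULL,  -- UTC time the row was added or updated
    EVENT_PROPERTIES VARIANT                  -- JSON key/value pairs
)
-- Cluster on the event date so rows for the same day are stored together.
CLUSTER BY (TO_DATE("TIMESTAMP"));
```

The column name TIMESTAMP is double-quoted here only to keep it visually distinct from the data type of the same name.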

Sample Event Table

| USER_ID | ACCOUNT_ID | EVENT_NAME | TIMESTAMP | UPDATED_AT | EVENT_PROPERTIES |
| --- | --- | --- | --- | --- | --- |
| 1298 | 23 | Payment_Initiated | 2022-10-26 07:08:34 | 2023-01-09 17:08:34 | {"paymentid": 123, "amount": 200} |
| 1567 | 77 | Trial_Ended | 2022-11-01 20:01:14 | 2023-01-09 17:08:34 | {"accountid": 77} |

Profile data table

This table captures the profile information of a user or an account; it helps answer the question, “Who is this user or account?”

Rows

Each row in this table is a unique entity.

Columns

  • USER_ID/ACCOUNT_ID (Required): Key that identifies the entity whose properties are listed in the row.
  • USER_CREATION_TIME/ACCOUNT_CREATION_TIME (Optional): Identifies since when the user or account has been active.
  • UPDATED_AT: UTC timestamp at which the row was added or last updated (not the event timestamp)
  • USER_PROPERTIES/ACCOUNT_PROPERTIES: Each subsequent column is a profile key, and each cell holds its value for that entity. For example, you may include the email address, geographical location, account name, current billing plan, etc.

Partition

Partition this table on the UPDATED_AT column.
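
A minimal sketch of a user profile table and an upsert that keeps UPDATED_AT current is shown below. The names USERS and USERS_STAGING, the column types, and the choice of profile columns (taken from the sample table in this guide) are all assumptions. As with any Snowflake table, "partitioning" is approximated here with a clustering key.

```sql
-- Illustrative DDL: table name and column types are assumptions.
CREATE TABLE IF NOT EXISTS USERS (
    USER_ID            VARCHAR       NOT NULL,  -- one row per user
    USER_CREATION_TIME TIMESTAMP_NTZ,           -- optional
    UPDATED_AT         TIMESTAMP_NTZ NOT NULL,  -- UTC time the row was added or updated
    EMAIL              VARCHAR,
    COUNTRY            VARCHAR,
    CURRENT_PLAN       VARCHAR
)
CLUSTER BY (TO_DATE(UPDATED_AT));

-- Upsert from a hypothetical staging table, refreshing UPDATED_AT on every change.
MERGE INTO USERS AS t
USING USERS_STAGING AS s
    ON t.USER_ID = s.USER_ID
WHEN MATCHED THEN UPDATE SET
    t.EMAIL        = s.EMAIL,
    t.COUNTRY      = s.COUNTRY,
    t.CURRENT_PLAN = s.CURRENT_PLAN,
    t.UPDATED_AT   = SYSDATE()
WHEN NOT MATCHED THEN INSERT
    (USER_ID, USER_CREATION_TIME, UPDATED_AT, EMAIL, COUNTRY, CURRENT_PLAN)
    VALUES (s.USER_ID, s.USER_CREATION_TIME, SYSDATE(), s.EMAIL, s.COUNTRY, s.CURRENT_PLAN);
```

Updating rows in place with MERGE, rather than dropping and recreating the table, also avoids the recreated-table case that requires re-granting the share.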

Sample User Table

| USER_ID | USER_CREATION_TIME | UPDATED_AT | Email | Country | Current Plan |
| --- | --- | --- | --- | --- | --- |
| 1296 | 2022-10-26 07:08:34 | 2022-11-26 07:08:34 | [email protected] | UK | Annual Plan |
| 2676 | 2022-11-01 20:01:14 | 2022-12-01 20:01:14 | _[email protected] | UK | Monthly Plan |

Sample Account Table

| ACCOUNT_ID | ACCOUNT_CREATION_TIME | UPDATED_AT | AccountName | Country | Current Plan |
| --- | --- | --- | --- | --- | --- |
| 23 | 2022-10-26 07:08:34 | 2022-11-26 07:08:34 | Queen | UK | Annual Plan |
| 56 | 2022-11-01 20:01:14 | 2022-11-26 07:08:34 | Beetles | UK | Monthly Plan |

📘

Important information regarding Snowflake Share

  1. Snowflake sharing works seamlessly as long as the shared tables and schemas are not recreated after the share has been created.
  2. We are agnostic to changes in the shape of your data.
  3. To handle data backfills reliably, we rely on Snowflake's change data capture (CDC) through STREAMs.
  4. For this to work, enable change tracking on every table that you share with us.
    For reference: ALTER TABLE <table> SET CHANGE_TRACKING = TRUE;
  5. As far as updates to data are concerned, we classify data into two categories:
    1. Events data: e.g., events coming from clickstream. We don't handle updates in these datasets, so please notify us if you make any significant updates to such data.
    2. Profiles data: e.g., data about users, accounts, teams, workspaces, etc. This data includes some personal information, such as device info, country info, and metadata about an entity. You can keep updating such data in your source, and we handle updates to these data points seamlessly.
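
As a short illustration of the change-tracking requirement above, the statements below enable it on two tables and then check the result. The angle-bracket names are placeholders for your own database, schema, and tables.

```sql
-- Placeholders: substitute your own database, schema, and table names.
ALTER TABLE <database_name>.<schema_name>.<events_table> SET CHANGE_TRACKING = TRUE;
ALTER TABLE <database_name>.<schema_name>.<users_table>  SET CHANGE_TRACKING = TRUE;

-- The change_tracking column in the output should read ON for each shared table.
SHOW TABLES IN SCHEMA <database_name>.<schema_name>;
```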

Step-by-step guide

  1. Log in to your Snowflake account on the Classic Console. If you are on the newer Snowsight, navigate to the Classic Console by clicking the Classic Console option.
  2. In the upper right corner, click on your email account and switch the role to ACCOUNTADMIN. Reach out to your admin if you can't see the option.
  3. On the upper left side, click on the Shares icon.
  4. Select the Outbound tab.
  5. Click on the Create icon next to the Outbound tab button.
  6. Fill in the Secure Share Name field (a text field, so any name should work) and select the database you want to share.
  7. Click on Select Tables & Secure Views.
  8. Select the tables you want to share and click Apply.
  9. Click on Create at the bottom right.
  10. Click on Next: Add Consumers at the bottom right to add the account you want to share the data with.
  11. Under Account Type, select Full (don't worry, this grants read-only access) and enter the appropriate account locator for your region from the table below. If you cannot see your region here, please reach out to your Toplyne POC or to [email protected].

     | Region | Snowflake Account Locator |
     | --- | --- |
     | us-east-1 | jfa01628 |
     | us-east-2 | yj35120 |
     | us-west-2 | xta40177 |
     | eu-west-3 | ux59952 |
     | eu-west-1 | ob64984 |

  12. Click on Add at the bottom right, and the share is ready.
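
If you prefer SQL to the console, the steps above correspond roughly to the statements below. This is a sketch rather than an officially documented Toplyne procedure: all angle-bracket names are placeholders, and `<toplyne_account_locator>` stands for the locator listed for your region.

```sql
-- Shares can be created with the ACCOUNTADMIN role.
USE ROLE ACCOUNTADMIN;

-- Create the outbound share (any name works).
CREATE SHARE <share_name>;

-- A share needs USAGE on the database and schema before tables can be added.
GRANT USAGE ON DATABASE <database_name> TO SHARE <share_name>;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO SHARE <share_name>;

-- Grant read access on each table you want to share.
GRANT SELECT ON TABLE <database_name>.<schema_name>.<table_name> TO SHARE <share_name>;

-- Add the consumer account for your region.
ALTER SHARE <share_name> ADD ACCOUNTS = <toplyne_account_locator>;

-- Review what the share exposes.
SHOW GRANTS TO SHARE <share_name>;
```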

❗️

Sharing tables that are recreated every time

If your tables are dropped and recreated on every run, the share loses access to them. There are two ways to retain the share with Toplyne after a table is recreated.

  1. Add a query to the DAGs that generate the tables so that access is granted again each time.

    -- To fetch the name of the share that you have created with Toplyne:
    SHOW SHARES;
    
    -- Query to be included during table generation to retain the shared access:
    GRANT SELECT ON TABLE <table_name> TO SHARE <share_name>;
    
  2. Alternatively, you can create a Snowflake task to automate this.

    -- Create the task, or replace it if it already exists
    CREATE OR REPLACE TASK <your_task_name>
      WAREHOUSE = <warehouse_name> -- Replace with the name of your warehouse
      SCHEDULE = '360 MINUTE' -- Run every 6 hours
      COMMENT = 'Task to grant SELECT permission on table to share with Toplyne every 6 hours'
    AS
    -- SQL statement to grant SELECT permission on the table shared with Toplyne
    GRANT SELECT ON TABLE <table_name> TO SHARE <share_name>;
    
    -- Tasks are created in a suspended state; resume the task to start the schedule
    ALTER TASK <your_task_name> RESUME;