UK to US spelling corrections (#992)
jpiekos authored Aug 28, 2024
1 parent ec10271 commit e19d71f
Showing 1 changed file with 2 additions and 2 deletions.
```diff
@@ -3,10 +3,10 @@ title: "High Volume Optimizations"
 sidebar_position: 20
 ---
 
-For users with very high data volumes (>100M daily events) you may find that, even with our [optimised upserts](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/optimized-upserts/index.md) and [incremental sessionisation](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md), the package is still slow in any given run or the processing cost is high. There are a few specific things you can do to help optimize the package even further, which require various levels of effort on your part. In general we have taken the decision to not do these things as part of the "normal" deployment of our packages as there is a trade-off for each one and in the vast majority of use cases they aren't required.
+For users with very high data volumes (>100M daily events) you may find that, even with our [optimized upserts](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/optimized-upserts/index.md) and [incremental sessionization](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md), the package is still slow in any given run or the processing cost is high. There are a few specific things you can do to help optimize the package even further, which require various levels of effort on your part. In general we have taken the decision to not do these things as part of the "normal" deployment of our packages as there is a trade-off for each one and in the vast majority of use cases they aren't required.
 
 ## Tune the incremental logic parameters
-Our [incremental sessionisation](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md) by default will look back and reprocess 6 hours of data; if you are confident there is no delay in data loading, or if you are using the `load_tstamp` as the `snowplow__session_timestamp` then there is much less need to have such a big lookback window, decreasing this to 1 hour will greatly reduce volumes of data for regular (~hourly) package runs.
+Our [incremental sessionization](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md) by default will look back and reprocess 6 hours of data; if you are confident there is no delay in data loading, or if you are using the `load_tstamp` as the `snowplow__session_timestamp` then there is much less need to have such a big lookback window, decreasing this to 1 hour will greatly reduce volumes of data for regular (~hourly) package runs.
 
 Decreasing the backfill limit days will only impact on backfill runs so once models are up to date this will have little impact. Decreasing the upsert lookback days, or the session lookback days can have benefits but come at the risk of duplicates making it into the manifest or derived tables so do this at your own risk.
 
```
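The incremental parameters discussed in the changed section are ordinarily set as variables in the project's `dbt_project.yml`. A minimal sketch, assuming the Snowplow dbt package variable naming conventions (`snowplow__lookback_window_hours`, `snowplow__backfill_limit_days`, etc.) and using `snowplow_unified` as a placeholder package scope — confirm the exact variable names, defaults, and package name (e.g. `snowplow_web`) against the docs for your package version:

```yaml
# dbt_project.yml — variable names and package scope are assumptions based on
# Snowplow dbt package conventions; verify against your package version's docs.
vars:
  snowplow_unified:  # substitute your package's name, e.g. snowplow_web
    # Default lookback is 6 hours; 1 hour suits regular (~hourly) runs when
    # loading is reliable or load_tstamp is the session timestamp.
    snowplow__lookback_window_hours: 1
    # Only affects backfill runs, so has little impact once models are up to date.
    snowplow__backfill_limit_days: 10
    # Lowering these can reduce data scanned, but risks duplicates reaching the
    # manifest or derived tables — reduce at your own risk.
    snowplow__upsert_lookback_days: 10
    snowplow__session_lookback_days: 365
```

The example values are illustrative only; the right trade-off between cost and duplicate risk depends on your loading latency and run cadence.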
