From e19d71fe5ee5ff5aa80f53b28a7bf0443270d65b Mon Sep 17 00:00:00 2001
From: John Piekos
Date: Wed, 28 Aug 2024 07:09:57 -0400
Subject: [PATCH] UK to US spelling corrections (#992)

---
 .../dbt-custom-models/high-volume-optimizations/index.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-custom-models/high-volume-optimizations/index.md b/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-custom-models/high-volume-optimizations/index.md
index 015b1e2035..6aa0798804 100644
--- a/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-custom-models/high-volume-optimizations/index.md
+++ b/docs/modeling-your-data/modeling-your-data-with-dbt/dbt-custom-models/high-volume-optimizations/index.md
@@ -3,10 +3,10 @@ title: "High Volume Optimizations"
 sidebar_position: 20
 ---
 
-For users with very high data volumes (>100M daily events) you may find that, even with our [optimised upserts](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/optimized-upserts/index.md) and [incremental sessionisation](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md), the package is still slow in any given run or the processing cost is high. There are a few specific things you can do to help optimize the package even further, which require various levels of effort on your part. In general we have taken the decision to not do these things as part of the "normal" deployment of our packages as there is a trade-off for each one and in the vast majority of use cases they aren't required.
+For users with very high data volumes (>100M daily events) you may find that, even with our [optimized upserts](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/optimized-upserts/index.md) and [incremental sessionization](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md), the package is still slow in any given run or the processing cost is high. There are a few specific things you can do to help optimize the package even further, which require various levels of effort on your part. In general we have taken the decision to not do these things as part of the "normal" deployment of our packages as there is a trade-off for each one and in the vast majority of use cases they aren't required.
 
 ## Tune the incremental logic parameters
 
-Our [incremental sessionisation](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md) by default will look back and reprocess 6 hours of data; if you are confident there is no delay in data loading, or if you are using the `load_tstamp` as the `snowplow__session_timestamp` then there is much less need to have such a big lookback window, decreasing this to 1 hour will greatly reduce volumes of data for regular (~hourly) package runs.
+Our [incremental sessionization](/docs/modeling-your-data/modeling-your-data-with-dbt/package-mechanics/incremental-processing/index.md) by default will look back and reprocess 6 hours of data; if you are confident there is no delay in data loading, or if you are using the `load_tstamp` as the `snowplow__session_timestamp` then there is much less need to have such a big lookback window, decreasing this to 1 hour will greatly reduce volumes of data for regular (~hourly) package runs.
 Decreasing the backfill limit days will only impact on backfill runs so once models are up to date this will have little impact.
 Decreasing the upsert lookback days, or the session lookback days can have benefits but come at the risk of duplicates making it into the manifest or derived tables so do this at your own risk.
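
For reference, a minimal sketch of how the lookback and backfill parameters discussed in the patched section might be tuned in `dbt_project.yml`. The package scope (`snowplow_unified`) and the exact variable names and values shown here are assumptions, not part of this patch; confirm them against the configuration reference for the Snowplow dbt package and version you run.

```yml
# dbt_project.yml — a sketch only; variable names and defaults are assumptions,
# check your Snowplow dbt package's configuration docs before applying.
vars:
  snowplow_unified:
    # Reprocess 1 hour instead of the default 6 — only safe if data loading has
    # no delay, or if load_tstamp is used as snowplow__session_timestamp.
    snowplow__lookback_window_hours: 1
    # A smaller backfill chunk only affects backfill runs; once models are up to
    # date this has little impact.
    snowplow__backfill_limit_days: 7
    # Reducing these cuts the data scanned per run, but risks duplicates making
    # it into the manifest or derived tables.
    snowplow__upsert_lookback_days: 7
    snowplow__session_lookback_days: 365
```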