
Throttle error sends at the SDK\Server level to avoid error bursts. #2276

Open
twiggy opened this issue Dec 12, 2024 · 1 comment
Labels
awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks.

Comments

twiggy commented Dec 12, 2024

Description

We are occasionally throttled by our per-day quota. Generally we hit this quota because a single app has an intermittent or configuration error. A common occurrence is being unable to connect to the database. Each request, retry, etc. leads to a bug being sent to BugSnag before we can react.

Describe the solution you'd like
We can generally work around this in our code, but it would be nice if the BugSnag client(s) could keep track of the number of errors it has sent and lock itself down at a threshold. In addition, if it could communicate this to BugSnag's servers, they could alert us and track it rather than us just hitting the billing quota. A bonus might be some sort of timing where the threshold resets every 10 minutes if the issue goes away. Another bonus might be logging the error to disk instead of sending it to your systems, for compliance etc.
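
Roughly something like this sketch, using the Node notifier's `onError` callback as the hook (the threshold, window, and log path below are made-up values, not an existing BugSnag feature):

```js
// Sketch only: lock the client down at a threshold, resetting every 10 minutes,
// and log overflow errors to disk instead of sending them.
import Bugsnag from '@bugsnag/js'
import fs from 'fs'

const WINDOW_MS = 10 * 60 * 1000    // reset the counter every 10 minutes
const MAX_PER_WINDOW = 50           // stop sending after this many events (made-up value)
let windowStart = Date.now()
let sentInWindow = 0

Bugsnag.start({
  apiKey: process.env.BUGSNAG_API_KEY,
  onError: (event) => {
    const now = Date.now()
    if (now - windowStart > WINDOW_MS) {
      // window elapsed and the issue may have gone away: reset the counter
      windowStart = now
      sentInWindow = 0
    }
    if (sentInWindow >= MAX_PER_WINDOW) {
      // over the threshold: keep a local record instead of sending to BugSnag
      fs.appendFileSync('/var/log/app/bugsnag-overflow.log',
        JSON.stringify({ at: now, error: event.errors[0] }) + '\n')
      return false // discard the event
    }
    sentInWindow += 1
  }
})
```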

Describe alternatives you've considered
There is a callback, at least in the Node.js SDK, and we could keep a global counter in Redis/a text file/memory, but then we'd also have to deal with the rest of the request, like logging the errors, keeping track of occurrences, etc. Our bug tracking vendor should do that :). There is an option to do this at the request level for the web browser, but we do not see a way to do this on our servers.
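
For example, something along these lines is the kind of bookkeeping we'd rather not own ourselves (a sketch only, using node-redis and assuming `onError` may return a promise; the key name, limit, and window are arbitrary):

```js
// Sketch of the alternative we've considered: a counter shared across processes via Redis.
import Bugsnag from '@bugsnag/js'
import { createClient } from 'redis'

const redis = createClient()
await redis.connect() // top-level await: ESM / modern Node

const LIMIT = 200          // max events per window across all processes (arbitrary)
const WINDOW_SECONDS = 600 // 10-minute window

Bugsnag.start({
  apiKey: process.env.BUGSNAG_API_KEY,
  onError: async (event) => {
    const count = await redis.incr('bugsnag:event-count')
    if (count === 1) {
      await redis.expire('bugsnag:event-count', WINDOW_SECONDS) // first event starts the window
    }
    return count <= LIMIT // false discards the event once the shared limit is exceeded
  }
})
```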

Additional context
360 days out of the year we are not even at 50% of our quota, so the days we hit it can seem a bit overly punitive (we don't get credit for the light days/months) since the client doesn't handle bursts of the same error over and over. Generally it's some intermittent AWS error, etc. We also have several applications, and the quota being at the account level isn't as flexible as we need.

@hannah-smartbear

Hi @twiggy,

To give a bit of extra context on how BugSnag rate-limiting works: your quota accrues constantly throughout the day. As events are received they reduce an 'event balance'. If that event balance reaches zero, rate limiting begins, i.e. BugSnag stops saving new events.

As the event balance is constantly accruing, even after rate limiting begins some events will still be stored whenever the balance goes positive. If the error spike subsides and the rate returns to normal, it's quite possible that all events will then be saved for the rest of the day. This means that if you have a large influx of events in the morning, BugSnag will begin rate limiting at that point so that you always have some coverage spread throughout the day.

While it sounds like you have a Node.js app, for browser JS apps you can configure the maximum number of events that can be sent per page. To help us raise the correct feedback with our product team, does something to this effect for Node sound like the sort of thing you are looking for?
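
For reference, in the browser notifier that per-page cap is set with the `maxEvents` configuration option, something like the following (the value here is just an example; see the JS configuration docs for the exact behaviour):

```js
// Browser-only sketch: cap the number of events sent from a single page load.
import Bugsnag from '@bugsnag/js'

Bugsnag.start({
  apiKey: 'YOUR-API-KEY',
  maxEvents: 10 // stop notifying after 10 events on this page load
})
```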

We can generally work around this in our code, but it would be nice if the BugSnag client(s) could keep track of the number of errors it has sent and lock itself down at a threshold

Please could you let us know a little more about how you are currently working around this in your code?

360 days out of the year we are not even at 50% of our quota, so the days we hit it can seem a bit overly punitive (we don't get credit for the light days/months) since the client doesn't handle bursts of the same error over and over.

BugSnag doesn't differentiate between events from different errors when rate limiting, or throttle repeated events, because the full volume can be a very important indicator for some developers of how widespread an issue is. However, you could implement an onError callback to throttle repeated events yourself. For example, you could store the last event in memory and check subsequent events against it, returning false when they're the same to stop the new event from being reported.
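
A minimal sketch of such a callback might look like this (matching on errorClass and errorMessage is just one way to decide that two events are "the same"):

```js
// Sketch: drop an event if it matches the previously reported one.
import Bugsnag from '@bugsnag/js'

let lastKey = null

Bugsnag.start({
  apiKey: process.env.BUGSNAG_API_KEY,
  onError: (event) => {
    const { errorClass, errorMessage } = event.errors[0]
    const key = `${errorClass}:${errorMessage}`
    if (key === lastKey) {
      return false // same as the previous event: don't report it again
    }
    lastKey = key
  }
})
```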

Another thing to note here is that because BugSnag’s rate limiting accrues on a rolling basis, in theory you can use up to twice your quota in a single day. In an extreme case, if no events are sent for 24 hours, the quota accrues to its maximum; in the following 24 hours the organisation can use up this fully accrued quota as well as the quota that accrues over that 24-hour period. While this is indeed an extreme case, the rolling quota may give you a bit of extra flexibility at times if you are not using much of your event quota on most days.

Additionally, instead of rate limiting, you may also want to consider the option to ‘capture every event’, which allows for one-off occurrences of exceeding your daily quota. You can find more information about this option here. However, please note that if you exceed your daily limit more than 3 times in a 30-day rolling window and you have selected to ‘capture every event’, we’ll automatically upgrade you to the next plan tier.

We also have several applications and the quota being at the account level isn't as flexible as we need.

We offer project-level rate-limiting for our Enterprise customers. To enquire about Enterprise pricing please provide contact details via the following link and our sales team would be happy to discuss your requirements: https://www.bugsnag.com/pricing-request

We also have an item in our backlog to add more powerful configuration options for event rate-limiting, for example “After any new event reaches X (e.g. 500) instances, sample by Y%”. Does this sound like something you’d be interested in? If so, let us know, and we can raise your interest in this with our product team.

@hannah-smartbear hannah-smartbear added the awaiting feedback Awaiting a response from a customer. Will be automatically closed after approximately 2 weeks. label Dec 17, 2024