[Bug]: When consumer fails to connect to SQS due to connectivity instant retry occurs #490
Comments
Hey, yeah we can't react to the code One possible way of detecting might be to look for I don't think we can ultimately support every possible use case with this library, and we're not really looking to; its main aim is to be simple boilerplate that we used internally and shared externally.
Yeah, 100% agree, and I appreciate your time on this so far, thanks. With high traffic that could cause a big delay if the odd message fails for a genuine reason. I totally understand not being able to support every use case, although surely a malformed URL or no connection are two very common cases, especially when deploying to the cloud, where a malformed security group or bad env variable could cause high CPU/memory usage and lots of potentially expensive logging without people realising straight away, especially since the loop can run every 1-2ms, which is a lot of events. In the first draft PR I raised, I added another configurable option so that a timeout happens on any error, which, whilst not perfect, I'd say is a better alternative than risking the issues mentioned above. I'd be keen to know your opinion on adding something like that back in; I'm happy to do the work but want to check it's the way you'd like to go first.
I also get the same error when launching an ECS task on EC2. It works well with AWS Fargate.
In my case, AWS Fargate can infer the Region, but on EC2 the Region needs to be specified explicitly.
Describe the bug
Apologies for raising another issue, though in this case the key error case hasn't been resolved despite PR #485, which adds additional support for SQS error types. Currently, if the consumer cannot access the queue due to internet connectivity or an invalid URL being passed in, it will throw an error that looks like the one below, which won't be caught by isConnectionError and will cause an infinite loop. This is why in the previous PR I added code SQSError to the potential PR to resolve it, or you could use code Error in the current version, as it's such a generic message output from the AWS SDK.
If the consumer cannot connect to SQS because there is no internet connection or, in the unlikely event, SQS is down, and pollingWaitTimeMs is 0 (the default), it will instantly retry the connection, causing a continual flurry of requests. There is currently no way to resolve this other than setting pollingWaitTimeMs to a higher value; however, this isn't ideal either, since it also adds a longer wait between normal polls. Currently authentication errors are caught and can have a back-off configured with authenticationErrorTimeout, though no equivalent exists for general connection errors.
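To make the requested behaviour concrete, here is a minimal, stand-alone sketch of a poll loop that waits after any failed receive while leaving healthy polls at their normal pace. It is not sqs-consumer's implementation: the function pollWithErrorBackoff, the option name errorTimeoutMs, and the maxPolls and onError parameters are all hypothetical and exist only for illustration. Only pollingWaitTimeMs mirrors an option named in this issue.

```javascript
// Hypothetical sketch only: this is NOT sqs-consumer's internal poll loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

/**
 * Calls `receive` up to `maxPolls` times and returns the wait applied after each poll.
 * - pollingWaitTimeMs: wait after a successful poll (0 means poll again immediately).
 * - errorTimeoutMs:    assumed option name; wait applied after ANY failed receive,
 *                      regardless of the error's type or name.
 */
async function pollWithErrorBackoff({
  receive,
  pollingWaitTimeMs = 0,
  errorTimeoutMs = 10000,
  maxPolls = Infinity,
  onError = () => {},
}) {
  const waits = [];
  for (let i = 0; i < maxPolls; i++) {
    let waitMs = pollingWaitTimeMs;
    try {
      await receive();
    } catch (err) {
      // Generic Error, TimeoutError, etc. all end up here, so none can
      // trigger an instant retry.
      onError(err);
      waitMs = errorTimeoutMs;
    }
    waits.push(waitMs);
    await sleep(waitMs);
  }
  return waits;
}
```

With pollingWaitTimeMs left at 0, only failed receives are delayed, so a bad URL or lost connection produces one attempt per errorTimeoutMs instead of one every 1-2ms.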
Your minimal, reproducible example
https://stackblitz.com/edit/github-pwpmkg?file=index.js
Steps to reproduce
Method 1 (Error)
node index.js
Method 2 (TimeoutError)
node index.js
Expected behavior
As a user I expected these retries to be throttled after a failure but instead I see requests continuously being made and error messages being sent to the console at an excessive rate.
How often does this bug happen?
Every time
Screenshots or Videos
Platform
Occurs on all platforms; tested and reproduced on Node versions 16 to 20.
Package version
10.1.0
AWS SDK version
3.564.0