10-20 times lower throughput of using SNScrape on a container in comparison to non containerized environment #946
Comments
Your profiles show that the big difference comes from importing modules, e.g.
Thanks @JustAnotherArchivist. Below is a copy of the Dockerfile RUN section for your info.
RUN set -x
You are right that I am not using the precompiled version, which is why it gets faster on subsequent runs. However, 3-5s is what I see when running the command after some time; at the start, it is usually about 15-20s. Moreover, what about when it's run as a module in a service? I'm facing the same extra latency when it's built as a service via Flask. I will try it in precompiled mode to see how much it improves, but I wouldn't expect a significant improvement given that I'm facing almost the same latency in service mode.
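For readers unsure what "precompiled" means here: Python caches bytecode in `__pycache__` directories, and a fresh image without those caches has to compile every imported module on first run. A minimal sketch of precompiling a directory with the standard-library `compileall` module follows; the helper name `precompile` is made up for illustration, and the directory to compile (for example the image's site-packages) is an assumption about the setup, not something stated in this thread.

```python
import compileall


def precompile(directory: str) -> bool:
    """Byte-compile every .py file under `directory`; return True on success."""
    # quiet=1 prints only errors; force=True recompiles even if the
    # existing .pyc files already look up to date.
    return bool(compileall.compile_dir(directory, quiet=1, force=True))
```

The same effect is usually achieved at image build time with `python -m compileall <directory>`. This only removes the compilation cost on first import; it would not explain a slowdown that persists in a long-running service.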
The alternative is that the issue lies in the containerisation itself. At least I can't think of anything in snscrape that could cause something like this, and I know that past versions of seccomp have had issues in this area. Those should largely have been resolved over the years though, so assuming you're using reasonably recent versions of libseccomp and Docker, that shouldn't be the problem.
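To see whether a seccomp filter is active for a process at all, the `Seccomp:` field in `/proc/self/status` on Linux can be read from inside the container. The following is a rough sketch; the function names are invented for illustration, and it only reports the mode, not whether seccomp is actually responsible for any slowdown.

```python
from pathlib import Path
from typing import Optional

# Values of the "Seccomp:" field as documented for /proc/[pid]/status.
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_seccomp_mode(status_text: str) -> Optional[int]:
    """Extract the numeric seccomp mode from the text of a /proc status file."""
    for line in status_text.splitlines():
        if line.startswith("Seccomp:"):
            return int(line.split(":", 1)[1].strip())
    return None  # field missing (old kernel or non-Linux system)


def current_seccomp_mode() -> Optional[int]:
    """Return the seccomp mode of this process, or None if it cannot be read."""
    try:
        return parse_seccomp_mode(Path("/proc/self/status").read_text())
    except OSError:
        return None
```

Running `SECCOMP_MODES.get(current_seccomp_mode())` inside and outside the container gives a quick comparison of the two environments.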
I just remembered one other relevant difference: Alpine Linux uses musl, which is known to have lower performance than glibc for some things. You might want to play with images employing glibc for comparison, e.g. Debian.
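A quick way to check which C library a given image's Python is linked against is `platform.libc_ver()` from the standard library. The sketch below wraps it in a hypothetical helper; note that treating an empty result as "possibly musl" is an assumption, since the function is primarily designed to detect glibc and may simply fail to identify other libcs.

```python
import platform


def describe_libc() -> str:
    """Return a short description of the C library Python reports."""
    name, version = platform.libc_ver()
    if name:
        return f"{name} {version}".strip()
    # Assumption: an empty name often (not always) indicates a non-glibc
    # libc such as musl, which libc_ver() may not recognise.
    return "unknown (possibly musl)"
```

Printing `describe_libc()` in each candidate base image makes it easy to confirm which ones actually use glibc before comparing timings.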
@JustAnotherArchivist Nice! I'll check a Debian-based image and will get back to you if that helps.
@JustAnotherArchivist Just wanted to let you know that we have tested various images, and it looks like using non-Alpine-based images can significantly improve the latency. The best image so far according to our tests has been python:3.8.10-slim-buster (~5x faster). It's still slower than running it locally, but it's now acceptable. Thanks for your help.
Good to hear there's an improvement. I don't know what the remaining difference could be apart from seccomp. I suppose you could try to disable that with |
Describe the bug
I am not sure what exactly could be wrong here, but I thought I'd share my experience in case anybody else faces the same issue.
Using SNScrape locally in a non-containerized environment takes about 0.2-0.3s on average to fetch tweets, which is pretty good for our use case. However, when the same code runs in a container, the latency is significantly higher (e.g. 3-5s). We thought that CPU throttling might be causing issues, or that it was some sort of container warm-up issue since we had been using SNScrape via the command line, so we tested it as a service, and the same issue persists. We have also tested it on Minikube and Kubernetes, with the same result. Below is the outcome of the profiling we did to understand what could be wrong.
Profiling on a container:
https://justpaste.it/alqiq
Profiling on a non-containerized env:
https://justpaste.it/3frkx
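The linked profiles are not reproduced here. As a complementary check, a per-module import breakdown can be collected with CPython's `-X importtime` option (available since Python 3.7). The sketch below runs a fresh interpreter and parses its stderr; the helper name `import_times` is illustrative, and passing `"snscrape"` assumes the package is installed in the environment being measured.

```python
import subprocess
import sys
from typing import Dict


def import_times(module: str) -> Dict[str, int]:
    """Map each imported module name to its cumulative import time in microseconds."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    times: Dict[str, int] = {}
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        # Each data line has the form: "import time: <self> | <cumulative> | <name>"
        parts = [p.strip() for p in line[len("import time:"):].split("|")]
        if len(parts) == 3 and parts[1].isdigit():  # skips the header line
            times[parts[2]] = int(parts[1])
    return times
```

Sorting the result of `import_times("snscrape")` by value, once in the container and once on the host, shows which imports account for the gap.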
Interestingly even running
snscrape --version
in a container is super slow! My guess is that SNScrape uses some libraries that are much faster in a non-containerized environment than in a container. We haven't tested different base Docker images yet.
How to reproduce
Run it in a container. SSH into the container and try
snscrape --version
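To put numbers on the reproduction step, the command's wall-clock time can be measured over several runs in both environments. This is a minimal sketch using only the standard library; `time_command` is a made-up helper name, and the `snscrape --version` invocation in the usage note assumes the CLI is on `PATH`.

```python
import subprocess
import time
from typing import List, Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> List[float]:
    """Run `cmd` `runs` times and return the wall-clock duration of each run in seconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded; a non-zero exit status raises CalledProcessError.
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        durations.append(time.perf_counter() - start)
    return durations
```

For example, `time_command(["snscrape", "--version"])` returns one duration per run. The first run typically includes cold-cache effects such as bytecode compilation, so it is worth looking at it separately from the later runs.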
Expected behaviour
Relatively similar throughput when running SNScrape in a container vs. a non-containerized environment.
Screenshots and recordings
No response
Operating system
Alpine Linux v3.12
Python version: output of
python3 --version
3.8.10
snscrape version: output of
snscrape --version
0.6.2.20230320
Scraper
twitter-user
How are you using snscrape?
CLI (snscrape ... as a command, e.g. in a terminal)
Backtrace
No response
Log output
No response
Dump of locals
No response
Additional context
No response