Unable to use proxy with snscrape #825
That's not relevant. Does the search work? That is, do you get results on https://twitter.com/search?q=ukraine&f=live? The search has different restrictions than the site in general.
Not a possible output of
Not the current snscrape version.
This is not a complete debug log. It's just the exception traceback formatted in a strange way. See the README for instructions on obtaining a complete log.
Update: snscrape updated to the latest version, Python version output now in the correct format, and debug log updated according to the README instructions.
That log was not generated with the latest snscrape version. But I just found a proxy where the search works with curl but not with the latest dev snscrape, so yeah, there is indeed some kind of issue here.
I just ran the log again, using snscrape 0.6.2.20230320. with |
Do you have control over the proxy server? If so, what's the proxy software, and can you run snscrape from there directly for a test?
I don't have control of the proxy server and it doesn't have many options; I pretty much just have the address, port, username, and password. I ran tests through the proxy, configured the same way as for snscrape, against https://google.com, https://api.ipify.org, and https://httpbin.org/ip, and they all work.
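A proxy check of the kind described above could be sketched as follows. This is only an illustration using the standard library's urllib rather than requests, and the helper names `build_proxies` and `exit_ip` are hypothetical, not part of snscrape. It assumes a plain HTTP proxy that accepts basic credentials in the URL.

```python
import json
import urllib.request
from urllib.parse import quote


def build_proxies(host, port, username=None, password=None):
    """Return a proxies mapping in the same shape the thread passes to requests."""
    auth = ""
    if username is not None:
        # Percent-encode credentials so characters like '@' or ':' do not break the URL.
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    url = f"http://{auth}{host}:{port}"
    return {"http": url, "https": url}


def exit_ip(proxies, url="https://httpbin.org/ip", timeout=10):
    """Fetch `url` through the proxy and return the IP address the server saw."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(url, timeout=timeout) as resp:
        return json.load(resp)["origin"]


# Example (needs network access and a reachable proxy; the address is a placeholder):
# print(exit_ip(build_proxies("192.168.1.5", 20102)))
```

If the printed address is the proxy's IP rather than your own, requests are leaving through the proxy. As noted later in the thread, that alone does not show the Twitter endpoints used by snscrape will respond the same way.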
I also tried running a Tor server locally, which makes the issue easier to reproduce. With Tor running you can add:
Yeah, I expected as much. I'll see if I can figure out a way to reproduce it with a simple self-hosted proxy server. Else debugging is going to be tricky.
Do you have recommendations on a self-hosted proxy for debugging? Trying mitmproxy now.
I don't have much experience with the type of proxy at play here, so no.
I'm getting the same error:
from snscrape.modules.twitter import TwitterUserScraper
proxies = {
    "http": "http://192.168.1.5:20102",
    "https": "http://192.168.1.5:20102",
}
u = TwitterUserScraper("ubuntu", proxies=proxies)
for i in u.get_items():
    print(i, type(i))
The proxy works fine, which can be confirmed by:
import requests
proxies = {
    "http": "http://192.168.1.5:20102",
    "https": "http://192.168.1.5:20102",
}
resp = requests.get("https://twitter.com", proxies=proxies)
print(resp.status_code)
# 200
Loading the Twitter homepage isn't the same as loading the HTTP endpoints, I believe. If you can load whatever snscrape can't in a web browser (with no cookies; try an incognito or private window), that's different.
I logged into the proxy server and ran the same code:
from snscrape.modules.twitter import TwitterUserScraper
u = TwitterUserScraper("ubuntu")
for i in u.get_items():
    print(i, type(i))
The error didn't occur, so I think this shouldn't be a problem with the IP.
@davuses Thank you, that's helpful. What proxy server software are you using?
I set up the proxy server with this tool: https://github.com/v2fly/v2ray-core
Can confirm it is not an IP issue. I used the same proxy for testing both TWINT and snscrape; snscrape gives the 404 errors, TWINT does not.
I had the same problem using the TwitterProfileScraper. I made the following changes to the code in
I tested usage with and without a proxy on the TwitterProfileScraper in Wireshark and can confirm requests are going through the proxy.
@nadSTC None of those changes should be necessary. The |
Describe the bug
I'm getting a ScraperException when trying to use a proxy with snscrape on Twitter.
I've verified that the proxy IP is not banned by Twitter by setting the proxy both in the system network configuration and in the browser alone, and I confirmed via https://httpbin.org/ip that the browser is using the proxy IP.
I then used a private browser window to connect to twitter.com, and this worked normally. So it doesn't seem to be a proxy issue or a Twitter blocking issue. I suspect it has something to do with how the Python requests library configures the cipher when sending the request through a proxy, but I'm not sure.
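The cipher hypothesis above is unconfirmed in this thread. As a starting point for examining it, the sketch below lists the cipher suites that Python's default TLS client context enables on the local machine. It uses only the standard `ssl` module; the function names are hypothetical, and the context that requests/urllib3 actually builds may differ from this default.

```python
import ssl


def default_cipher_names():
    """Return the names of the cipher suites enabled in Python's default TLS client context."""
    context = ssl.create_default_context()
    return [cipher["name"] for cipher in context.get_ciphers()]


def describe_tls_defaults():
    """Summarize the local OpenSSL build and the default cipher list."""
    names = default_cipher_names()
    return {
        "openssl": ssl.OPENSSL_VERSION,   # version string of the linked OpenSSL library
        "cipher_count": len(names),
        "first_ciphers": names[:5],       # ciphers are listed in preference order
    }


print(describe_tls_defaults())
```

Comparing this output between a machine where the scraper works and one where it fails would at least show whether the two environments offer different cipher lists.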
How to reproduce
Example of code reproducing the error. The proxy should work on twitter.com in a private browser window without being logged in.
Expected behaviour
The same code without a proxy outputs the correct response.
Operating system
macOS 13.2.1
Python version: output of
python3 --version
Python 3.11.0
snscrape version: output of
snscrape --version
snscrape 0.6.2.20230320
Scraper
TwitterSearchScraper
How are you using snscrape?
import snscrape.modules.twitter as sntwitter
Log output
Adding the full output report (proxy address changed for privacy reasons):