All Twitter scrapes are failing: blocked (404)
#996
So sad :-( |
Twitter disabled its public website today (2023-06-30) and now requires users to log in; before this date Twitter was publicly viewable. Would it be possible to automate the login as well, by providing a username and password to snscrape, i.e. calling a GraphQL API to log in to Twitter and simulate a logged-in session before scraping? |
I don't think the developer will do this, as he has said that authentication will never be added as a feature (see #270). |
Please consider deleting my prior off-topic comment. Don't nuke this one as off-topic: A Twitter employee says it's temporary: https://twitter.com/AqueelMiq/status/1674843555486134272 |
Elon talked about it too 💀 |
blocked (404)
Musk referred to "EXTREME" scraping, which suggests that scrapers may no longer work after these changes. Let's see how it plays out. |
Hello, this may or may not help. Here's a route to access tweets without logging in (it contains a further iframe to platform.twitter.com). Would combining this with a pre-existing list of tweets allow data scraping to continue? Alternatively, users could build the tweet list using Google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status", or via another cached list (e.g. the Wayback Machine: https://web.archive.org/web/*/https://twitter.com/tesla/status*). If I'm off the mark, I apologise, but I thought I'd pass this on in case it helps, at least as a temporary measure. Just a note to @JustAnotherArchivist: thank you for the hard work you have put into this library; it is very much appreciated. Ben |
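The Wayback Machine route above can be scripted through its public CDX server, which lists archived URLs matching a prefix. The sketch below is a minimal illustration, not part of snscrape: the helper names `cdx_query_url` and `extract_status_ids` are hypothetical, and it assumes the CDX parameters `matchType`, `output`, `fl` and `collapse` behave as documented for the Wayback CDX server. It only builds the query and pulls tweet IDs out of the returned URLs; fetching is left to the caller.

```python
import re
from urllib.parse import urlencode

# Public Wayback Machine CDX endpoint (assumed reachable without auth).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
# Matches the numeric ID in .../status/<id> or .../statuses/<id>.
STATUS_ID = re.compile(r"/status(?:es)?/(\d+)")


def cdx_query_url(username: str) -> str:
    """Build a CDX query listing archived URLs under twitter.com/<username>/status/."""
    params = {
        "url": f"twitter.com/{username}/status/",
        "matchType": "prefix",  # everything starting with the URL above
        "output": "json",       # rows as JSON arrays; the first row is the header
        "fl": "original",       # only return the original URL field
        "collapse": "urlkey",   # one row per distinct URL instead of per capture
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def extract_status_ids(urls):
    """Return unique tweet IDs found in an iterable of URLs, in first-seen order."""
    seen = {}
    for url in urls:
        match = STATUS_ID.search(url)
        if match:
            seen.setdefault(match.group(1), None)
    return list(seen)


# Example usage (needs network access and the requests package):
#   rows = requests.get(cdx_query_url("tesla"), timeout=60).json()
#   ids = extract_status_ids(row[0] for row in rows[1:])  # skip the header row
```

The resulting IDs could then be fed one by one to the no-login tweet endpoint discussed in this thread; how complete the archive is for a given account is not guaranteed.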
URL: https://cdn.syndication.twimg.com/tweet-result

Code (generated by Insomnia):

```python
import requests

# Embed ("syndication") endpoint that platform.twitter.com uses to render embedded tweets.
url = "https://cdn.syndication.twimg.com/tweet-result"
querystring = {"id": "1652193613223436289", "lang": "en"}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    # No "br": requests can only decode Brotli if the brotli/brotlicffi package is installed.
    "Accept-Encoding": "gzip, deflate",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers",
}
response = requests.get(url, headers=headers, params=querystring, timeout=30)
print(response.text)
```
|
https://twitter.com/elonmusk/status/1675187969420828672 😂
|
Scraping still seems to be possible; check this: |
While cool, it uses API v1, so you can't get long tweets. |
I used Script A to get a guest token. If you get an error (for example because the flow has changed again), please mention it.

Script A:

```python
#!/usr/bin/env python3
# Script A: activate a guest token, then walk the Twitter Android app's
# "welcome" onboarding flow until it returns an open_account subtask.
import json
import sys

import requests

BEARER = "Bearer AAAAAAAAAAAAAAAAAAAAAFXzAwAAAAAAMHCxpeSDG1gLNLghVe8d74hl6k4%3DRUMF4xAQLsbeBhTSRrCiQpJtxoGWeyHrDb5te2jpGskWDFW82F"

# Subtask versions advertised by the Android client (identical in both onboarding calls).
SUBTASK_VERSIONS = {
    "generic_urt": 3, "standard": 1, "open_home_timeline": 1, "app_locale_update": 1,
    "enter_date": 1, "email_verification": 3, "enter_password": 5, "enter_text": 5,
    "one_tap": 2, "cta": 7, "single_sign_on": 1, "fetch_persisted_data": 1,
    "enter_username": 3, "web_modal": 2, "fetch_temporary_password": 1, "menu_dialog": 1,
    "sign_up_review": 5, "interest_picker": 4, "user_recommendations_urt": 3,
    "in_app_notification": 1, "sign_up": 2, "typeahead_search": 1,
    "user_recommendations_list": 4, "cta_inline": 1,
    "contacts_live_sync_permission_prompt": 3, "choice_selection": 5,
    "js_instrumentation": 1, "alert_dialog_suppress_client_events": 1,
    "privacy_options": 1, "topics_selector": 1, "wait_spinner": 3,
    "tweet_selection_urt": 1, "end_flow": 1, "settings_list": 7,
    "open_external_link": 1, "phone_verification": 5, "security_key": 3,
    "select_banner": 2, "upload_media": 1, "web": 2, "alert_dialog": 1,
    "open_account": 2, "action_list": 2, "enter_phone": 2, "open_link": 1,
    "show_code": 1, "update_users": 1, "check_logged_in_account": 1,
    "enter_email": 2, "select_avatar": 4, "location_permission_prompt": 2,
    "notifications_permission_prompt": 4,
}

with requests.Session() as session:
    # 1. Activate a guest token.
    guest_token = session.post(
        "https://api.twitter.com/1.1/guest/activate.json",
        headers={"Authorization": BEARER},
        timeout=30,
    ).json()["guest_token"]

    # Headers imitating the Twitter Android app; reused for both onboarding calls.
    headers = {
        "Authorization": BEARER,
        "Content-Type": "application/json",
        "User-Agent": "TwitterAndroid/9.95.0-release.0 (29950000-r-0) ONEPLUS+A3010/9 (OnePlus;ONEPLUS+A3010;OnePlus;OnePlus3;0;;1;2016)",
        "X-Twitter-API-Version": "5",
        "X-Twitter-Client": "TwitterAndroid",
        "X-Twitter-Client-Version": "9.95.0-release.0",
        "OS-Version": "28",
        "System-User-Agent": "Dalvik/2.1.0 (Linux; U; Android 9; ONEPLUS A3010 Build/PKQ1.181203.001)",
        "X-Twitter-Active-User": "yes",
        "X-Guest-Token": guest_token,
    }

    # 2. Start the "welcome" flow to obtain a flow_token.
    flow_token_resp = session.post(
        "https://api.twitter.com/1.1/onboarding/task.json?flow_name=welcome&api_version=1&known_device_token=&sim_country_code=us",
        headers=headers,
        data=json.dumps({
            "flow_token": None,
            "input_flow_data": {
                "country_code": None,
                "flow_context": {"start_location": {"location": "splash_screen"}},
                "requested_variant": None,
                "target_user_id": 0,
            },
            "subtask_versions": SUBTASK_VERSIONS,
        }),
        timeout=30,
    )
    flow_token = flow_token_resp.json()["flow_token"]

    # 3. Follow "next_link"; the reply should contain an open_account subtask.
    resp = session.post(
        "https://api.twitter.com/1.1/onboarding/task.json",
        headers=headers,
        data=json.dumps({
            "flow_token": flow_token,
            "subtask_inputs": [
                {"open_link": {"link": "next_link"}, "subtask_id": "NextTaskOpenLink"},
            ],
            "subtask_versions": SUBTASK_VERSIONS,
        }),
        timeout=30,
    )

try:
    subtasks = resp.json()["subtasks"]
    # NOTE: despite its name, this list holds the guest accounts' user IDs
    # (open_account.user.id), not OAuth tokens.
    tokens = [json.dumps(subtask["open_account"]["user"]["id"]) for subtask in subtasks]
    print(json.dumps(subtasks[0]["open_account"]))
except (KeyError, IndexError, ValueError):  # missing keys, no subtasks, or non-JSON reply
    print("Failed to fetch guest account, is your IP rate limited or so?", file=sys.stderr)
    sys.exit(1)

print("Tokens: ", tokens)
```

Script B (run after Script A in the same Python session, since it reuses `tokens`):

```python
import requests

url = "https://cdn.syndication.twimg.com/tweet-result"
select_token = 0
search_keywords = "How much is the fish?"
params = {
    # NOTE: tweet-result looks up a single tweet, so `id` is normally a tweet ID,
    # while Script A's `tokens` holds user IDs; whether `keywords` is honoured is unverified.
    "id": tokens[select_token],
    "lang": "en",
    "keywords": search_keywords,
}
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    # No "br": requests can only decode Brotli if the brotli/brotlicffi package is installed.
    "Accept-Encoding": "gzip, deflate",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers",
}
response = requests.get(url, headers=headers, params=params, timeout=30)
print(response.text)
```
|
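Script A names its list `tokens`, but what it collects are user IDs. If the guest account's credentials are what you are after, a small helper can pull them out of the parsed onboarding reply (`resp.json()`). This is a sketch under an assumption: it expects the `open_account` object to carry `oauth_token` and `oauth_token_secret` fields, as community tools reported at the time; Twitter never documented this, and the helper name `extract_guest_credentials` is hypothetical.

```python
from typing import Any, Dict, Optional


def extract_guest_credentials(onboarding_response: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return user ID and OAuth credentials from the first usable open_account subtask.

    Assumes (unverified) that open_account contains oauth_token and
    oauth_token_secret; returns None if no subtask matches.
    """
    for subtask in onboarding_response.get("subtasks", []):
        account = subtask.get("open_account")
        if account and "oauth_token" in account and "oauth_token_secret" in account:
            return {
                "user_id": str(account.get("user", {}).get("id", "")),
                "oauth_token": account["oauth_token"],
                "oauth_token_secret": account["oauth_token_secret"],
            }
    return None


# Example usage after Script A:
#   creds = extract_guest_credentials(resp.json())
```

Returning `None` instead of raising keeps the rate-limit case (no `open_account` subtask) easy to test for at the call site.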
Now, individual tweets can be viewed without logging in, but I tried |
Hi, how did you get this information? |
No specific notification, I just opened a tweet while not logged in. |
Confirmed: viewing both tweets and user profiles without logging in works again. Maybe that's a good start.
|
Vercel's react-tweet now has a bit of a workaround. They figured out that you can use the Twitter embed API to get data from any tweet. Usually you'd need a special token to get any data, but they reverse-engineered the token, and you can generate it yourself from the tweet ID. The API is at 'https://cdn.syndication.twimg.com/tweet-result', and the token generator looks like this:

```ts
// Derives the tweet-result `token` value from the tweet ID.
function getToken(id: string): string {
  return ((Number(id) / 1e15) * Math.PI)
    .toString(6 ** 2) // radix 36
    .replace(/(0+|\.)/g, '') // strip runs of zeros and the decimal point
}
```

Source: https://github.com/vercel/react-tweet/blob/main/packages/react-tweet/src/api/fetch-tweet.ts |
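For the Python scripts earlier in this thread, the request might then be built as in the sketch below. It assumes, as react-tweet does, that the generated value is sent as a `token` query parameter next to `id` and `lang`; the helper name `tweet_result_url` is hypothetical, and the token itself must still come from a port of `getToken` (Python has no built-in equivalent of JavaScript's fractional base-36 `toString`).

```python
from urllib.parse import urlencode

TWEET_RESULT = "https://cdn.syndication.twimg.com/tweet-result"


def tweet_result_url(tweet_id: str, token: str, lang: str = "en") -> str:
    """Build a tweet-result URL; `token` is assumed to be getToken(tweet_id)."""
    if not tweet_id.isdigit():
        raise ValueError(f"tweet ID must be numeric, got {tweet_id!r}")
    return f"{TWEET_RESULT}?{urlencode({'id': tweet_id, 'lang': lang, 'token': token})}"


# Example usage (network access and the requests package needed):
#   url = tweet_result_url("1652193613223436289", token)
#   tweet = requests.get(url, timeout=30).json()
```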
Hi everyone, I know this will sound like an ad. I used this library a while back and waited to see whether the community would manage. We have an Insight API for aggregated metrics and a Fullstream API that outputs the entire annotated feed. Just reach out for trial and access. We are willing to support researchers and OSINT efforts; we have an API and can provide raw archives. Just reach out at [email protected] or visit developers.exorde.io |
With the exception of `twitter-trends`, all Twitter scrapes have been failing since sometime in the past hour. This is likely connected to Twitter as a whole being locked behind a login wall since earlier today. There is no known workaround at this time, and it is not known whether this will be fixable.