Scrape images, video, and post forwarding information for Telegram #413

Open
wants to merge 30 commits into
base: master
Changes from 3 commits
30 commits
72b26f2
Scrape images, video, and post forwarding information for Telegram ch…
loganwilliams Oct 15, 2020
de4ebed
Fix KeyError caused by retweets without URLs in TwitterProfileScraper
loganwilliams Feb 24, 2022
b8efce2
Clean up unnecessary imports
loganwilliams Mar 8, 2022
ed82916
added capability to extract the number of channel members when the th…
trislee Mar 29, 2022
fb8d73a
handled case where channel has no profile image
trislee Mar 29, 2022
d32c9ad
added capability to scrape multiple videos from a single post
trislee Mar 30, 2022
a7eb54d
implemented Media dataclasses for Telegram, and added variable for ex…
trislee Mar 31, 2022
4e59638
added a forwardedUrl attribute to TelegramPost and made forwarded att…
trislee Mar 31, 2022
2ce014a
fixed edge case for videos that have data-link-attr but no href attri…
trislee Apr 3, 2022
f978954
Merge branch 'JustAnotherArchivist:master' into master
trislee Apr 3, 2022
babcddd
made Telegram scraper not return full channel info for forwarded_from…
trislee Apr 17, 2022
1e4e0c2
fixed issue where Telegram scraper terminated early because some page…
trislee Apr 17, 2022
b276c3c
fixed issue where some videos and photos weren't being scraped (becau…
trislee Apr 17, 2022
97d38e5
added additional termination criteria to Telegram scraper
trislee Apr 21, 2022
9b3faec
added additional attributes for hashtags and user mentions, removed r…
trislee Apr 21, 2022
21f7b62
moved forward finding out of tgme_widget_message_text clause, since i…
trislee Apr 21, 2022
5648e95
improved consistency of code formatting and added _STYLE_MEDIA_URL_PA…
trislee Apr 27, 2022
c18ca0f
Merge branch 'master' into telegram-media
trislee May 9, 2022
0a4bd39
Merge pull request #2 from bellingcat/telegram-media
trislee May 9, 2022
f385135
fixed merge conflicts
trislee May 9, 2022
b13e62e
Merge branch 'JustAnotherArchivist-master'
trislee May 9, 2022
e2d9223
forgot to save modified twitter.py module
trislee May 9, 2022
0822a9c
Merge pull request #4 from JustAnotherArchivist/master
trislee May 25, 2022
07a5f6f
merged master into more-tg-info to update upstream PR
trislee May 25, 2022
65723f1
fixed merge
trislee May 25, 2022
56e4232
fixed typo
trislee Jun 23, 2022
056cd62
incorporated requested changes from maintainer, removed modifications…
trislee Jun 23, 2022
73f10a4
fixed edge case where channel with no members fails _get_entity
trislee Jul 5, 2022
cbdfeed
fixed edge case where members information wasnt included
trislee Nov 30, 2022
cacd783
merged upstram changes
trislee Apr 4, 2023
6 changes: 5 additions & 1 deletion snscrape/modules/vkontakte.py
@@ -177,11 +177,15 @@ def _post_div_to_item(self, post, isCopy = False):
         continue
     if 'data-video' in a.attrs:
         # Video
+        if 'data-link-attr' in a.attrs:
+            hrefUrl = urllib.parse.unquote(a.attrs['data-link-attr'].split('to=')[1].split('&')[0])
+        else:
+            hrefUrl = f'https://vk.com{a["href"]}'
         video = Video(
             id = a['data-video'],
             list = a['data-list'],
             duration = int(a['data-duration']),
-            url = f'https://vk.com{a["href"]}',
+            url = hrefUrl,
             thumbUrl = a['style'][(begin := a['style'].find('background-image: url(') + 22) : a['style'].find(')', begin)],
         )
         continue