-
I tried using ScraperAPI last week, and it worked for the first few attempts before being blocked consistently. I thought using their API endpoint had a marginal advantage over using their proxy mode, but that could have just been coincidence. Can you try Methods 1 and 3 from their documentation using cURL with a Google Scholar publication page (one that has …)
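For reference, the two request styles can be compared directly from the command line. This is only a sketch based on ScraperAPI's public documentation; `YOUR_API_KEY` and the Scholar URL are placeholders, and the target URL should be URL-encoded when passed as a query parameter:

```shell
# Method 1: API endpoint -- pass the target page as a (URL-encoded) query parameter
curl "http://api.scraperapi.com/?api_key=YOUR_API_KEY&url=https%3A%2F%2Fscholar.google.com%2Fcitations%3Fuser%3DSOME_AUTHOR_ID"

# Method 3: proxy mode -- route the request through ScraperAPI's proxy port
curl -x "http://scraperapi:YOUR_API_KEY@proxy-server.scraperapi.com:8001" -k \
     "https://scholar.google.com/citations?user=SOME_AUTHOR_ID"
```

If one method is blocked and the other is not, that would help isolate whether the problem is on scholarly's side or ScraperAPI's.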
-
OK, I have some good news! I've figured out the issue in scholarly that makes it hard to work with ScraperAPI and will submit a fix soon. That said, I find ScraperAPI extremely slow, but it works and they have a free plan, so it's definitely worth it.
-
@danuccio Please install v1.4.2 of scholarly, run it with your ScraperAPI credentials, and let us know how it works. It will be slower with ScraperAPI, but it is reliable. For several hundred authors/publications you would most likely need to upgrade to a paid plan, but you can check first whether the free plan is enough.
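For anyone following along, here is a minimal setup sketch, assuming scholarly's documented `ProxyGenerator` interface; the API key is a placeholder and the example author query is illustrative only:

```python
from scholarly import scholarly, ProxyGenerator

# Route all scholarly traffic through ScraperAPI ("YOUR_API_KEY" is a placeholder).
pg = ProxyGenerator()
success = pg.ScraperAPI("YOUR_API_KEY")  # True if the proxy was accepted
if not success:
    raise RuntimeError("ScraperAPI proxy setup failed -- check the key and plan")
scholarly.use_proxy(pg)

# From here on, searches go through the proxy, e.g.:
author = next(scholarly.search_author("Albert Einstein"))
scholarly.fill(author, sections=["publications"])
```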
-
A few ideas, to keep it simple:
-
I am not sure whether this is technically an issue with the scholarly package, so I am posting it in the discussion section.
I am trying to scrape data from Google Scholar for several hundred authors (and their respective publications) each week. I wrote a script that integrates scholarly and should be able to do this. However, to avoid getting blocked by Google, I am looking into scholarly's proxy options. ScraperAPI looked like the most promising of them, and I integrated it into my code right before Labor Day. At the time it worked, although with a fairly high failure rate (roughly 16-33%). ScraperAPI's support team said they had a known issue with Google at the time but were fixing it. After the fix, I seemed to be able to scrape with a near-100% success rate, yet my ScraperAPI dashboard indicated a near-100% failure rate; ScraperAPI support said they had no record of me even making requests to Google, which would suggest the requests were being made from my own IP address.
My question is: has anyone else had issues with scholarly/ScraperAPI compatibility in the past week or two? I am trying to determine whether the issue is with my code or whether scholarly and ScraperAPI are no longer compatible.
On a related note, does anyone know if the Luminati option is still working since they became Bright Data?
Below I have included the most relevant portions of my code.
This chunk is where I use scholarly to call ScraperAPI, search for an author, and call a function to gather publication info.

```python
def main(author_ids, output_file, random_interval_precaution, article_limit_precaution, verbosity, api_key):
    ...
```
This chunk is a function that loops through the publications of an author and fills them using scholarly.fill().
```python
def gather_pub_info(author, random_intervals, dicList, random_interval_precaution, article_limit, verbosity, pg, api_key):
    ...
```
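Since the function bodies above are elided, here is a minimal, self-contained sketch of the kind of loop `gather_pub_info` is described as performing. The names and parameters are stand-ins, not scholarly's API; in the real script, `scholarly.fill(pub)` would play the role of `fill`:

```python
import random
import time

def gather_pub_info_sketch(publications, fill, min_wait=2.0, max_wait=5.0,
                           max_retries=3, article_limit=None):
    """Fill each publication, pausing a random interval before every request
    and retrying a few times on failure (mirrors the precautions described)."""
    filled, failed = [], []
    for i, pub in enumerate(publications):
        if article_limit is not None and i >= article_limit:
            break  # honour the article-limit precaution
        for attempt in range(max_retries):
            time.sleep(random.uniform(min_wait, max_wait))  # random-interval precaution
            try:
                filled.append(fill(pub))
                break
            except Exception:
                if attempt == max_retries - 1:
                    failed.append(pub)  # record and move on after the final retry
    return filled, failed
```

Keeping a `failed` list makes it easy to re-run only the publications that were blocked, instead of repeating the whole batch.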