Feature request
I would like a programmatic way of requesting access to gated datasets. The current solution to gain access forces me to visit a website and physically click an "agreement" button (as per the documentation).
An ideal approach would be HF API download methods that negotiate access on my behalf based on information from my CLI login and/or token. I realise that may be naive given the various types of access semantics available to dataset authors (automatic versus manual approval, for example) and complexities it might add to existing methods, but something along those lines would be nice.
Perhaps the *_access_request methods available to dataset authors can serve as a precedent; see reject_access_request, for example.
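For context, here is a minimal sketch of how the author-side methods fit together today. It assumes the `list_pending_access_requests`, `accept_access_request`, and `reject_access_request` methods of `huggingface_hub.HfApi`; the helper name `triage_pending_requests`, the allowlist policy, and the repository name in the usage comment are hypothetical and only for illustration.

```python
from typing import Iterable, List, Tuple


def triage_pending_requests(
    api,
    repo_id: str,
    allowed_users: Iterable[str],
    repo_type: str = "dataset",
) -> Tuple[List[str], List[str]]:
    """Accept pending requests from allowlisted users and reject the rest.

    `api` is expected to behave like huggingface_hub.HfApi (assumption):
    it must expose the three *_access_request methods used below.
    Returns (accepted_usernames, rejected_usernames).
    """
    allowed = set(allowed_users)
    accepted: List[str] = []
    rejected: List[str] = []

    # Each pending request is expected to carry a `username` attribute.
    for request in api.list_pending_access_requests(repo_id, repo_type=repo_type):
        if request.username in allowed:
            api.accept_access_request(repo_id, request.username, repo_type=repo_type)
            accepted.append(request.username)
        else:
            api.reject_access_request(repo_id, request.username, repo_type=repo_type)
            rejected.append(request.username)

    return accepted, rejected


# Usage (hypothetical repository name; requires a token with write access):
#   from huggingface_hub import HfApi
#   triage_pending_requests(HfApi(), "my-org/my-gated-dataset", {"alice"})
```

Nothing comparable appears to exist on the requesting side, which is the gap this issue is about.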
Motivation
When trying to download files from a gated dataset, I'm met with a GatedRepoError and instructed to visit the repository's website to gain access:
Cannot access gated repo for url https://huggingface.co/datasets/open-llm-leaderboard/meta-llama__Meta-Llama-3.1-70B-Instruct-details/resolve/main/meta-llama__Meta-Llama-3.1-70B-Instruct/samples_leaderboard_math_precalculus_hard_2024-07-19T18-47-29.522341.jsonl.
Access to dataset open-llm-leaderboard/meta-llama__Meta-Llama-3.1-70B-Instruct-details is restricted and you are not in the authorized list. Visit https://huggingface.co/datasets/open-llm-leaderboard/meta-llama__Meta-Llama-3.1-70B-Instruct-details to ask for access.
This makes task automation extremely difficult. For example, I'm interested in studying sample-level responses of models on the LLM leaderboard -- how they answered particular questions on a given evaluation framework. As I come across more and more participants that gate their data, it's becoming unwieldy to continue my work (there are over 2,000 participants, so in the worst case that's the number of website visits I'd need to make manually).
One approach is to use Selenium to react to the GatedRepoError, but that seems like overkill, and a potential violation of the HF terms of service (?).
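Short of browser automation, the error can at least be caught so that a batch job records which repositories still need a manual visit instead of failing on the first one. The following is a minimal sketch of that workaround, not a fix for the underlying problem. It assumes `hf_hub_download` and `huggingface_hub.utils.GatedRepoError`; the helper names `partition_by_access` and `find_gated_datasets` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple, Type


def partition_by_access(
    repo_ids: Iterable[str],
    fetch: Callable[[str], object],
    gated_error: Type[Exception],
) -> Tuple[List[str], List[str]]:
    """Call fetch(repo_id) for each repo and split the ids into
    (accessible, gated) according to whether `gated_error` was raised.
    Any other exception propagates to the caller."""
    accessible: List[str] = []
    gated: List[str] = []
    for repo_id in repo_ids:
        try:
            fetch(repo_id)
        except gated_error:
            gated.append(repo_id)
        else:
            accessible.append(repo_id)
    return accessible, gated


def find_gated_datasets(files: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Wire the helper above to huggingface_hub.

    `files` maps each dataset repo id to one file name to probe.
    Imports are local so the generic helper stays usable without the library.
    """
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import GatedRepoError

    def fetch(repo_id: str) -> str:
        # Downloading a single file is enough to trigger the gating check.
        return hf_hub_download(
            repo_id=repo_id, filename=files[repo_id], repo_type="dataset"
        )

    return partition_by_access(files, fetch, GatedRepoError)
```

The second list returned is still a to-do list of pages to visit by hand, so this only makes the manual work visible; it does not remove it.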
As mentioned in the previous section, there seems to be an API for gated dataset owners to manage access requests, and thus some appetite for allowing automated management of gating. This feature request is to extend that to dataset users.
Your contribution
Whether I can help depends on a few things, one being the complexity of the underlying gated access design. If this feature request is accepted, I am open to being involved in discussions and testing, and even development under the right time-outcome tradeoff.