-
Notifications
You must be signed in to change notification settings - Fork 737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow using other HTTP clients #2661
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request is neither linked to an issue or epic nor labeled as adhoc!
So if I understand this correctly, we cannot get rid of Does it mean we have to model the In |
Also, a completely unrelated thought / nit, but what file and directory casing convention are we following in Crawlee? It's already messy in the current codebase, but seeing |
Well, yeah, since got is part of our public API, we cannot completely remove it 🤷. We'll see what our default http client for 4.0 will be.
Not really, but I'd prefer to implicitly support got stuff that won't be part of the http client interface (e.g.,
Well, I didn't notice any, so I just look at the nearest files and do my best effort. |
Historically things were snake case, but I believe nobody from the current team is in favor of that (I never like this). I would go with kebab case for anything new, and unify in v4 maybe (or sooner, I don't think it has any effect downstream, we don't allow deep imports anywhere). |
docs/guides/custom-http-client.mdx
Outdated
async sendRequest<TResponseType extends keyof ResponseTypes = "text">( | ||
request: HttpRequest<TResponseType>, | ||
): Promise<HttpResponse<TResponseType>> { | ||
/* ... */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use fetch
or axios
here or is that more tricky than it sounds like? the example without any implementation is not really helpful.
if the problematic part is streaming, maybe we could provide some default dummy implementation/helper that would just load things into memory and wrap it in a stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, if I'm going to implement it, I would rather just ship it as code. And writing an incomplete example is not exactly helpful either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with doing both? Docs are important, you won't get around that just by publishing some package, we need them. If you don't want to do it now, let's show the got scraping client implementation here (which can be simplified, referencing the full working code), but we need something better than what we have now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, the curl-impersonate thing is (allegedly) slow and doesn't work on Windows - that's not something I'd like to put on display. A complete implementation of something else (fetch, axios, ...) would take some effort, and nobody asked for it (yet). That and the fact that we're going to change the interface pretty soon anyway (to be more ergonomic, among other things) kinda deter me.
Of course, I could just sketch something here and leave out the tricky parts (e.g., redirection + cookie handling), but that doesn't feel right either. It would look like we're leaving out the gnarly parts and leaving them for the user to figure out.
I guess just linking to that gist with curl-impersonate with a disclaimer could work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that doesn't feel right either.
It would still look better than what we have now. I am fine with leaving out tricky parts for now, but leaving out everything feels weird.
Btw, people already asked indirectly about axios support, we have several reports about things not working with got scraping, but working with axios.
A link to that gist could also help, and I would surely add it here.
eef1c59
to
7779572
Compare
cf7f363
to
5d0200b
Compare
5d0200b
to
aaf84bf
Compare
Then how can I send a stream request from the router handler using
|
That's the point, you don't have to change anything in your |
See https://gist.github.com/janbuchar/3a4724927de2c3a0bb16c46bb5940236 for an example curl-impersonate client.
The following got-scraping options were ignored (they will still work, but they're not part of the new interface):