bug: sitemap parser returning invalid URLs #2698
Labels
bug
Something isn't working.
hacktoberfest
t-tooling
Issues with this label are in the ownership of the tooling team.
In the case of a pretty-printed XML sitemap, the Sitemap.load() call can return invalid URLs (the text content of the loc element contains newlines and whitespace characters).
The solution is likely simple: we should call trim() on those (see snippet below). Maybe also try parsing the URLs with new URL(), skipping any for which this call throws?
crawlee/packages/utils/src/internals/sitemap.ts
Line 149 in ea48d46
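
A minimal sketch of the proposed fix, for illustration only. The helper name cleanSitemapUrls and its input shape (an array of raw loc text values) are assumptions, not part of the crawlee codebase; the actual change would live wherever sitemap.ts collects the loc text.

```typescript
// Hypothetical helper (not crawlee API): normalizes raw <loc> text values.
// It trims surrounding whitespace and drops anything that is not a valid
// absolute URL according to the WHATWG URL parser.
function cleanSitemapUrls(rawLocs: string[]): string[] {
    const urls: string[] = [];

    for (const raw of rawLocs) {
        // Pretty-printed XML leaves newlines and indentation around the URL.
        const candidate = raw.trim();
        if (candidate === '') continue;

        try {
            // new URL() throws a TypeError when the input cannot be parsed
            // as an absolute URL; we only use it as a validity check here.
            new URL(candidate);
            urls.push(candidate);
        } catch {
            // Skip invalid entries instead of returning them to the caller.
        }
    }

    return urls;
}

// Example input resembling text extracted from a pretty-printed sitemap.
const cleaned = cleanSitemapUrls([
    '\n    https://example.com/a\n  ',
    'not a url',
    '   ',
    'https://example.com/b',
]);
console.log(cleaned);
```

Note that new URL() itself tolerates leading and trailing whitespace, so validation alone would not remove it from the returned strings; the explicit trim() is still needed.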