Panic when DNS resolve issues #77

Open
boyter opened this issue Aug 6, 2024 · 0 comments

boyter commented Aug 6, 2024

I am running a Pi-hole and tried goclone against a few websites. On some of them it panics. The trace below shows the panic raised in goclone's crawler.Extractor, called from a colly HTML callback:

$ goclone https://searchcode.com/
Extracting -->  https://searchcode.com/
Css found --> /static/css/newstyles.css
Extracting -->  https://searchcode.com/static/css/newstyles.css
Js found --> //cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom
Extracting -->  https://cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom
panic: Get "https://cdn.carbonads.com/carbon.js?zoneid=1673&serve=C6AILKT&placement=searchcodecom": dial tcp 0.0.0.0:443: connect: connection refused

goroutine 35 [running]:
github.com/imthaghost/goclone/pkg/crawler.Extractor({0x14000407c00, 0x55}, {0x140003a6000, 0x21})
	/Users/ghost/go/src/github.com/imthaghost/goclone/pkg/crawler/extractor.go:35 +0x24c
github.com/imthaghost/goclone/pkg/crawler.Collector.func2(0x140004aec60)
	/Users/ghost/go/src/github.com/imthaghost/goclone/pkg/crawler/collector.go:37 +0x120
github.com/gocolly/colly/v2.(*Collector).handleOnHTML.func1(0x0, 0x140004a1560)
	/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:1074 +0x70
github.com/PuerkitoBio/goquery.(*Selection).Each(0x140004a1530, 0x14000073e30)
	/Users/ghost/go/pkg/mod/github.com/!puerkito!bio/[email protected]/iteration.go:10 +0x50
github.com/gocolly/colly/v2.(*Collector).handleOnHTML(0x140003ac000, 0x140003c06c0)
	/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:1064 +0x288
github.com/gocolly/colly/v2.(*Collector).fetch(0x140003ac000, {0x140003a4060, 0x17}, {0x10531f364, 0x3}, 0x1, {0x0, 0x0}, 0x0, 0x1400038c210, ...)
	/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:676 +0x7a0
created by github.com/gocolly/colly/v2.(*Collector).scrape
	/Users/ghost/go/pkg/mod/github.com/gocolly/colly/[email protected]/colly.go:574 +0x43c

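This only occurs against websites that reference content blocked by the Pi-hole. The blocked host resolves to 0.0.0.0, the connection is refused, and goclone panics as shown above. Portions of the site are still cloned, but an error like this seems like something that should be handled rather than crashing the run.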

Disabling the Pi-hole resolves the issue. I understand that running behind a Pi-hole is not the expected setup, but other DNS configurations that block or fail to resolve a host could plausibly produce the same panic.
