
Documentation for Auth - OKTA SSO #421

Open
jacksonp2008 opened this issue Nov 5, 2017 · 16 comments

@jacksonp2008

Testing this product, so far so good. I will have a number of sites that use OKTA SSO which I will need to crawl. Any pointers on how to do this?

@essiembre
Contributor

I am not too familiar with OKTA SSO, but I can see they support a few authentication methods. Maybe one of them is supported by GenericHttpClientFactory. If not, you can extend this class to provide a custom authentication mechanism using OAuth 2.0, carrying your security token around. You can have a look at https://www.norconex.com/how-to-crawl-facebook/ which does something similar.
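
To illustrate, here is a rough, untested sketch of that idea. The class name and fetchToken() helper are invented, and the createHTTPClient(String userAgent) override point is assumed from the 2.x factory contract, so verify the exact signature in your version's Javadoc:

import java.util.Collections;

import org.apache.http.client.HttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.apache.http.message.BasicHeader;

import com.norconex.collector.http.client.impl.GenericHttpClientFactory;

// Hypothetical sketch: attach an OAuth 2.0 bearer token to every request
// by building the underlying Apache HttpClient with a default header.
public class TokenHttpClientFactory extends GenericHttpClientFactory {

    @Override
    public HttpClient createHTTPClient(String userAgent) {
        return HttpClientBuilder.create()
                .setUserAgent(userAgent)
                .setDefaultHeaders(Collections.singletonList(
                        new BasicHeader("Authorization",
                                "Bearer " + fetchToken())))
                .build();
    }

    // Placeholder: obtain/refresh a token from your identity provider.
    private String fetchToken() {
        return "...";
    }
}

Note that building the client yourself like this bypasses the settings GenericHttpClientFactory normally applies from your XML, so treat it as a starting point only.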

If the authentication for your sites is standard/generic enough and you can share credentials for a sample site, maybe we'll be able to add built-in support for another authentication scheme.

@jacksonp2008
Author

Thanks. OKTA is not social auth; rather, it is an enterprise application that provides single sign-on via SAML for web applications. https://developer.okta.com/use_cases/integrate_with_okta/sso-with-saml

It may be possible to send username/password and 2FA. I'll do some research on this on the OKTA side.

Can you please point me to how I would extend GenericHttpClientFactory for this? Generally, how would I send a simple username/password to a site and then crawl it?

@essiembre
Contributor

I did not mean to suggest OKTA was a social auth. The link to the blog post was to point you to an example of extending the Collector. There is an example of a class extending GenericDocumentFetcher showing one way you can pass a token with every URL request. On second thought, that may be the best class to override, since that is where the HTTP requests actually happen.
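
To make that concrete, here is an untested sketch in the spirit of the blog post. The class name and getToken() helper are invented, and both the fetchDocument(HttpClient, HttpDocument) signature and the HttpDocument reference accessors are assumptions to verify against your version's Javadoc:

import org.apache.http.client.HttpClient;

import com.norconex.collector.http.doc.HttpDocument;
import com.norconex.collector.http.fetch.HttpFetchResponse;
import com.norconex.collector.http.fetch.impl.GenericDocumentFetcher;

// Hypothetical sketch: rewrite each URL to carry a security token
// before delegating to the default fetching logic.
public class TokenDocumentFetcher extends GenericDocumentFetcher {

    @Override
    public HttpFetchResponse fetchDocument(HttpClient httpClient, HttpDocument doc) {
        String url = doc.getReference();
        doc.setReference(url + (url.contains("?") ? "&" : "?")
                + "access_token=" + getToken());
        return super.fetchDocument(httpClient, doc);
    }

    // Placeholder: obtain/refresh the token however your provider requires.
    private String getToken() {
        return "...";
    }
}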

Extending GenericHttpClientFactory could be good if you know there is a way to add "default" SAML authentication (or other supported auth) on Apache HttpClient.

It seems like OKTA provides a Java API which should make your life easier: https://developer.okta.com/code/java/index

If you have a way to provide me with a protected URL with a temporary test account (sent by email), we could make adding SAML support a feature request if you like.

@jacksonp2008
Author

jacksonp2008 commented Nov 9, 2017

Very interested, and thank you! I need to do some work on my end.

OKTA does offer a free test account, which might help. I am totally open to getting you access for testing.

I've got some basic auth sites to figure out first. I tried one below; inside <httpcollector ...>, I have:

<httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
    <authUsername>super secret account</authUsername>
    <authPassword>super secret password</authPassword>
    <authUsernameField>super secret account</authUsernameField>
    <authPasswordField>super secret password</authPasswordField>
    <authMethod>form</authMethod>
</httpClientFactory>

Creds are good, but it returns:

INFO  [CrawlerEventManager]       REJECTED_BAD_STATUS: https://updates.forescout.com/ (HttpFetchResponse [crawlState=BAD_STATUS, statusCode=401, reasonPhrase=Authorization Required])

Must be missing something. I tried <authMethod> values of:

form
basic
digest

No love, but very close.

@essiembre
Contributor

Can you share test credentials for that one too? From a very quick look at the site, it seems to be using "basic" (or maybe "digest"). I do not know if that's related, but there is a new flag that fixes basic auth issues for some people, described here: #420 (comment)

Here is the flag:

<authPreemptive>true</authPreemptive>

Give it a try and let me know.

@essiembre
Contributor

Marking as a feature-request to support SAML.

@jacksonp2008
Author

Just getting back to this, thank you Pascal. I did add authPreemptive but it didn't fix it. I'm not sure how to debug further; is there a way to enable verbose logging so I can see what's going on behind the scenes? If I can't figure it out, I may take you up on your kind offer to log in.

@jacksonp2008
Author

jacksonp2008 commented Nov 14, 2017

Just FYI, I set up Postman and did a "basic" auth request with the credentials and it works fine, but with the crawler it fails.

Here is my config:

<httpcollector id="Minimum Config HTTP Collector">
  <httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
      <authUsername>xxxxx</authUsername>
      <authPassword>xxxxxxx</authPassword>
      <authUsernameField>xxxxx</authUsernameField>
      <authPasswordField>xxxxxx</authPasswordField>
      <authPreemptive>true</authPreemptive>
      <trustAllSSLCertificates>true</trustAllSSLCertificates>
      <authMethod>basic</authMethod>
      <authURL>https://updates.forescout.com</authURL>
  </httpClientFactory>

and log:

INFO  [SitemapStore] Anonymous Coward: Initializing sitemap store...
INFO  [SitemapStore] Anonymous Coward: Done initializing sitemap store.
log4j:WARN No appenders could be found for logger (org.apache.http.client.protocol.ResponseProcessCookies).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
INFO  [StandardSitemapResolver] Resolving sitemap: https://updates.forescout.com/sitemap.xml
INFO  [StandardSitemapResolver]          Resolved: https://updates.forescout.com/sitemap.xml
INFO  [HttpCrawler] 1 start URLs identified.
INFO  [CrawlerEventManager]           CRAWLER_STARTED
INFO  [AbstractCrawler] Anonymous Coward: Crawling references...
INFO  [CrawlerEventManager]       REJECTED_BAD_STATUS: https://updates.forescout.com/ (HttpFetchResponse [crawlState=BAD_STATUS, statusCode=401, reasonPhrase=Authorization Required])

@essiembre
Contributor

essiembre commented Nov 15, 2017

To get maximum verbosity, set the following to TRACE in log4j.properties:

log4j.logger.org.apache.http=TRACE
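
Since your log also shows "No appenders could be found" warnings, here is a minimal log4j.properties sketch for context (the appender name and pattern are just examples):

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%-5p [%c{1}] %m%n

# Maximum HTTP client verbosity, including wire-level request/response dumps:
log4j.logger.org.apache.http=TRACE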

If you do not mind sending me temporary credentials via email, I will look into it when I get a chance.

@essiembre
Contributor

Having a second look at your log, I see it rejects the authentication instead of attempting it. Odd... maybe as a test you can try setting the following in your document fetcher:

<validStatusCodes>200,401</validStatusCodes>
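
In context, that setting belongs in your crawler's document fetcher section, something like the following (double-check the class name against your version's documentation):

<documentFetcher class="com.norconex.collector.http.fetch.impl.GenericDocumentFetcher">
  <validStatusCodes>200,401</validStatusCodes>
</documentFetcher>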

Let me know if that makes a difference.

@jacksonp2008
Author

jacksonp2008 commented Nov 15, 2017

Sorry, that didn't do it, although I do have a clue.

If I use curl:

curl -u user:pas$word https://updates.forescout.com

it fails with a 401 even though the creds are good.

Adding \ in front of the $ makes it work:

curl -u user:pas\$word https://updates.forescout.com

Is it possible that we are seeing an issue with special characters in the password?

I tried several variations; none worked:

'pas$word'
"pas$word"
pas$word

thanks

@essiembre
Contributor

Interesting... a possibility for sure. Are you storing your password in the Collector XML config? If so, did you try escaping the $ with a backslash in the config? It may be getting interpreted as a variable. If that's the case and escaping works, you have other options too: define the password in a variables/properties file and reference it as a variable in the config, or encrypt the password (see the online documentation).
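
As a sketch of the variables-file option (the file and variable names here are placeholders; see the Collector configuration documentation for how variable files are loaded), the config would reference:

<authPassword>${myPassword}</authPassword>

and the variables/properties file loaded with the config would define:

myPassword=pas$word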

If that does not make a difference, it may need to be escaped by the Collector somehow before sending it to your server and will require more investigation.

As a workaround solution, does it work if you change the password to one without $ in it? Just to 100% confirm the issue is with the $.

@essiembre
Contributor

essiembre commented Nov 17, 2017

Good news: it is working as expected and I was able to make it work without anything special. It turns out you had put <httpClientFactory> under <httpcollector>, while it goes under your <crawler> section (as per the documentation). Moving it there did it.
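
For anyone landing here later, the corrected shape is roughly as follows, with the factory nested in a crawler (ellipses stand for the rest of the configuration):

<httpcollector id="Minimum Config HTTP Collector">
  ...
  <crawlers>
    <crawler id="...">
      ...
      <httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
        <authMethod>basic</authMethod>
        <authUsername>xxxxx</authUsername>
        <authPassword>xxxxxx</authPassword>
        <authPreemptive>true</authPreemptive>
        <trustAllSSLCertificates>true</trustAllSSLCertificates>
      </httpClientFactory>
    </crawler>
  </crawlers>
</httpcollector>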

Please give it a try and confirm.

@jacksonp2008
Author

Awesome news, it ran here as well. Thanks for all your help on this, Pascal!

@kristiWabion

Hello, any update on using SSO Auth with Norconex?

@jacksonp2008
Author

jacksonp2008 commented Nov 10, 2018 via email
