Documentation for Auth - OKTA SSO #421
Comments
I am not too familiar with OKTA SSO, but I can see they support a few authentication methods. Maybe one is supported by the GenericHttpClientFactory. If not, you can extend this class to provide your custom authentication mechanism using OAuth 2.0 and carrying your security token around. You can have a look at https://www.norconex.com/how-to-crawl-facebook/ which does something similar. If the authentication for your sites is standard/generic enough and you can share credentials for a sample site, maybe we'll be able to add built-in support for another authentication scheme. |
Thanks. OKTA is not social auth, rather an application for enterprises to provide single sign-on via SAML for web applications. https://developer.okta.com/use_cases/integrate_with_okta/sso-with-saml It may be possible to send username/password, and 2FA. I'll do some research on this on the OKTA side. Can you please point me to how I would extend GenericHttpClientFactory for this? Generally, how would I send a simple username/password to a site and then crawl it? |
I did not mean to suggest OKTA was a social auth. The link to the blog post was to point you to an example extending the Collector. There is an example of a class extending Extending It seems like OKTA provides a Java API which should make your life easier: https://developer.okta.com/code/java/index If you have a way to provide me with a protected URL with a temporary test account (sent by email), we could make adding SAML support a feature request if you like. |
Very interested and thank-you! -- I need to do some work on my end. OKTA does offer a free test account, might help. I am totally open to getting you access to test. I've got some basic auth sites to figure out first... tried one below, inside <httpClientFactory class="com.norconex.collector.http.client.impl.GenericHttpClientFactory">
<authUsername>super secret account</authUsername>
<authPassword>super secret password</authPassword>
<authUsernameField>super secret account</authUsernameField>
<authPasswordField>super secret password</authPasswordField>
<authMethod>form</authMethod>
</httpClientFactory> Creds are good, but it returns:
Must be missing something. tried no love, but very close. |
Can you share test credentials for that one too? From a very quick look at the site, it seems to be using "basic" (or maybe "digest"). I do not know if that's related, but there is a new flag that fixes basic auth issues for some people, described here: #420 (comment) Here is the flag: <authPreemptive>true</authPreemptive> Give it a try and let me know. |
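For readers unfamiliar with the term, "preemptive" Basic authentication means the client sends the `Authorization` header with the first request instead of waiting for the server to answer with a 401 challenge. The sketch below is not Norconex code; it only shows, using the Java standard library, how that header value is formed under the Basic scheme. The class name `BasicAuthHeader` and the sample credentials are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustration only: the class name and credentials are hypothetical,
// and this is not how the Collector is implemented internally.
class BasicAuthHeader {

    // Builds the value of the HTTP "Authorization" header for the Basic
    // scheme: the word "Basic", a space, then base64("username:password").
    static String build(String username, String password) {
        String pair = username + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(pair.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // A preemptive client would attach this header to its very first request.
        System.out.println("Authorization: " + build("user", "pass"));
    }
}
```

Comparing this value with the header a working client (such as Postman) sends can help confirm whether the crawler is transmitting the same credentials.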
Marking as a feature-request to support SAML. |
Just getting back to this, thank you Pascal. I did add authPreemptive but it didn't fix the problem. I'm not sure how to debug further; is there a way to enable verbose logging so I can see what's going on behind the scenes? If I can't figure it out, I may take you up on your kind offer to log in. |
Just FYI, I set up Postman and did a "basic" auth request with the credentials and it works fine, but with the crawler it fails. Here is my config:
and log:
|
To get the maximum verbosity set the following to TRACE in
If you do not mind sending me temporary credentials via email, I will look into it when I get a chance. |
Having a second look at your log, I see it rejects the authentication instead of attempting it. Odd... maybe as a test you can try to set the following in your document fetcher: <validStatusCodes>200,401</validStatusCodes> Let me know if that makes a difference. |
Sorry, that didn't do it, although I do have a clue. If I use curl:
it fails with a 401 even though the creds are good. Add \ in front of $ and it works:
Is it possible that we are seeing an issue with special characters in the password? I did try several ways, none worked out: 'pas$word' thanks |
Interesting... a possibility for sure. Are you storing your password in the Collector XML config? If so, did you try escaping the $ with a backslash in the config? It may be interpreted as a variable. If that's the case and it works with escaping, you have other options too. You can define the password in a variables/properties file and reference it as a variable in the config, or you can encrypt the password (see online documentation). If that does not make a difference, it may need to be escaped by the Collector somehow before sending it to your server and will require more investigation. As a workaround solution, does it work if you change the password to one without $ in it? Just to 100% confirm the issue is with the $. |
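To make the curl observation concrete, here is a rough sketch of why an unescaped `$` can change a password before it is ever sent. It imitates, in simplified form, how a POSIX shell treats `$name` in an unquoted or double-quoted argument. It is only an illustration of the general mechanism, not a description of how the Collector parses its config; the class `DollarExpansionDemo` and the method `expand` are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;

// Illustration only: a simplified imitation of shell-style "$name" expansion.
// It is an assumption that something comparable could affect the password;
// this thread does not confirm it for the Collector.
class DollarExpansionDemo {

    // "\$" yields a literal "$"; "$name" is replaced by the variable's value,
    // or by an empty string when the variable is not defined.
    static String expand(String input, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == '$') {
                out.append('$');           // escaped dollar sign is kept as-is
                i += 2;
            } else if (c == '$') {
                int j = i + 1;             // scan the variable name after '$'
                while (j < input.length()
                        && (Character.isLetterOrDigit(input.charAt(j))
                            || input.charAt(j) == '_')) {
                    j++;
                }
                if (j == i + 1) {
                    out.append('$');       // lone '$' with no name following
                } else {
                    out.append(vars.getOrDefault(input.substring(i + 1, j), ""));
                }
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> noVars = Collections.emptyMap();
        System.out.println(expand("pas$word", noVars));   // "$word" is undefined, so it vanishes
        System.out.println(expand("pas\\$word", noVars)); // escaped, so the "$" survives
    }
}
```

With no variable named `word` defined, the first call yields `pas` and the second yields `pas$word`, which matches the curl behaviour described above: the server receives a different password unless the `$` is escaped.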
Good news: it is working as expected and I was able to make it work without anything special. It turns out you have put Please have a try and confirm. |
Awesome news, it ran here as well, thanks for all your help on this Pascal! |
Hello, any update on using SSO Auth with Norconex? |
Hi Kristi
On this end (customer side) we never got it settled; it is still an issue to
be resolved. We are firing this project back up and plan to focus on it more
in Q1, starting in January.
Regards,
…-Steve
|
Testing this product, so far so good. I will have a number of sites that use OKTA SSO which I will need to crawl. Any pointers on how to do this?