Strange Yahoo search data #57

Open
kingo55 opened this issue Apr 25, 2014 · 14 comments
@kingo55
Contributor

kingo55 commented Apr 25, 2014

I'm seeing some strange Yahoo data showing up as "search".

A client ran a homepage takeover on Yahoo a few weeks back, which sent a lot of traffic from the Yahoo homepage with the hostname "au.yahoo.com".

I know this isn't search traffic, and when I queried it in Snowplow, this hostname had no real keywords.

In contrast, the hostname "au.search.yahoo.com" had quite a few search terms.

Is this a case of "not provided"?

@kingo55
Contributor Author

kingo55 commented Apr 26, 2014

Perhaps these two images explain it best:

Normal search terms:
[image: sp-refr_urlhost2]

Not provided / advertising?:
[image: sp-refr_urlhost1]

Here's the query:

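-- Count events per search term for one referer host.
-- DISTINCT is dropped because GROUP BY already returns one row per term.
SELECT
  refr_term,
  COUNT(*) AS events
FROM
  "atomic".events
WHERE
  refr_urlhost = 'au.search.yahoo.com'
GROUP BY
  1
ORDER BY
  2 DESC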

@alexanderdean
Contributor

Right - sounds like we should add 'au.yahoo.com' to the unknown section at the top of the referers.yml, to reduce these search false positives (i.e. au.yahoo.com will be identified as unknown rather than search). Makes sense?
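
A minimal sketch of the proposed behaviour, assuming a lookup in which hosts listed as unknown are checked before the search list. The host sets and the `classify_referer` function are hypothetical illustrations, not the referer-parser API or the actual contents of referers.yml.

```python
from urllib.parse import urlparse

# Hypothetical, hand-written host lists for illustration only;
# the real referers.yml is much larger.
UNKNOWN_HOSTS = {"au.yahoo.com"}
SEARCH_HOSTS = {"au.yahoo.com", "au.search.yahoo.com"}


def classify_referer(referer_url: str) -> str:
    """Return 'unknown', 'search', or 'unmatched' for a referer URL."""
    host = (urlparse(referer_url).hostname or "").lower()
    # Unknown entries are checked first, so they override a search match.
    if host in UNKNOWN_HOSTS:
        return "unknown"
    if host in SEARCH_HOSTS:
        return "search"
    return "unmatched"
```

With these lists, a referer from au.yahoo.com would be reported as unknown, while au.search.yahoo.com would still be reported as search.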

@kingo55
Contributor Author

kingo55 commented Apr 26, 2014

Sure. I am just a little hesitant because it looks like a pattern among many of the Yahoo domains in the yaml. Plus I don't have a huge amount of data to confirm this against what I'm seeing in my Snowplow install.

To name a few which also may be generating false positives:

  - yahoo.com
  - ar.yahoo.com
  - au.yahoo.com
  - br.yahoo.com
  - chinese.yahoo.com
  - de.yahoo.com
  - dk.yahoo.com
  - es.yahoo.com

@alexanderdean
Contributor

Hey @kingo55 - @fblundun is back working on this library at the moment. Shall we add those 8 domains you list into our unknown section at the top to prevent false positives?

@kingo55
Contributor Author

kingo55 commented Jun 26, 2014

Ah awesome. Mind if I run a quick test first?

I want to make sure we're not going to mislabel genuine search traffic,
e.g. secure searches where the keyword isn't provided.

PS. Massively excited to see all the other improvements coming!

@alexanderdean
Contributor

No probs - go ahead Rob!

@kingo55
Contributor Author

kingo55 commented Jun 26, 2014

Ok, I can't find a way to produce referrers with au.yahoo.com, but Yahoo's
secure search appears to pass a referrer on with the visitor. You may want
to decide if you want to keep that particular referrer in.

Just second guessing since I notice Google Analytics includes that traffic
under search traffic, too.


@kingo55
Contributor Author

kingo55 commented Jun 26, 2014

Yahoo search referrals always seem to come under r.search.yahoo.com/__ylt=...
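
One way to express this observation in code, as a rough heuristic and not how referer-parser works: treat a Yahoo host as search only when "search" appears as one of its labels. The function name `is_yahoo_search_host` and the sample hosts in the docstring are assumptions for illustration.

```python
from urllib.parse import urlparse


def is_yahoo_search_host(referer_url: str) -> bool:
    """Heuristic: True for hosts such as r.search.yahoo.com or au.search.yahoo.com."""
    host = urlparse(referer_url).hostname or ""
    labels = host.split(".")
    # Require the host to end in yahoo.com and to carry a 'search' label before it.
    return labels[-2:] == ["yahoo", "com"] and "search" in labels[:-2]
```

Under this heuristic, portal hosts such as au.yahoo.com would not count as search, which matches the data described earlier in the thread.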


@fblundun
Contributor

Hi @kingo55 ,
Just asking to clarify: Do you think that those eight domains should all go into the unknown section? Or do you think that some of them might be genuine "search" referers?

fblundun reopened this Jun 27, 2014

@kingo55
Contributor Author

kingo55 commented Jun 27, 2014

Hi @fblundun

Which domains are you referring to?

@fblundun
Contributor

Your list from April:

  - yahoo.com
  - ar.yahoo.com
  - au.yahoo.com
  - br.yahoo.com
  - chinese.yahoo.com
  - de.yahoo.com
  - dk.yahoo.com
  - es.yahoo.com

@kingo55
Contributor Author

kingo55 commented Jun 27, 2014

Ah yes I was looking at the wrong PR.

Up to you guys how you want to treat those referrers. My findings indicated
they should be OK to add as unknown. I was skeptical that they might mask
Yahoo secure searches, but I don't think so anymore.

@alexanderdean
Contributor

Okay great - let's add them to the list of unknowns then Fred! Thanks Rob...

@alexanderdean
Contributor

Hmm - there are a lot of other country domains in there. I'm nervous that someone is not going to like us changing this. I suggest we pause and think about #19 instead.
