README: Ground Truth Labeling

Files

ground_truth_runner.py

Requires label_ground_truth.py.

Iterates over the past crawl databases in the /crawl folder and, for each crawl, labels cookie matching instances as positive, negative, or unknown by calling label_ground_truth.py.

To run ground_truth_runner.py: ground_truth_runner.py [-h] --par | --no-par [--progress-bar | --no-progress-bar] [-v {0,1,2}]

Typical usage: ground_truth_runner.py --par
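
The usage string above is consistent with Python's argparse and its BooleanOptionalAction (Python 3.9+). The following is a minimal sketch of a parser that would accept the same flags. It is not the script's actual parser: the function name build_parser, the dest name verbosity, the help texts, and all default values are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented command-line usage."""
    parser = argparse.ArgumentParser(prog="ground_truth_runner.py")
    # --par / --no-par: exactly one must be given (no brackets in the usage line).
    parser.add_argument(
        "--par",
        action=argparse.BooleanOptionalAction,
        required=True,
        help="assumed meaning: process crawls in parallel",
    )
    # --progress-bar / --no-progress-bar: optional; the default here is a guess.
    parser.add_argument(
        "--progress-bar",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="assumed meaning: show a progress bar",
    )
    # -v {0,1,2}: verbosity level; the default here is a guess.
    parser.add_argument(
        "-v",
        dest="verbosity",
        type=int,
        choices=[0, 1, 2],
        default=0,
        help="assumed meaning: verbosity level",
    )
    return parser
```

With this sketch, build_parser().parse_args(["--par"]) yields par=True, while omitting both --par and --no-par is rejected as a missing required argument.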

label_ground_truth.py

Iterates over the given crawl database. Labels each redirect row (a graph edge) as positive, negative, or unknown for cookie matching, and returns these labels and their respective domains to ground_truth_runner.py.

This file is not intended to be used directly.

Papadopoulos Cookie Synchronization Method

Paper

Extract all browser cookies that were set, via the OpenWPM javascript_cookies table.
- Filter out session cookies (cookies without an expiration date).
- Parse cookie values using common delimiters (:, &).
Detect possible cookie_id sharing events in the http_redirects table.
- Identify ID-looking strings (more than 10 alphanumeric characters) in:
  - requested redirect parameters
  - requested redirect path
  - requested redirect location header
- If an ID is seen for the first time, store it in a hash table with the URL's domain. If the ID has been seen before, consider it a shared ID, and the requests carrying it ID-sharing requests.
- Use entity_map.json to determine the organizations behind domains, to distinguish intentional ID leaking from internal ID sharing (avoiding false positives).
A detected shared ID is considered a cookie sync if it matches a browser cookie extracted in the first step.
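
The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the code in label_ground_truth.py: cookies are assumed to be dicts with hypothetical "value" and "expiry" keys, redirects are assumed to be (request URL, Location URL) pairs, the entity map is assumed to be a flat domain-to-organization dict (the real entity_map.json layout may differ), and the hostname stands in for the domain.

```python
import re
from urllib.parse import urlsplit

# "ID-looking string": a run of more than 10 alphanumeric characters.
ID_PATTERN = re.compile(r"[A-Za-z0-9]{11,}")


def domain_of(url):
    """Hostname as a stand-in for the domain (assumption: no eTLD+1 reduction)."""
    return urlsplit(url).hostname or ""


def candidate_ids(url):
    """ID-looking strings in the path and query parameters of a URL."""
    parts = urlsplit(url)
    return set(ID_PATTERN.findall(parts.path + "?" + parts.query))


def cookie_ids(cookies):
    """Step 1: drop session cookies, split values on ':' and '&'."""
    ids = set()
    for cookie in cookies:
        if cookie.get("expiry") is None:  # session cookie: no expiration date
            continue
        ids.update(tok for tok in re.split(r"[:&]", cookie["value"]) if tok)
    return ids


def detect_cookie_syncs(cookies, redirects, entity_map):
    """Steps 2 and 3: find shared IDs in redirects, keep those matching a cookie."""
    known = cookie_ids(cookies)
    first_seen = {}  # ID -> domain of the first URL that carried it
    syncs = []
    for request_url, location_url in redirects:
        for url in (request_url, location_url):
            dom = domain_of(url)
            for cid in sorted(candidate_ids(url)):
                if cid not in first_seen:
                    first_seen[cid] = dom
                    continue
                origin = first_seen[cid]
                # Same organization: internal ID sharing, not a sync.
                if entity_map.get(origin, origin) == entity_map.get(dom, dom):
                    continue
                if cid in known:
                    syncs.append((cid, origin, dom))
    return syncs
```

For example, a persistent cookie with value "abc123def456ghi:foo" and a redirect from https://a.example/sync?uid=abc123def456ghi to https://b.example/match?partner_uid=abc123def456ghi would be reported as one sync between a.example and b.example, provided the entity map places the two domains in different organizations.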
