Netflix Viewing Activity Visualizer

Visualize your personal Netflix statistics: a small sample project using new features from Java 8, 9 and 10.

Before you ask: no, I did not accumulate all those hours and episodes on my own. While the viewing history of sub-accounts is kept separate, some people tend to use the service through my profile.

If you create your own graphics and stumble upon a great-looking color theme, feel free to send me a message and I'll add a small collection of presets.

Sample Output

[Image: sample infographic output]

Usage

0. Prerequisites

  • Java 10 Runtime Environment.
    Used to parse the viewing activity file and download additional information for movies and series.
  • R 3.5.1 and RStudio (optional) for visualization
  • A Trakt account to gain access to a movie metadata database.
    1. Create an account at Trakt.tv
    2. Create an "App" at https://trakt.tv/oauth/applications/new to retrieve your API key. All you need to do is choose any name and enter a redirect URI; it doesn't need to be valid (e.g. "https://localhost.de").
    3. Copy the client ID and save it somewhere for later use.
  • Download the distribution.zip archive and extract the files

Check your Java version by opening a terminal and typing java -version.

1. Gather information regarding your viewing activity

First we need to retrieve the viewing history file from Netflix. Sadly, Netflix offers only a very limited data set, namely the series/movie title and the date it was watched. To generate more interesting statistics, additional information such as runtime, genre and actors is needed. The Java program queries the Trakt database and does its best to collect whatever material it can get its hands on.
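The README does not show how NetflixAnalyzer talks to Trakt. As a rough illustration only, assuming Trakt's public API v2 (a GET request on https://api.trakt.tv/search/{type}?query=... with the client ID from the prerequisites sent in the trakt-api-key header), a search request could be assembled as below. TraktSearchRequest, searchUri and headers are hypothetical names; actually sending the request (for example with java.net.HttpURLConnection) is left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/**
 * Hypothetical helper: builds a Trakt search request without sending it.
 * Assumes Trakt API v2: GET https://api.trakt.tv/search/{type}?query=...
 */
public class TraktSearchRequest {

    private static final String BASE_URL = "https://api.trakt.tv";

    /** Builds the search URI for a title; type is "movie" or "show". */
    static URI searchUri(String type, String title) {
        // URLEncoder.encode(String, Charset) is available since Java 10
        String query = URLEncoder.encode(title, StandardCharsets.UTF_8);
        return URI.create(BASE_URL + "/search/" + type + "?query=" + query);
    }

    /** Headers Trakt expects; clientId is the ID copied from your Trakt app. */
    static Map<String, String> headers(String clientId) {
        return Map.of(
                "Content-Type", "application/json",
                "trakt-api-version", "2",
                "trakt-api-key", clientId);
    }

    public static void main(String[] args) {
        // URLEncoder turns the space into '+'
        System.out.println(searchUri("show", "Stranger Things"));
    }
}
```

Encoding the title matters because Netflix titles routinely contain colons, quotation marks and spaces.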

Manually download viewing activity file

Go to https://www.netflix.com/viewingactivity, scroll to the bottom and download your viewing activity file.

[Screenshot: download link on the viewing activity page]

Locate the NetflixAnalyzer.jar file and place the downloaded CSV file in the same folder as the jar.

Open the terminal and type

cd PathToJarFile
java -jar NetflixAnalyzer.jar traktClientId 

[Screenshot: running NetflixAnalyzer.jar in the terminal]

Hint: on Windows you only need to type cd followed by a space and then drag and drop the folder containing the .jar file into the terminal; this pastes the folder's location for you. Alternatively, you can click the address bar of File Explorer and copy and paste the path.

Press Enter, and after about a minute three additional CSV files will appear. Warnings are perfectly fine.

[Screenshot: the generated CSV files]

Convert the data to an awesome-looking infographic

Now R comes into play. Fire up RStudio and open the CreateInfographics.r file (open the R folder and double-click the file).

Important: once the R file is open, click File -> Reopen with Encoding... and choose UTF-8.

Scroll down to the settings section (around line 43) and adjust the paths (optionally, adjust the color settings). Now you are good to go: select the entire code block and click Run (Ctrl + A, then Ctrl + Enter).

[Screenshot: running the script in RStudio]

After a few seconds the infographics should be generated.

Compile it yourself

If you wish to modify the code, go ahead and clone the repository: git clone https://github.com/KilianB/NetflixViewingActivityVisualizer.git. Running mvn package will run the tests, compile the classes and bundle the binaries into distribution.zip.

Data Accuracy

The Netflix viewing activity data can be described as minimalistic at best. Once you start watching an episode/movie, it appears in the history file. We have no way to distinguish whether someone just peeked at an item or watched it fully, so the runtime will be overestimated. On the flip side, if an episode/movie was watched a second time, the first entry is removed from the history file, resulting in an underestimation. All you can do is regularly download your viewing activity file and merge the downloads to get a better representation of the data.

Selenium + a daily batch job + H2 database, anyone? Maybe a great next weekend project.
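Merging several downloads could look roughly like the minimal sketch below. It assumes each file has already been parsed into (title, date) pairs and treats two rows as the same viewing when both title and date match, which means a title watched twice on the same day collapses into a single entry. ViewingHistoryMerger and Entry are hypothetical names, not classes from the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch: merge several viewing activity downloads into one history. */
public class ViewingHistoryMerger {

    /** One row of a viewing activity file: title and date as they appear in the CSV. */
    static final class Entry {
        final String title;
        final String date;

        Entry(String title, String date) {
            this.title = title;
            this.date = date;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Entry)) return false;
            Entry other = (Entry) o;
            return title.equals(other.title) && date.equals(other.date);
        }

        @Override
        public int hashCode() {
            return Objects.hash(title, date);
        }
    }

    /**
     * Merges downloads in the given order. The LinkedHashSet drops rows that appear
     * in several files but keeps the order in which rows were first seen, so an entry
     * that Netflix later removed (after a rewatch) survives from an older download.
     */
    @SafeVarargs
    static List<Entry> merge(List<Entry>... downloads) {
        Set<Entry> merged = new LinkedHashSet<>();
        for (List<Entry> download : downloads) {
            merged.addAll(download);
        }
        return new ArrayList<>(merged);
    }
}
```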

A small number of items are misclassified (a movie as a show, a show as a movie), either because Trakt is not aware of those titles or because the parser's regex isn't good enough. Netflix doesn't make it easy either: movie titles may contain multiple colons or quotation marks, series may have titles without a season number, etc. A range of examples can be found in the unit test TestNetflixParser.java.
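To make the problem concrete, the sketch below runs the three patterns from NetflixParser.java (quoted in the parser section) on sample lines. Two assumptions: INPUT_DELIMITER is taken to be "," (the real constant is defined in NetflixParser.java), and, as the pattern names suggest, a title that does not match showPattern is treated as a movie. NetflixParserDemo and describe are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of the NetflixParser patterns on sample CSV lines. */
public class NetflixParserDemo {

    private static final String INPUT_DELIMITER = ","; // assumption, see lead-in

    private static final Pattern splitLine = Pattern.compile(
            "\"(?<Title>.*)\"" + INPUT_DELIMITER + "\"(?<Date>.*)\"");
    private static final Pattern showPattern = Pattern.compile(
            "(?<series>.*(?:(?:Season|Staffel|Part) [0-9]+)(?=:)): (?<epTitle>.*)",
            Pattern.UNICODE_CHARACTER_CLASS);
    private static final Pattern seasonPattern = Pattern.compile(
            "(?<series>.*(?=:)): (?<seasonText>[^0-9]*)(?<season>[0-9]*)",
            Pattern.UNICODE_CHARACTER_CLASS);

    /** Describes how a single CSV line would be classified by the patterns. */
    static String describe(String csvLine) {
        Matcher line = splitLine.matcher(csvLine);
        if (!line.matches()) {
            return "unparseable";
        }
        String title = line.group("Title");
        Matcher show = showPattern.matcher(title);
        if (!show.matches()) {
            // No "Season/Staffel/Part N:" in the title, so it ends up as a movie
            return "movie: " + title;
        }
        Matcher season = seasonPattern.matcher(show.group("series"));
        if (!season.matches()) {
            return "show (season unknown): " + show.group("series");
        }
        return "show: " + season.group("series")
                + " | season " + season.group("season")
                + " | episode " + show.group("epTitle");
    }

    public static void main(String[] args) {
        System.out.println(describe("\"Stranger Things: Season 1: Chapter One\",\"7/15/18\""));
        // A series without a season number is misread as a movie
        System.out.println(describe("\"Some Show: Pilot\",\"7/16/18\""));
    }
}
```

The second sample shows the weak spot: "Some Show: Pilot" is a hypothetical episode title, but because it carries no season number it is classified as a movie.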

Two ideas to increase the retrieval rate:

  1. Attempt to improve the parser (have a look at NetflixParser.java).

So far I have used a rather simple regex; be my guest and improve it:

	// Field declarations from NetflixParser.java; INPUT_DELIMITER is a constant defined elsewhere in that class
	import java.util.regex.Pattern;

	/**
	 * Split a CSV line into the quoted title and date fields
	 */
	private static final Pattern splitLine = Pattern.compile("\"(?<Title>.*)\"" + INPUT_DELIMITER + "\"(?<Date>.*)\"");
	
	/**
	 * If we have a show, try to separate the series, season and episode title
	 */
	private static final Pattern showPattern = Pattern.compile(
			"(?<series>.*(?:(?:Season|Staffel|Part) [0-9]+)(?=:)): (?<epTitle>.*)",
			Pattern.UNICODE_CHARACTER_CLASS);
	
	/**
	 * Once we have the season, extract the season number
	 */
	private static final Pattern seasonPattern = Pattern.compile(
			"(?<series>.*(?=:)): (?<seasonText>[^0-9]*)(?<season>[0-9]*)",
			Pattern.UNICODE_CHARACTER_CLASS);

One catch: some series don't have season numbers, but this regex assumes they do!

  2. If Trakt does not return a result when querying for a movie/show, try the opposite type and see if we receive anything useful.

Take a look at NetflixAnalyzer.java.
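The second idea can be sketched with Optional. This is not code from NetflixAnalyzer.java: the movie and show lookups are passed in as hypothetical functions standing in for whatever Trakt queries the analyzer performs, which also keeps the logic testable without network access. The opposite type is only queried when the first lookup comes back empty.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of "if the guessed type finds nothing, try the other type". */
public class FallbackLookup {

    enum Kind { MOVIE, SHOW }

    /**
     * Queries the type the parser guessed first and falls back to the other type
     * only if the first result is empty. Optional.or(...) requires Java 9+.
     */
    static <T> Optional<T> lookup(String title, Kind guessed,
                                  Function<String, Optional<T>> searchMovie,
                                  Function<String, Optional<T>> searchShow) {
        Function<String, Optional<T>> first = guessed == Kind.MOVIE ? searchMovie : searchShow;
        Function<String, Optional<T>> second = guessed == Kind.MOVIE ? searchShow : searchMovie;
        return first.apply(title).or(() -> second.apply(title));
    }

    public static void main(String[] args) {
        // Fake lookups: only the show search knows this title
        Function<String, Optional<String>> movies = t -> Optional.empty();
        Function<String, Optional<String>> shows = t -> Optional.of("show:" + t);
        System.out.println(lookup("Some Show: Pilot", Kind.MOVIE, movies, shows));
    }
}
```

A fallback hit is weaker evidence than a direct hit, so it may be worth logging which path produced each match when checking the results.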

Disclaimer

The source was never intended to go public, so no time was spent on making the code understandable or optimized. The main objective was to use some new concepts I hadn't had much exposure to recently.

The interesting stuff happens in the gigantic blob NetflixAnalyzer.java:

  • Java 8
    • Method references (::)
    • Predicates
    • Streams
    • Lambda expressions
  • Java 10
    • Local variable type inference (var)
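For readers new to these features, here is an illustrative snippet (not taken from NetflixAnalyzer.java) that uses each of them on a list of titles; each comment names the Java version that introduced the feature. FeatureTour and showTitlesUpperCase are hypothetical names, and the ": Season " check is only a crude stand-in for the real parser.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Illustrative only: the listed language features in a few lines. */
public class FeatureTour {

    static List<String> showTitlesUpperCase(List<String> titles) {
        // Predicate and lambda expressions (Java 8)
        Predicate<String> looksLikeShow = t -> t.contains(": Season ");
        // Local variable type inference with var (Java 10)
        var result = titles.stream()            // streams (Java 8)
                .filter(looksLikeShow)
                .map(String::toUpperCase)       // method reference (Java 8)
                .collect(Collectors.toList());
        return result;
    }

    public static void main(String[] args) {
        // List.of factory method (Java 9)
        var titles = List.of("Stranger Things: Season 1: Chapter One", "Some Movie");
        System.out.println(showTitlesUpperCase(titles));
    }
}
```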

I have no idea how to use R. Nothing in the R file should be considered good coding practice. I am aware of the glitches in gridTextMulticolor; I just wrote it to be good enough for the use cases I encountered.

License & Credit

The project is licensed under GPLv3. The icons were downloaded from Flaticon and Freepik and are licensed under CC BY 3.0. Individual authors:

The basic theme and layout is based on a blog post by Al-Ahmadgaid Asaad.