Analyzing Citibike bike-sharing data with Tableau.
The city of Des Moines has requested an analysis of New York's CitiBike data. The city is considering implementing a similar bike-sharing system and will use the conclusions of this analysis to refine its supporting business strategy. We look specifically at user ride times (both checkout/turn-in times and ride durations), and break that data down further by gender.
Click here to view the interactive Tableau Story
To use and visualize the "Trip Duration" data properly, we must first convert it from its raw integer format (seconds) to a date-time format. We do this by reading the source .csv file into a pandas DataFrame in a Jupyter Notebook and converting the column with pandas' to_datetime() function.
- The Jupyter Notebook file "NYC_CitiBike_Challenge.ipynb" can be found HERE
Once the data types have been verified, the DataFrame is exported to a new .csv file ("data.csv").
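The conversion and export steps can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: it assumes the raw column is named tripduration and holds whole seconds (as in the public Citibike trip files), and the helper name convert_trip_duration and the three sample rows are hypothetical stand-ins for the real source file.

```python
import pandas as pd


def convert_trip_duration(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'tripduration' converted from integer seconds to datetime64."""
    out = df.copy()
    # unit="s" treats each integer as seconds since the Unix epoch, so a
    # 300-second trip becomes 1970-01-01 00:05:00. Trips of 24 hours or more
    # spill over into 1970-01-02, so the date part should not be discarded blindly.
    out["tripduration"] = pd.to_datetime(out["tripduration"], unit="s")
    return out


# In the notebook this would come from pd.read_csv() on the source trip file;
# three made-up rows stand in for it here so the sketch runs on its own.
raw = pd.DataFrame({"tripduration": [300, 1200, 95], "gender": [1, 2, 0]})

converted = convert_trip_duration(raw)
print(converted.dtypes)  # verify the data types before exporting

# index=False keeps pandas' row index out of the exported file.
converted.to_csv("data.csv", index=False)
```

If a true duration type is preferred, pd.to_timedelta(..., unit="s") stores the values as timedelta64 instead; the datetime form above simply mirrors the to_datetime() approach described in the text.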
- Our first chart shows how many users check out bikes for each trip duration. The vast majority of users check out bikes for less than 20 minutes, with usage peaking at just around 5 minutes.
- Five minutes is a short time to check out a bike. It may be worth investigating whether that is simply because users don't need to cover much distance, or whether the bikes themselves are somehow unpleasant to use.
- When we break down the first visualization by gender, the same usage pattern persists. Usage for all genders crests at around 5 minutes, then tapers off dramatically. After about 50 minutes, only a few hundred users are still riding.
- The heatmap shown above tracks the density of bike checkouts, aggregated by hour and week day. Usage is densest during weekday commute hours. While this was expected, there are a couple of fluctuations in the pattern that may warrant further investigation:
- Higher-than-average usage on Thursdays, especially in the evenings. Is this the result of a "spare the air" initiative, or of a large, regularly scheduled weekly event? Is street traffic particularly bad on Thursdays, and if so, why?
- Lower-than-average usage on Wednesdays, particularly during the evening commute hours. This contrasts sharply with the elevated usage on the following day; are the highs and lows related?
- Weekend usage is spread across the daytime hours, from around 9am to 7pm, with slightly more usage on Saturdays than on Sundays.
- Breaking the previous heatmap down by gender suggests that most users, of all genders, share similar habits when it comes to checkout times. The lighter shading on the chart for female users reflects their much smaller numbers, but the same usage patterns persist.
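The charts above were built in Tableau, but the same aggregations can be sketched in pandas to double-check the peaks. This is a minimal sketch, assuming the raw integer-seconds tripduration column plus the starttime and gender columns of the public Citibike trip files (gender coded 0 = unknown, 1 = male, 2 = female); the function names and the five sample trips are hypothetical.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def trips_per_minute(durations_s: pd.Series) -> pd.Series:
    """Count trips in each whole-minute duration bucket (durations are floored)."""
    minutes = durations_s // 60
    return minutes.value_counts().sort_index()


def checkout_heatmap(starttimes: pd.Series) -> pd.DataFrame:
    """Count checkouts in a weekday x hour-of-day grid (7 rows, 24 columns)."""
    ts = pd.to_datetime(starttimes)
    cells = pd.DataFrame({"weekday": ts.dt.day_name(), "hour": ts.dt.hour})
    counts = cells.groupby(["weekday", "hour"]).size().unstack(fill_value=0)
    # Fill in weekdays/hours with no checkouts so the grid is always complete.
    return counts.reindex(index=WEEKDAYS, columns=range(24), fill_value=0)


# Made-up sample trips; 2019-08-01 was a Thursday.
trips = pd.DataFrame({
    "tripduration": [310, 290, 330, 1250, 95],
    "starttime": ["2019-08-01 17:30:00", "2019-08-01 18:05:00", "2019-08-07 08:10:00",
                  "2019-08-03 11:00:00", "2019-08-01 17:45:00"],
    "gender": [1, 2, 1, 1, 2],
})

per_minute = trips_per_minute(trips["tripduration"])
print("most common duration (minutes):", per_minute.idxmax())

grid = checkout_heatmap(trips["starttime"])
# Same heatmap restricted to female riders (gender code 2).
female_grid = checkout_heatmap(trips.loc[trips["gender"] == 2, "starttime"])
```

On the full dataset, the most common whole-minute bucket would correspond to the roughly 5-minute peak in the first chart, and the Thursday evening cells of grid to the hot spot in the heatmap.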
Dashboard: Gender Breakdown of users (By Birth Year), Gender breakdown of total users, User Trips by Gender (by Week Day), and Breakdown of total users by User Type
The dashboard shown above summarizes several key measurements of CitiBike's user base.
- The pie chart in the top right shows the gender breakdown of all users: the large majority of CitiBike's users are male. The pie chart in the lower right corner (Breakdown of total users by User Type) shows that the majority of users are subscribers rather than one-time users. Since both groups are majorities, they necessarily overlap, and taken together the two charts suggest that the bulk of CitiBike's users are male subscribers.
- The area charts in the top left show the density of CitiBike's user base, broken down by gender and birth year. For all genders, the count crests around birth year 1990 and then falls off dramatically. Since the data was collected in 2019, this means the bulk of the program's users are 29 or older, which is somewhat surprising. Even if we assume that a user must be at least 16 to use the program (the latest recorded birth year is 2003), there is a large drop-off in users between the ages of 16 and 29, most notably between 16 and 21.
- The heatmap at the bottom middle of the dashboard shows usage by week day, broken down by gender and user type. The portion showing male subscribers confirms our earlier findings about weekday usage (particularly the above-average utilization on Thursdays), but it also makes something else clear: CitiBike doesn't appear to give its subscribers the option to withhold their gender or choose a third/non-binary option. This makes the customer and subscriber data hard to compare, but the slightly elevated usage by customers of "unknown" gender on Saturdays suggests that weekends draw more single-use customers. From this we can infer that users who rely on the bikes for commuting are more likely to be subscribers.
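The dashboard figures can be approximated directly from the raw trips. This is a minimal sketch, assuming the public Citibike column names (usertype, gender, birth year), the gender codes 0 = unknown, 1 = male, 2 = female, and 2019 as the collection year; the helper names, age bands, and sample rows are hypothetical. Note that it counts trips rather than distinct riders, since the public trip files do not identify individual users.

```python
import pandas as pd

COLLECTION_YEAR = 2019  # year the trip data was collected
GENDER_LABELS = {0: "unknown", 1: "male", 2: "female"}  # Citibike gender codes


def user_breakdown(trips: pd.DataFrame) -> pd.DataFrame:
    """Share of all trips in each user type x gender cell; the cells sum to 1."""
    gender = trips["gender"].map(GENDER_LABELS)
    return pd.crosstab(trips["usertype"], gender, normalize="all")


def age_bands(trips: pd.DataFrame) -> pd.Series:
    """Approximate age in the collection year, grouped into illustrative bands."""
    age = COLLECTION_YEAR - trips["birth year"]
    # Bins are right-inclusive: (15, 21] -> "16-21", (21, 29] -> "22-29", (29, 120] -> "30+".
    return pd.cut(age, bins=[15, 21, 29, 120], labels=["16-21", "22-29", "30+"])


# Made-up sample trips using the public trip-file column names.
trips = pd.DataFrame({
    "usertype": ["Subscriber", "Subscriber", "Subscriber", "Customer"],
    "gender": [1, 1, 2, 0],
    "birth year": [1990, 1985, 2003, 1969],
})

shares = user_breakdown(trips)
print(shares.round(2))

bands = age_bands(trips)
print(bands.value_counts())
```

The crosstab gives the joint share directly (for example, the Subscriber/male cell), which is what the two separate pie charts can only suggest.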
There are several clear takeaways from our data, but also some new questions that warrant further investigation.
- What the data shows:
- The bulk of users are male subscribers.
- Subscribers use the bikes most during commute hours on week days, implying they are using the bikes to travel to and from work (as opposed to recreationally).
- Most users are 29 or older.
- Peak bike usage occurs on Thursdays, between 5pm and 8pm.
- Most users check out bikes for less than 20 minutes, with the largest number checking out for only about 5 minutes. This suggests either that they live close to where they work, or that they check out bikes at public transit hubs near their jobs.
- Opportunities for investigation:
- What is causing the spike in usage on Thursdays? What is causing the drop-off in usage on Wednesdays? Are the two patterns related?
- How would the usage patterns differ if a non-binary/3rd gender option was offered to subscribers?
- How can the service be better marketed to young people, to raise user numbers as they become eligible to use it?
- How can the service be better marketed to women? Only 25% of all users are women, yet they exhibit the same usage patterns as men.
In the future, I recommend creating map charts to see whether some areas get more usage from female customers. Perhaps the low usage among women results from an environmental factor, such as unsafe or poorly lit areas.
A similar chart could show usage patterns for users of different ages. Perhaps there are areas that young people frequent more, and making more bikes available near transit hubs in those areas could increase usage among riders aged 16 to 29.
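As a possible starting point for those map charts, trips could be aggregated per start station into a small table of locations and rider mix. This is a minimal sketch, assuming the public Citibike column names (start station id, start station latitude, start station longitude, gender, birth year), gender code 2 = female, and 2019 as the collection year; the function name and sample stations are hypothetical, and the 16-29 band simply mirrors the age range discussed above.

```python
import pandas as pd


def station_rider_mix(trips: pd.DataFrame, year: int = 2019) -> pd.DataFrame:
    """Per start station: trip count, share of trips by women, share by riders aged 16-29."""
    age = year - trips["birth year"]
    flags = pd.DataFrame({
        "station": trips["start station id"],
        "lat": trips["start station latitude"],
        "lon": trips["start station longitude"],
        "female": trips["gender"] == 2,   # Citibike code 2 = female
        "young": age.between(16, 29),     # inclusive on both ends
    })
    return (
        flags.groupby(["station", "lat", "lon"])
        .agg(trips=("female", "count"),
             female_share=("female", "mean"),
             young_share=("young", "mean"))
        .reset_index()
    )


# Made-up sample: two hypothetical stations.
trips = pd.DataFrame({
    "start station id": [101, 101, 101, 202],
    "start station latitude": [40.75, 40.75, 40.75, 40.70],
    "start station longitude": [-73.99, -73.99, -73.99, -74.01],
    "gender": [2, 1, 1, 2],
    "birth year": [1995, 1980, 2000, 1970],
})

mix = station_rider_mix(trips)
print(mix)
```

Exported to .csv, a table like this could be plotted in Tableau by latitude and longitude, with female_share or young_share driving the color of each station.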