Case Study: Cyclistic Bikeshare Analysis

Skills Demonstrated

  • Python and pandas for data analysis

  • Seaborn and matplotlib for visualization

  • Data cleaning and manipulation (cleaning and visualization code: Jupyter Notebook)

  • Exploratory Data Analysis (EDA). Initial dataset is 5.8 million rows x 13 columns.

  • Univariate hypothesis testing

Business Objective

Cyclistic is a fictional bikeshare company based in Chicago, IL. Our director of marketing wants to expand our bikeshare business by converting current casual riders into annual subscribing members. Use the last 12 months of available Divvy data from Jan 2022 - Jan 2023 to:

Show us how annual members and casual riders use our bikes differently and provide some recommendations so we can launch a more effective campaign to convert casual riders into subscribers.

Executive Summary

Findings

  • The data suggests that leisure and recreation are key use cases for our casual riders, compared to our annual members who are more likely to use our bikes as part of their commute to and from work.

  • Casual riders use our bikes for an average of ~18 mins, whereas member riders clock in at ~12 mins.

  • Casual riders make up the overwhelming majority of rides > 40 mins.

  • Casual riders do not have a spike in usage during the morning commute window (7AM - 10AM) whereas member riders do.

  • Casual riders tend to ride more frequently on the weekends whereas member riders tend to ride more during the week.

  • Casual riders tend to start their trips in tourism/leisure areas, such as the Navy Pier, Lake Shore Drive, Millennium Park, Lincoln Park, Michigan Ave, and Theater on the Lake. Member riders start their rides more inland and closer to businesses and residential areas of the city.

  • Casual riders barely ride during the winter months, but ramp up significantly in May and peak in July. Member riders show similar seasonality but maintain strong ridership through the Fall.

Recommendations

  • Launch the campaign in the Spring as casual ridership increases.

  • Create value propositions for those who take longer rides (20+ mins).

  • Explore establishing partnerships with restaurants and attractions along Lake Shore Drive, Navy Pier, and Millennium Park and bundle membership with perks at partner restaurants, parks, etc.

  • Explore establishing partnerships to make membership more attractive during the low-ridership months by providing perks with partner ridesharing and transportation services such as Uber, Lyft, etc.

The last two recommendations are the most tentative. To strengthen those recommendations, I would need access to extra data about members and usage. These recommendations address the issues of seasonality in ridership and the leisure use case of casual riders. Specifically, they seek to convert casual riders into annual subscribers by creating value propositions for membership during periods of high and low ridership.

Of course, in an actual business environment there would need to be more collaboration between teams and departments to model the feasibility and profitability of any recommendation. This is beyond the scope of the present case study.


Analysis

To examine the differences in riding habits between our casual and member riders, I will analyze our rider data along the following dimensions:

  1. Ride time

  2. Time of day

  3. Weekday vs Weekend

  4. Location

  5. Seasonality

  6. Bike Type Preference

The longest ride

The average ride time for casual riders is ~18 minutes, and the average ride time for member riders is ~12 minutes. This is a statistically significant difference with p<0.05.
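The case study does not say which test produced this result. As a minimal sketch, one common choice for comparing two group means is Welch's t-test. The version below uses only the standard library and a normal approximation for the p-value, which is only reasonable for large samples like ours. The function name `welch_t_test` and the ride lengths are made up for illustration; they are not Cyclistic data.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch's two-sample t statistic with a normal-approximation p-value.

    The normal approximation is an assumption that only holds well
    when both samples are large.
    """
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    # Two-sided p-value from the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Toy ride lengths in minutes (illustrative only).
casual_minutes = [22, 15, 30, 18, 25, 12, 40, 19]
member_minutes = [10, 12, 9, 14, 11, 13, 8, 12]

t_stat, p_value = welch_t_test(casual_minutes, member_minutes)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

With the real data, the two lists would be the ride lengths for each rider type. A library routine with an exact t distribution would be preferable for small samples.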

Casual riders also have a higher median ride time than our member riders, and their distribution is skewed toward longer rides:

The thick black lines at the ends of the boxplots are individual rides that fall above the upper fence (conventionally the third quartile plus 1.5 times the IQR).
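To make the upper fence concrete, here is a small sketch of how it can be computed with pandas. It assumes the conventional 1.5 x IQR rule, and the ride lengths are toy values chosen for illustration.

```python
import pandas as pd

# Toy ride lengths in minutes (illustrative only).
ride_minutes = pd.Series([5, 8, 10, 12, 14, 15, 18, 22, 30, 95])

# First and third quartiles, then the interquartile range.
q1, q3 = ride_minutes.quantile([0.25, 0.75])
iqr = q3 - q1

# Assumption: the conventional 1.5 x IQR rule for the upper fence.
upper_fence = q3 + 1.5 * iqr

# Rides above the fence are the ones drawn as individual points.
outliers = ride_minutes[ride_minutes > upper_fence]
print(upper_fence, outliers.tolist())
```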

Casual riders make up the majority of those who take trips longer than 40 minutes:

The horizontal black line above marks the level at which casual and member riders are equally represented.
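One way to check a claim like this is to bin rides by length and compute the share of each rider type per bin. The sketch below assumes a `member_casual` column (as in the public Divvy trip data) and a derived `ride_minutes` column. The bin edges and the toy rows are my own choices for illustration.

```python
import pandas as pd

# Toy rides (illustrative only).
rides = pd.DataFrame({
    "member_casual": ["casual"] * 5 + ["member"] * 5,
    "ride_minutes": [12, 25, 45, 60, 90, 8, 10, 15, 22, 50],
})

# Left-closed bins: [0, 20), [20, 40), [40, inf).
bins = [0, 20, 40, float("inf")]
labels = ["0-20", "20-40", "40+"]
rides["duration_bin"] = pd.cut(
    rides["ride_minutes"], bins=bins, labels=labels, right=False
)

# Share of each rider type within each duration bin (rows sum to 1).
share = pd.crosstab(
    rides["duration_bin"], rides["member_casual"], normalize="index"
)
print(share)
```

A casual share above 0.5 in the "40+" row corresponds to casual riders making up the majority of long rides.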

(Give me the) Time of day

Our bikes see some usage in the morning, but most of the usage is from the late afternoon into the early evening (3 PM - 7 PM):

The morning spike in ridership from 7 AM - 10 AM mostly consists of our member riders. This suggests that casual riders are not using our bikes as part of a morning commute as often as our members:

The data shows a spike in member riders in the morning and afternoon into the early evening, which corresponds with typical rush hour traffic volume at 9 AM and 5 PM. The lack of a corresponding morning spike by casual riders may suggest that casual riders aren’t typically using our bikes for commuting. We can investigate this further by examining bike usage by days of the week.
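The hourly comparison can be sketched with a simple group-by. The column names `started_at` and `member_casual` are assumed from the public Divvy trip data, and the rows below are toy values, not our actual rides.

```python
import pandas as pd

# Toy rides (illustrative only).
rides = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2022-06-01 08:05", "2022-06-01 08:40", "2022-06-01 17:15",
        "2022-06-01 17:30", "2022-06-01 13:00", "2022-06-01 17:45",
    ]),
    "member_casual": [
        "member", "member", "member", "casual", "casual", "casual",
    ],
})

# Hour of day (0-23) in which each ride started.
rides["hour"] = rides["started_at"].dt.hour

# Ride counts per hour, one column per rider type; missing combinations become 0.
hourly = (
    rides.groupby(["hour", "member_casual"])
    .size()
    .unstack(fill_value=0)
)
print(hourly)
```

Plotting the two columns of `hourly` as lines would show whether each rider type has a morning spike.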

Weekend warriors

The following graph shows a high-level distribution of weekday vs weekend rides by rider type:

The data indicates that the proportion of casual riders increases on the weekends. We can drill down a little further to examine the proportion of riders by the day of the week:

We see that casual riders use our bikes the least in the middle of the week, but ramp up by the weekend, with Saturday having the highest proportion of casual riders.
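As a rough sketch, the day-of-week proportions can be computed with a normalized cross-tabulation. Again, `started_at` and `member_casual` are assumed column names, and the rides are toy rows (June 1, 2022 was a Wednesday and June 4, 2022 a Saturday).

```python
import pandas as pd

# Toy rides (illustrative only).
rides = pd.DataFrame({
    "started_at": pd.to_datetime([
        "2022-06-01 08:00", "2022-06-01 17:30", "2022-06-01 12:00",
        "2022-06-04 11:00", "2022-06-04 14:00", "2022-06-04 16:00",
    ]),
    "member_casual": [
        "member", "member", "casual", "casual", "casual", "member",
    ],
})

# Day name for each ride, plus a weekend flag (Monday=0 ... Sunday=6).
rides["day"] = rides["started_at"].dt.day_name()
rides["is_weekend"] = rides["started_at"].dt.dayofweek >= 5

# Proportion of rider types within each day (rows sum to 1).
day_share = pd.crosstab(
    rides["day"], rides["member_casual"], normalize="index"
)
print(day_share)
```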

The following graph helps us put two pieces of information together about our casual rider population: 1) they ride more frequently on the weekends, and 2) they ride our bikes for longer than our members:

This graph shows us that casual riders ride more frequently and for longer on the weekends. This suggests that casual riders use our bikes for a different purpose than our member riders. Whereas our member riders appear to use our bikes as part of their commute, the casual riders' longer weekend rides point to leisure or recreation rather than work.

Next, we can look at starting locations in our rider population to see if the data points in that direction.

Location, location, location

Is there a difference in where our casual and member riders start their rides? Let’s take a look at the top 30 stations where riders start their trips:

What a data point! The bike dock at Streeter Dr. and Grand Ave. is our most popular starting station and sees a significant number of rides. It is right by Navy Pier and the Centennial Wheel.

Many of the other popular starting stations are also right along the water, such as DuSable and Monroe. Our second most popular starting point is at Wells St and Concord Ln, right by Lincoln Park.

Let’s break down our top 30 locations by rider type to see if we can get any more insight into the data:

The graph above displays our top 30 most popular stations sorted by the number of casual rides started from that location. We should note that the top 5 locations all have more casual rides than member rides and are also places of tourism or leisure.
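A breakdown like this can be sketched by first picking the most popular start stations and then counting rides per rider type at each one. The column name `start_station_name` is assumed from the public Divvy trip data. The ride counts are toy values, and "Example St & Sample Ave" is a hypothetical station used only to show the filtering.

```python
import pandas as pd

# Toy rides (illustrative only).
rides = pd.DataFrame({
    "start_station_name": (
        ["Streeter Dr & Grand Ave"] * 4
        + ["Wells St & Concord Ln"] * 3
        + ["Example St & Sample Ave"]  # hypothetical station
    ),
    "member_casual": [
        "casual", "casual", "casual", "member",
        "member", "member", "casual",
        "member",
    ],
})

# Keep only the top N stations by total rides started (N = 30 in the analysis).
top_n = 2
top_stations = rides["start_station_name"].value_counts().head(top_n).index

# Rides per station and rider type, sorted by casual rides.
by_type = (
    rides[rides["start_station_name"].isin(top_stations)]
    .groupby(["start_station_name", "member_casual"])
    .size()
    .unstack(fill_value=0)
    .sort_values("casual", ascending=False)
)
print(by_type)
```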

We already know that our casual rider population tends to take longer rides and ride more frequently on the weekends. The cumulative evidence suggests that leisure and recreation represent significant use cases for our casual riders.

Next, we look at seasonality to see if our casual riders exhibit any differences compared to our annual members.

Seasons come and seasons go

It’s no secret that Chicago has brutal winters. We should expect to see a seasonality effect with low ridership in the colder months that increases as the weather gets nicer.

The data does indeed exhibit a seasonality effect, with lower ridership in the colder months and higher ridership in the warmer months. There are two things to note about the data:

  1. There are proportionally fewer casual riders during the cold months when compared to the member rider population. This suggests that casual riders aren’t as interested in riding our bikes when it’s cold out, whereas our member riders are still seen braving the cold and riding.

  2. Casual ridership peaks in July and decreases steadily throughout the Fall. Member ridership stays pretty even throughout the Summer and into early Fall.
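The two observations above come down to monthly counts per rider type and the casual share within each month. A minimal sketch, assuming the `started_at` and `member_casual` columns and using toy rows rather than our actual data:

```python
import pandas as pd

# Toy rides (illustrative only): four in January, five in July.
rides = pd.DataFrame({
    "started_at": pd.to_datetime(
        ["2022-01-10", "2022-01-12", "2022-01-20", "2022-01-25"]
        + ["2022-07-02", "2022-07-03", "2022-07-09", "2022-07-15", "2022-07-16"]
    ),
    "member_casual": [
        "member", "member", "member", "casual",
        "casual", "casual", "casual", "member", "member",
    ],
})

# Year-month label, so January 2022 and January 2023 stay separate.
rides["month"] = rides["started_at"].dt.strftime("%Y-%m")

# Ride counts per month and rider type.
monthly = (
    rides.groupby(["month", "member_casual"])
    .size()
    .unstack(fill_value=0)
)

# Casual riders as a proportion of all rides in each month.
casual_share = monthly["casual"] / monthly.sum(axis=1)
print(casual_share)
```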

Both rider types exhibit seasonality, but why does our casual ridership peak in July while our member riders continue riding into the early Fall? There’s no way to tell for certain, but we can make some educated guesses:

  • Seasonality often goes hand-in-hand with tourism and leisure. Since our bikeshare program is local to Chicago, it is likely that member riders are Chicago residents who will continue to ride throughout the year. It’s possible that the peak in July is due to visitors and tourists who are vacationing in Chicago.

  • Annual members are those who have already paid for a subscription, so they may be more inclined to make the most of the service by riding throughout the year. It is possible that a portion of our casual riding population is figuring out if bikesharing is for them. The warmer weather is correlated with more outdoor activities, and it may be the case that those who don’t ride our bikes regularly are trying them out to see if they like it. If they don’t, they will stop riding. But if they do, perhaps we are seeing a conversion from casual rider to annual member. If we had more data, such as membership data, we could check whether the data supports this notion.

The chart below shows a cross-section of seasonality and rider type at our top 6 locations. The data shows that our casual riders outnumber our member riders at those stations associated with leisure and recreation:

This pattern at stations associated with tourism and recreation is further evidence that recreation and leisure represent significant use cases for our casual rider population.

We will continue our analysis and see if casual and member riders exhibit any differences in the type of bike that they ride.

What’s your type?

Cyclistic has two kinds of bikes: classic and electric. Let’s see if there are any differences in the kinds of bikes chosen by our casual and member riders.

It looks like there are more classic bikes being ridden than electric bikes. However, we can’t draw conclusions about our riders’ preferences from this limited data. We don’t have other important information like the overall availability or number of both kinds of bikes. In a production environment, we would need to ask for more data to get a sense of the utilization rate in order to say more about whether or not our riders exhibit any preferences for either kind of bike.

Conclusion

Our casual and annual member populations exhibit significant differences when it comes to how they use our bikes. Casual riders tend to ride for longer and start their trips at stations associated with leisure and recreation, like the Navy Pier and Millennium Park. The number of member riders spikes in the mornings and evenings on weekdays, coinciding with work commutes, whereas the casual riders tend to ride more on the weekends. Past the 40 minute mark, casual riders significantly outnumber member riders.

Thanks for making it all the way down here. I hope that you enjoyed this analysis! I’d love to connect if you’d like to work together on a data project for your business or organization.