Metis Project 1: Analysis for WomenTechWomenYes Summer Gala
We just finished our first week at Metis in Seattle, Washington, USA. The past few days have been a whirlwind of both review and new material. For our first project, we used the Python libraries pandas and seaborn to perform basic exploratory data analysis (EDA) and data visualization. In this article, I’ll take you through our approach to framing the problem, designing the data pipeline, and ultimately implementing the code. You can find the project on GitHub, along with links to all the data.
I worked on this project with Aisulu Omar from Kazakhstan, and with Alex Lou and Dr. Jeremy Lehner, both from the United States.
I am Nigerian, and you can find me on LinkedIn.
The Challenge:
WomenTechWomenYes (WTWY), an imaginary non-profit organization in New York City, is raising money for its annual summer gala. For marketing purposes, it places street teams at the entrances to subway stations to collect email addresses. Those who sign up are sent free tickets to the gala. Our goal is to use MTA subway data and other external data sources to help optimize the placement of the teams, so that they gather the most signatures from people who will attend and contribute to WTWY’s cause.
Our Data:
We used three main data sources for this project.
MTA subway data for the traffic at each station.
Yelp Fusion API to figure out the zip code of each station.
University of Michigan median and mean income data by zip code.
Our Approach:
As a team, we discussed what our Minimum Viable Product (MVP) for the client should be, and we came up with three goals.
Find the busiest stations by traffic to easily deploy the street teams.
Find the busiest days of the week at the train stations.
Find income data and join it to the busiest stations to estimate who is likely to donate to our cause.
Our Steps:
We took the following steps to get our results. These are general data science steps toward a solution, and they are usually iterative.
Data gathering from our data sources
Data cleaning
Data aggregating
Data insights
Client Recommendations
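To make the cleaning and aggregating steps more concrete, here is a minimal sketch in pandas. It is not the project's actual code. It assumes the turnstile readings are cumulative counters, so daily entries are the difference between consecutive readings of the same turnstile. The column names, the sample values, the `daily_entries` helper, and the `max_daily` cutoff are all hypothetical.

```python
import pandas as pd

# Hypothetical sample: one cumulative reading per turnstile per day.
# Column names and values are invented for illustration only.
raw = pd.DataFrame(
    {
        "station": ["A", "A", "A", "B", "B", "B"],
        "turnstile": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "date": ["2019-06-03", "2019-06-04", "2019-06-05"] * 2,
        "entries": [1000, 1500, 2100, 50, 40, 400],  # cumulative counter
    }
)


def daily_entries(df: pd.DataFrame, max_daily: int = 100_000) -> pd.DataFrame:
    """Turn cumulative turnstile counters into daily entries per station."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])

    # Cleaning: drop duplicate readings and put each turnstile in time order.
    out = out.drop_duplicates(subset=["station", "turnstile", "date"])
    out = out.sort_values(["station", "turnstile", "date"])

    # Difference between consecutive readings of the same turnstile.
    out["daily_entries"] = out.groupby(["station", "turnstile"])["entries"].diff()

    # Cleaning: discard the first reading (NaN), counter resets (negative
    # values), and implausibly large jumps (assumed cutoff).
    out = out[out["daily_entries"].between(0, max_daily)]

    # Aggregating: sum all turnstiles of a station for each day.
    return out.groupby(["station", "date"], as_index=False)["daily_entries"].sum()


daily = daily_entries(raw)
print(daily)
```

The resulting table has one row per station per day, which is a convenient shape for the insight questions that follow.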
Our Insights and Reasons:
After downloading, cleaning, and aggregating the datasets, we noticed the following:
Wednesdays are the busiest days
Busiest Day of the Week is Wednesday by Number of Entries
The top 5 busiest stations by traffic are:
34th St - Penn Station
42nd St - Grand Central
34th St - Herald Square
14th St - Union Sq
42nd St - Times Sq
Top 5 Busiest NYC Stations in the Summer
Why?
This is because the top 5 stations are located in or near the Midtown area of New York City, which is commonly busy during the summer.
Major restaurants, landmarks, colleges, and technology companies are situated around this area.
Google Maps route showing the top 5 stations in proximity to one another
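For readers who want to reproduce this kind of ranking, the sketch below shows one way to find the busiest weekday and the busiest stations from a table of daily entries. It is an illustration rather than the project's code. The station labels, the entry counts, and the helper names `busiest_weekdays` and `top_stations` are invented.

```python
import pandas as pd

# Hypothetical daily entries per station; all values are made up.
daily = pd.DataFrame(
    {
        "station": ["X", "X", "X", "Y", "Y", "Y", "Z", "Z", "Z"],
        "date": pd.to_datetime(
            ["2019-06-03", "2019-06-04", "2019-06-05"] * 3
        ),  # a Monday, a Tuesday, and a Wednesday
        "daily_entries": [100, 120, 150, 80, 90, 110, 10, 20, 30],
    }
)


def busiest_weekdays(df: pd.DataFrame) -> pd.Series:
    """Total entries per weekday, busiest first."""
    by_day = (
        df.assign(weekday=df["date"].dt.day_name())
        .groupby("weekday")["daily_entries"]
        .sum()
    )
    return by_day.sort_values(ascending=False)


def top_stations(df: pd.DataFrame, n: int = 5) -> pd.Series:
    """The n stations with the most total entries, busiest first."""
    return df.groupby("station")["daily_entries"].sum().nlargest(n)


print(busiest_weekdays(daily))
print(top_stations(daily, n=2))
```

Both helpers return a sorted pandas Series, so they can be passed straight to a seaborn bar plot.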
After joining income data to the busiest stations and filtering for zip codes with incomes of $70,000 and above, we found:
42nd St - Grand Central to be the station with the highest income.
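The join and filter can be sketched as a pandas merge on zip code followed by a threshold filter. This is a minimal example under stated assumptions, not the project's code. The station labels, zip codes, and income figures are placeholders, and `high_income_stations` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical station traffic with the zip code of each station.
# Zip codes are kept as strings so leading zeros are preserved.
station_traffic = pd.DataFrame(
    {
        "station": ["P", "Q", "R"],
        "zip_code": ["00001", "00002", "00003"],
        "total_entries": [900, 800, 700],
    }
)

# Hypothetical income table keyed by zip code; figures are made up.
income = pd.DataFrame(
    {
        "zip_code": ["00001", "00002", "00003"],
        "median_income": [65_000, 95_000, 72_000],
    }
)


def high_income_stations(
    traffic: pd.DataFrame, income: pd.DataFrame, threshold: int = 70_000
) -> pd.DataFrame:
    """Join income onto stations and keep those at or above the threshold."""
    # validate="many_to_one" raises if a zip code appears twice in `income`.
    merged = traffic.merge(
        income, on="zip_code", how="left", validate="many_to_one"
    )
    selected = merged[merged["median_income"] >= threshold]
    return selected.sort_values("median_income", ascending=False).reset_index(
        drop=True
    )


result = high_income_stations(station_traffic, income)
print(result)
```

A left join keeps every station even when its zip code has no income record. Those rows get a missing income and drop out at the filter step.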
Our Recommendation:
From our analysis, we recommend that WomenTechWomenYes deploy street teams to 34th St - Penn Station and 42nd St - Grand Central on Wednesdays during peak hours to best reach their target audience.
Conclusion:
I would like to thank the Metis team and my classmates for their thoughtful questions and feedback. I hope to apply those lessons in future projects. Our slides can be found below.