The goal of this project is to estimate the population density of subway stations so that people can more easily avoid crowds. I plan to do this by reviewing several MTA subway metrics and using them to estimate the population density at subway stations. So far I have reviewed the data sets and looked for key patterns where the data overlaps. I have not yet made a visualization, but I will probably use a histogram to show population on certain days of the week.

Dataset: https://new.mta.info/agency/new-york-city-transit/subway-bus-ridership-2020
This dataset contains ridership and traffic data for trains, buses, and bridges and tunnels. I used four columns: the date, Subway: Total Estimated Ridership, Subway: % of Comparable Pre-Pandemic Day, and Bridges and Tunnels: % of Comparable Pre-Pandemic Day.
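
As a rough sketch of how these four columns could be pulled out with pandas: the column names below are copied from this report and may not match the headers in the downloaded file exactly. The sample rows, the extra "Buses" column, and the helper name load_columns are made up for illustration. In practice the function would be given the path to the downloaded CSV instead of the in-memory sample.

```python
import io

import pandas as pd

# Column names as written in this report (assumption: the real file's
# headers may be spelled slightly differently).
COLUMNS = [
    "Date",
    "Subway: Total Estimated Ridership",
    "Subway: % of Comparable Pre-Pandemic Day",
    "Bridges and Tunnels: % of Comparable Pre-Pandemic Day",
]

# Made-up placeholder rows, for illustration only (not real MTA figures).
SAMPLE_CSV = """Date,Subway: Total Estimated Ridership,Subway: % of Comparable Pre-Pandemic Day,Buses: Total Estimated Ridership,Bridges and Tunnels: % of Comparable Pre-Pandemic Day
03/01/2020,2000000,97%,900000,98%
03/02/2020,5000000,96%,2000000,95%
"""


def load_columns(source) -> pd.DataFrame:
    """Read only the columns used in this project and parse the date."""
    return pd.read_csv(source, usecols=COLUMNS, parse_dates=["Date"])


# Hypothetical usage with the in-memory sample instead of a real file path.
subset = load_columns(io.StringIO(SAMPLE_CSV))
print(subset.head())
```

Restricting the read with usecols keeps the unused bus and rail columns out of the DataFrame from the start, so later cleaning steps only touch the data that matters here.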

Technique: I first cleaned the data using the pandas, numpy, and pandasql libraries. I then used df.plot.scatter() to make a scatter plot of subway ridership against date, to see how the two are related.
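
The cleaning and plotting steps might look roughly like the sketch below. It is not the exact code used in this project: it uses plain pandas only (no pandasql), and the sample rows, the short column names, the "%" suffix on the percentage values, and the helper names clean and plot_ridership are all assumptions made for illustration. The date is converted to a numeric "days since start" column so that the scatter plot has a numeric x-axis.

```python
import io

import pandas as pd

# Made-up placeholder rows, for illustration only (not real MTA figures).
SAMPLE_CSV = """Date,Subway: Total Estimated Ridership,Subway: % of Comparable Pre-Pandemic Day
03/01/2020,2000000,97%
03/02/2020,5000000,96%
03/03/2020,,98%
03/04/2020,5200000,99%
"""

# Hypothetical short names, chosen here to make the columns easier to type.
RENAMES = {
    "Date": "date",
    "Subway: Total Estimated Ridership": "ridership",
    "Subway: % of Comparable Pre-Pandemic Day": "pct_pre_pandemic",
}


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Rename columns, fix types, and drop rows with missing ridership."""
    df = raw.rename(columns=RENAMES)
    df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
    df["ridership"] = pd.to_numeric(df["ridership"], errors="coerce")
    # Assumes percentages are stored as text such as "97%".
    df["pct_pre_pandemic"] = df["pct_pre_pandemic"].str.rstrip("%").astype(float)
    df = df.dropna(subset=["ridership"]).sort_values("date").reset_index(drop=True)
    # Numeric x-axis for the scatter plot: whole days since the first date.
    df["days_since_start"] = (df["date"] - df["date"].min()).dt.days
    return df


def plot_ridership(df: pd.DataFrame, path: str = "ridership_scatter.png") -> None:
    """Save a scatter plot of ridership over time (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without opening a window
    ax = df.plot.scatter(x="days_since_start", y="ridership")
    ax.set_xlabel("Days since first date in the data")
    ax.set_ylabel("Subway: Total Estimated Ridership")
    ax.figure.savefig(path)


cleaned = clean(pd.read_csv(io.StringIO(SAMPLE_CSV)))
print(cleaned)
```

Calling plot_ridership(cleaned) would then write the scatter plot to ridership_scatter.png.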