How can we better wrangle, analyze, and visualize data?
A few techniques that could come in handy :]
This article briefly explains SQL queries, statistical tests, and visualization methods using the New York subway weather data.
These are covered in detail in Udacity's “Intro to Data Science” course, which I highly recommend.
Wrangling subway data
- Using SQL queries: Pandasql lets you query data stored in data frames with plain SQL, which makes the data much easier to access, read, and interpret, especially if you are new to Python or pandas. It helps you pick out the columns and rows you need to make predictions or draw conclusions. For a better intuition of how SQL works, please refer to this course.
Does the day of the week affect hourly ridership in the subway? This query can help you answer that.
First, create a new data frame with only the columns you need. This is especially useful when the original data frame is large, since it makes the columns easier to refer to. Then write your query, and finally run it with pandasql to view the results.
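The three steps above could look roughly like the sketch below. It is only an illustration under a few assumptions: the data frame is a tiny made-up stand-in for the subway data, and the column names `DATEn` and `ENTRIESn_hourly` are assumed rather than taken from this article. To keep the dependencies down to pandas, the query is run through Python's built-in `sqlite3` module instead of pandasql. Pandasql uses SQLite by default, so the same query string should work with `pandasql.sqldf(query, locals())`.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the subway weather data. The values are made up and the
# column names (UNIT, DATEn, ENTRIESn_hourly, rain) are assumptions.
turnstile = pd.DataFrame({
    "UNIT": ["R001", "R001", "R002", "R002"],
    "DATEn": ["2011-05-01", "2011-05-02", "2011-05-01", "2011-05-02"],
    "ENTRIESn_hourly": [100.0, 300.0, 200.0, 500.0],
    "rain": [0, 1, 0, 1],
})

# Step 1: keep only the columns the question needs.
subset = turnstile[["DATEn", "ENTRIESn_hourly"]]

# Step 2: write the query. In SQLite, strftime('%w', ...) returns the day of
# the week as a digit, where 0 is Sunday and 6 is Saturday.
query = """
SELECT CAST(strftime('%w', DATEn) AS INTEGER) AS weekday,
       AVG(ENTRIESn_hourly) AS avg_entries
FROM subset
GROUP BY weekday
ORDER BY weekday
"""

# Step 3: run the query and view the result.
# With pandasql this step would be: result = pandasql.sqldf(query, locals())
conn = sqlite3.connect(":memory:")
subset.to_sql("subset", conn, index=False)
result = pd.read_sql_query(query, conn)
conn.close()

print(result)
```

Each row of `result` holds one day of the week and its average hourly entries, so comparing the rows gives a first rough answer to the question. On the real data you would load the full CSV into `turnstile` instead of building it by hand.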