Well, all of us definitely feel good to achieve high in all we do, and this competition helped me experience this on Kaggle.
This article is gonna summarize what this competition on Digit Recognition using CNN’s is all about and how you need to submit your predictions in the form of a csv file on Kaggle.
Did the movie get a positive or negative review?
This article would explain the steps to building a sentiment classification model using the “IMDB dataset of 50k movie reviews”.
Few techniques that could come in handy :]
This article would briefly explain SQL queries, statistical tests, and visualization methods using the New York subway weather data.
These are covered in detail in the course “Intro to Data Science” by Udacity which I highly recommend.
Why not try out machine learning along with data analysis??
Machine learning is a branch of artificial intelligence where we construct models/systems that learn and study data to make predictions.
This is what I tried out as well using the analysis I did on the titanic data set. A model that can predict the chance of survival for any passenger based on data about them.
Let us begin creating our model. What would the first step be?
The data set I chose is “Titanic: Machine Learning from Disaster” from Kaggle and which contains two separate train and test data files.
I guess this is one question everyone who codes asks themself frequently. Well, one best way would be to try and create your own projects and try modifying snippets of code to make them more elegant and effective.
And in particular, I found these super cool mini-projects at the end of the “Learn Python for free” course provided by Scrimba.com. This course comprises around 59 interactive screencasts that help beginners learn from scratch or the experienced refresh their knowledge and maybe look at different ways of code implementation.
Let’s look at one of the projects in detail and the steps…
From a beginner’s perspective !!
My first data analysis project as a newbie in data science to identify the different factors that affected survival rates among passengers who were aboard ‘The Titanic’.
This project would mainly focus on survival rates of passengers depending on their sex, age, socio-economic status and a few other factors. I have mainly used the library pandas and also integrated a few sql techniques along. I also included various coding practices I came across.
Understanding the variables in the dataset. What does each column represent?