We all feel good when we do well at something, and this competition gave me that experience on Kaggle.
This article summarizes what the Digit Recognition competition using CNNs is all about and how to submit your predictions to Kaggle as a CSV file.
The data files contain gray-scale images of hand-drawn digits, from zero through nine.
As seen in the output, the training data set has 785 columns. The first column, called “label”, is the digit that was drawn by the user. …
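Here is a minimal sketch of how that layout could be handled: split off the label, then reshape the remaining 784 values (785 minus the label column) into 28 × 28 images for a CNN. The random frame only stands in for the real `train.csv`. The `pixel0` to `pixel783` column names, the square 28 × 28 shape, and the `ImageId`/`Label` submission header are assumptions to check against the competition's data page.

```python
import numpy as np
import pandas as pd

# Stand-in for pd.read_csv("train.csv"): 5 fake rows with the same layout
# (1 label column + 784 pixel columns = 785 columns).
rng = np.random.default_rng(0)
pixel_cols = [f"pixel{i}" for i in range(784)]  # assumed column names
train = pd.DataFrame(rng.integers(0, 256, size=(5, 784)), columns=pixel_cols)
train.insert(0, "label", rng.integers(0, 10, size=5))

# Separate the target from the pixel values.
y = train["label"].to_numpy()
X = train.drop(columns="label").to_numpy(dtype="float32")

# Scale to [0, 1] and reshape to (samples, height, width, channels) for a CNN.
X = (X / 255.0).reshape(-1, 28, 28, 1)


def make_submission(predicted_labels):
    """Build a submission frame; the ImageId/Label header is an assumption."""
    return pd.DataFrame({
        "ImageId": np.arange(1, len(predicted_labels) + 1),
        "Label": predicted_labels,
    })


# Placeholder predictions, only to show the CSV shape.
# A real run would pass the model's predicted digits for the test set.
submission = make_submission(y)
csv_text = submission.to_csv(index=False)
print(csv_text.splitlines()[0])
```

In a real run you would write the frame to disk with `submission.to_csv("submission.csv", index=False)` and upload that file.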
Did the movie get a positive or negative review?
This article explains the steps to build a sentiment classification model using the “IMDB dataset of 50k movie reviews”.
We load the data from a CSV file into a data frame and note the data type of each column.
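A quick sketch of that loading step with pandas is below. The two rows are made up and read from memory so the snippet runs on its own, and the `review` and `sentiment` column names are assumptions about the file's header.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV file; the column names
# "review" and "sentiment" are assumptions about the dataset's header.
csv_text = io.StringIO(
    "review,sentiment\n"
    '"A moving, well-acted film.",positive\n'
    '"Two hours I will never get back.",negative\n'
)

df = pd.read_csv(csv_text)

# One dtype per column, plus the (rows, columns) shape.
print(df.dtypes)
print(df.shape)
```

With the real file you would pass its path to `pd.read_csv` instead of the in-memory buffer.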
All necessary libraries and packages for text preprocessing and building the model are imported.
This process refers to the steps taken to convert text from human language into a machine-readable format for further analysis. The better your text is preprocessed, the more accurate the results. …
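To make the idea concrete, here is a small sketch of a few common preprocessing steps: lowercasing, stripping HTML tags and punctuation, and dropping stop words. These are typical choices and not necessarily the exact steps used in the article. The `preprocess` helper and the tiny `STOP_WORDS` set are made up for illustration.

```python
import re
import string

# A deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def preprocess(text):
    """Lowercase, strip HTML tags and punctuation, then drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # remove tags such as <br />
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


tokens = preprocess("This movie was GREAT!<br />Loved the soundtrack.")
print(tokens)  # ['movie', 'great', 'loved', 'soundtrack']
```

A real pipeline would likely use a fuller stop-word list and might add stemming or lemmatization on top.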
A few techniques that could come in handy :]
This article briefly explains SQL queries, statistical tests, and visualization methods using the New York subway weather data.
These are covered in detail in the course “Intro to Data Science” by Udacity, which I highly recommend.
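For a taste of the SQL side, here is a minimal sketch using Python's built-in `sqlite3` module. The `subway_weather` table, its columns, and its four rows are all made up for illustration, so the real data set will look different.

```python
import sqlite3

# In-memory database with a made-up table; the real data has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subway_weather (station TEXT, rain INTEGER, entries INTEGER)"
)
conn.executemany(
    "INSERT INTO subway_weather VALUES (?, ?, ?)",
    [
        ("A", 1, 120),
        ("A", 0, 100),
        ("B", 1, 80),
        ("B", 0, 60),
    ],
)

# Average entries on rainy (1) versus dry (0) readings.
rows = conn.execute(
    "SELECT rain, AVG(entries) FROM subway_weather GROUP BY rain ORDER BY rain"
).fetchall()
conn.close()

print(rows)  # [(0, 80.0), (1, 100.0)]
```

A difference like this in the group averages is the kind of thing a statistical test would then check for significance.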
Why not try out machine learning along with data analysis??
Machine learning is a branch of artificial intelligence in which we build models that learn from data to make predictions.
I tried this out using the analysis I did on the Titanic data set: a model that predicts the chance of survival for any passenger based on data about them.
Let us begin creating our model. What would the first step be?
The data set I chose is “Titanic: Machine Learning from Disaster” from Kaggle, which contains separate train and test data files.
Analyzing the train and test data is essential, which is explained in detail in my article “Investigating Titanic Data”. …
From a beginner’s perspective!!
This was my first data analysis project as a newbie in data science: identifying the factors that affected survival rates among passengers aboard the Titanic.
This project mainly focuses on the survival rates of passengers depending on their sex, age, socio-economic status and a few other factors. I mainly used the pandas library and integrated a few SQL techniques along the way. I also included various coding practices I came across.
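As a small sketch of that kind of analysis, the snippet below computes survival rates per group with pandas. The six passengers are made up, and the `Sex`, `Pclass` and `Survived` column names are assumed to match the Kaggle file.

```python
import pandas as pd

# Six made-up passengers; column names are assumed to match the Kaggle file.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Pclass": [3, 1, 3, 1, 3, 2],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# Survived is 0/1, so the mean of each group is its survival rate.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

print(rate_by_sex)
print(rate_by_class)
```

The same pattern works for any other factor: swap the column passed to `groupby`.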
Understanding the variables in the dataset. What does each column represent?