Investigating Titanic Data

Abinaya Jayaprakash
6 min read · Oct 2, 2020

From a beginner’s perspective!

This is my first data analysis project as a newcomer to data science. Its goal is to identify the factors that affected survival rates among passengers aboard the Titanic.

Introduction

This project focuses mainly on how passengers’ survival rates varied with their sex, age, socio-economic status and a few other factors. I mostly used the pandas library, along with a few SQL techniques. I have also noted various coding practices I came across along the way.

Data Wrangling

Understanding the variables in the dataset. What does each column represent?

  • PassengerId = the unique number that identifies each passenger
  • Survived = Value of “1” indicates the passenger survived and “0” indicates otherwise
  • Pclass = Passenger class (1 = 1st class, 2 = 2nd, 3 = 3rd)
  • Name = Name of passenger
  • Sex = Sex of passenger
  • Age = Age of passenger
  • SibSp = Number of siblings/spouses of the passenger aboard
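
To make the column descriptions above more concrete, here is a minimal sketch of how these columns could be inspected with pandas. The rows below are made-up sample values for illustration only (they are not real passenger records), and in a real project the data would be read from a file instead, for example with `pd.read_csv` on a hypothetical `titanic.csv`.

```python
import io

import pandas as pd

# Made-up sample rows for illustration only (NOT real passenger records).
# Only the columns described above are included.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp
1,0,3,Passenger A,male,22,1
2,1,1,Passenger B,female,38,1
3,1,3,Passenger C,female,,0
4,0,2,Passenger D,male,35,0
"""

# Assumption: a real analysis would load a file, e.g. pd.read_csv("titanic.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Which columns exist and what type pandas inferred for each.
column_dtypes = df.dtypes

# How many values are missing per column (Age is blank in one sample row).
missing_counts = df.isna().sum()

# Survived is 0/1, so its mean per group is that group's survival rate.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

print(column_dtypes)
print(missing_counts)
print(survival_by_sex)
```

Because `Survived` is coded as 0 or 1, taking its mean within a group gives the share of that group who survived, which is a handy first look before any deeper analysis.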

Abinaya Jayaprakash

Sri Lankan living in Berlin. Graduate Trainee - Technology at Deutsche Bank. Interested in data science and machine learning.