Investigating Titanic Data

Abinaya Jayaprakash
Oct 2, 2020

From a beginner’s perspective!

This is my first data analysis project as a newcomer to data science. Its goal is to identify the factors that affected survival rates among the passengers aboard the Titanic.

Introduction

This project focuses mainly on how passenger survival rates varied with sex, age, socio-economic status and a few other factors. I mainly used the pandas library, combined with a few SQL techniques, and I also applied various coding practices I came across along the way.
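As a rough illustration of how pandas and SQL can answer the same question, here is a minimal sketch that computes the survival rate per sex both ways. The toy rows, the table name `passengers` and the variable names are made up for this example and are not taken from the real dataset; an in-memory SQLite database stands in for whichever SQL tooling is actually used.

```python
import sqlite3

import pandas as pd

# Hypothetical toy rows for illustration only; NOT real Titanic records.
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Survived": [0, 1, 1, 1],
})

# pandas: the mean of a 0/1 column is the share of survivors in each group.
rate_pandas = toy.groupby("Sex")["Survived"].mean()

# SQL: the same aggregation, run against an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
toy.to_sql("passengers", conn, index=False)
rate_sql = pd.read_sql_query(
    "SELECT Sex, AVG(Survived) AS survival_rate "
    "FROM passengers GROUP BY Sex ORDER BY Sex",
    conn,
)
conn.close()

print(rate_pandas)
print(rate_sql)
```

Because `Survived` is coded as 0 or 1, averaging it gives the survival rate directly, which is why both versions reduce to a single group-by.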

Data Wrangling

The first step is to understand the variables in the dataset: what does each column represent?

  • PassengerId = the unique number that identifies each passenger
  • Survived = a value of “1” indicates the passenger survived and “0” indicates the passenger did not
  • Pclass = Passenger class (1 = 1st class, 2 = 2nd, 3 = 3rd)
  • Name = Name of passenger
  • Sex = Sex of passenger
  • Age = Age of passenger
  • SibSp = Number of siblings/spouses of the passenger aboard
  • Parch = Number of parents/children of the passenger aboard
  • Ticket = Ticket number of passenger
  • Fare = Passenger ticket fare
  • Cabin = Cabin the passenger travelled in
  • Embarked = Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
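A first look at these columns might go along the lines of the sketch below, which counts missing values, checks that `PassengerId` is unique and decodes the `Embarked` codes using the mapping listed above. The rows are hypothetical and only include a subset of the columns; `EmbarkedPort` is a helper column introduced for this example, not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows for illustration only; NOT real Titanic records.
toy = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "Age": [30.0, np.nan, 25.0],
    "Cabin": [None, "A1", None],
    "Embarked": ["S", "C", "Q"],
})

# Count missing values per column to see which variables need cleaning.
missing_per_column = toy.isna().sum()

# PassengerId is meant to identify each passenger, so it should be unique.
is_unique_id = toy["PassengerId"].is_unique

# Decode the port codes (EmbarkedPort is an assumed helper column).
port_names = {"C": "Cherbourg", "Q": "Queenstown", "S": "Southampton"}
toy["EmbarkedPort"] = toy["Embarked"].map(port_names)

print(missing_per_column)
print(is_unique_id)
print(toy[["Embarked", "EmbarkedPort"]])
```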

*Special Notes*

1. Pclass: proxy for the passenger’s socio-economic status (1 = Upper, 2 = Middle, 3 = Lower)

2. Sibling: brother, sister, stepbrother, or stepsister of passenger aboard

3. Spouse: husband or wife of passenger aboard (mistresses and fiancées ignored)

4. Parent: mother or father of passenger aboard

5. Child: son, daughter, stepson, or stepdaughter of passenger aboard
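The notes above can be turned into columns that are easier to read, as in the following sketch. It labels `Pclass` with the socio-economic status from note 1 and adds up `SibSp` and `Parch` to count relatives aboard. The rows are hypothetical, and `SocioEconomicStatus`, `RelativesAboard` and `TravellingAlone` are names assumed for this example rather than columns of the dataset.

```python
import pandas as pd

# Hypothetical toy rows for illustration only; NOT real Titanic records.
toy = pd.DataFrame({
    "Pclass": [1, 3, 2, 3],
    "SibSp": [1, 0, 0, 2],
    "Parch": [0, 0, 2, 1],
})

# Note 1: Pclass as a proxy for socio-economic status.
class_labels = {1: "Upper", 2: "Middle", 3: "Lower"}
toy["SocioEconomicStatus"] = toy["Pclass"].map(class_labels)

# Notes 2 to 5: siblings/spouses plus parents/children gives the number
# of relatives travelling with the passenger (assumed helper columns).
toy["RelativesAboard"] = toy["SibSp"] + toy["Parch"]
toy["TravellingAlone"] = toy["RelativesAboard"] == 0

print(toy)
```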

*Additional Potential Questions*

