From a beginner’s perspective!
This is my first data analysis project as a newbie in data science. The goal is to identify the factors that affected survival rates among passengers aboard the Titanic.
The project focuses mainly on how survival rates varied with passengers’ sex, age, socio-economic status and a few other factors. I mainly used the pandas library and also integrated a few SQL techniques along the way. I also applied various coding practices I came across.
Understanding the variables in the dataset. What does each column represent?
- PassengerId = the unique number that identifies each passenger
- Survived = Value of “1” indicates the passenger survived and “0” indicates otherwise
- Pclass = Passenger class (1 = 1st class, 2 = 2nd, 3 = 3rd)
- Name = Name of passenger
- Sex = Sex of Passenger
- Age = Age of Passenger
- SibSp = Number of Siblings/Spouses of the passenger aboard
- Parch = Number of Parents/Children of the passenger aboard
- Ticket = Ticket number of Passenger
- Fare = Passenger ticket fare
- Cabin = Cabin passenger travelled in
- Embarked = Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
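To make the column list above concrete, here is a minimal sketch of how the columns could be checked with pandas. The three rows are invented purely for illustration and are not real passenger records. With the actual data you would pass the path of your CSV file to `pd.read_csv` instead (the file name depends on where you downloaded it, so none is assumed here).

```python
import io

import pandas as pd

# Three INVENTED rows in the dataset's column layout (not real passenger records).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,Passenger A,male,30,1,0,T-001,8.0,,S
2,1,1,Passenger B,female,,1,0,T-002,70.0,C10,C
3,1,3,Passenger C,female,25,0,0,T-003,8.5,,Q
"""

# Read the sample; with the real file, pass its path here instead.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# The twelve columns described in the list above.
expected_columns = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]

# Any described column that is absent from the loaded data.
missing_columns = [c for c in expected_columns if c not in df.columns]

# Number of missing values in each column.
missing_values = df.isna().sum()

print(missing_columns)
print(missing_values)
```

Checking the missing values early is useful because columns with gaps need a decision (drop, fill, or ignore) before any survival rates are computed.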
1. Pclass: proxy for the passenger’s socio-economic status (1 = Upper, 2 = Middle, 3 = Lower)
2. Sibling: brother, sister, stepbrother, or stepsister of passenger aboard
3. Spouse: husband or wife of passenger aboard (mistresses and fiancees ignored)
4. Parent: mother or father of passenger aboard
5. Child: son, daughter, stepson, or stepdaughter of passenger aboard
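The definitions above suggest two simple calculations, sketched below under some assumptions. The rows are invented for illustration only, so the resulting numbers say nothing about the real passengers. `FamilySize` is a hypothetical column name introduced here, not part of the dataset. Because `Survived` is coded as 0 or 1, its mean within a group is the fraction of that group that survived.

```python
import pandas as pd

# INVENTED rows for illustration only (not real passenger records).
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 0, 1],
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Sex":      ["female", "male", "male", "male", "female", "female"],
    "SibSp":    [1, 0, 0, 2, 0, 1],
    "Parch":    [0, 0, 1, 2, 0, 0],
})

# FamilySize (hypothetical name): relatives aboard, i.e. siblings/spouses
# plus parents/children as defined in the notes above.
df["FamilySize"] = df["SibSp"] + df["Parch"]

# Survival rate per passenger class, the proxy for socio-economic status.
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Survival rate per sex.
rate_by_sex = df.groupby("Sex")["Survived"].mean()

print(rate_by_class)
print(rate_by_sex)
```

The same `groupby(...)["Survived"].mean()` pattern can be reused for any other categorical column, such as `Embarked`.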
*Additional Potential Questions*