Rows: 891
Columns: 12
$ PassengerId <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,…
$ Survived <int> 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1…
$ Pclass <int> 3, 1, 3, 1, 3, 3, 1, 3, 3, 2, 3, 1, 3, 3, 3, 2, 3, 2, 3, 3…
$ Name <chr> "Braund, Mr. Owen Harris", "Cumings, Mrs. John Bradley (Fl…
$ Sex <chr> "male", "female", "female", "female", "male", "male", "mal…
$ Age <dbl> 22, 38, 26, 35, 35, NA, 54, 2, 27, 14, 4, 58, 20, 39, 14, …
$ SibSp <int> 1, 1, 0, 1, 0, 0, 0, 3, 0, 1, 1, 0, 0, 1, 0, 0, 4, 0, 1, 0…
$ Parch <int> 0, 0, 0, 0, 0, 0, 0, 1, 2, 0, 1, 0, 0, 5, 0, 0, 1, 0, 0, 0…
$ Ticket <chr> "A/5 21171", "PC 17599", "STON/O2. 3101282", "113803", "37…
$ Fare <dbl> 7.2500, 71.2833, 7.9250, 53.1000, 8.0500, 8.4583, 51.8625,…
$ Cabin <chr> "", "C85", "", "C123", "", "", "E46", "", "", "", "G6", "C…
$ Embarked <chr> "S", "C", "S", "S", "S", "Q", "S", "S", "S", "C", "S", "S"…
Rows: 418
Columns: 11
$ PassengerId <int> 892, 893, 894, 895, 896, 897, 898, 899, 900, 901, 902, 903…
$ Pclass <int> 3, 3, 2, 3, 3, 3, 3, 2, 3, 3, 3, 1, 1, 2, 1, 2, 2, 3, 3, 3…
$ Name <chr> "Kelly, Mr. James", "Wilkes, Mrs. James (Ellen Needs)", "M…
$ Sex <chr> "male", "female", "male", "male", "female", "male", "femal…
$ Age <dbl> 34.5, 47.0, 62.0, 27.0, 22.0, 14.0, 30.0, 26.0, 18.0, 21.0…
$ SibSp <int> 0, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0…
$ Parch <int> 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ Ticket <chr> "330911", "363272", "240276", "315154", "3101298", "7538",…
$ Fare <dbl> 7.8292, 7.0000, 9.6875, 8.6625, 12.2875, 9.2250, 7.6292, 2…
$ Cabin <chr> "", "", "", "", "", "", "", "", "", "", "", "", "B45", "",…
$ Embarked <chr> "Q", "S", "Q", "S", "S", "S", "Q", "S", "C", "S", "S", "S"…
---
title: "Titanic - Machine Learning from Disaster"
author: "Tony Duan"
date: "2023-01-01"
categories: [data]
execute:
  warning: false
  error: false
format:
  html:
    toc: true
    code-fold: show
    code-tools: true
---
```{r}
library(plotly)
library(tidyverse)
library(stringr)
library(doFuture)
library(tidymodels)
```
## 1. download data
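One way to fetch the competition files is the official Kaggle command-line tool listed in the references. The sketch below wraps it in a hypothetical helper, `download_titanic()`, which is not part of the original post. It assumes the `kaggle` CLI is installed, an API token is configured, the competition rules have been accepted on the Kaggle site, and the downloaded archive is named `titanic.zip`.

```r
# Hypothetical helper (an assumption, not from the original post):
# download the Titanic competition files with the Kaggle CLI and unzip them.
download_titanic <- function(dest = "data") {
  # Fail early if the CLI is not available
  if (Sys.which("kaggle") == "") {
    stop("The `kaggle` command-line tool was not found on the PATH.")
  }

  dir.create(dest, showWarnings = FALSE, recursive = TRUE)

  # Equivalent to: kaggle competitions download -c titanic -p data
  status <- system2(
    "kaggle",
    c("competitions", "download", "-c", "titanic", "-p", dest)
  )
  if (status != 0) {
    stop("kaggle download failed with status ", status)
  }

  # Assumption: the archive is saved as titanic.zip inside `dest`
  unzip(file.path(dest, "titanic.zip"), exdir = dest)
  list.files(dest)
}

# Usage (requires network access and Kaggle credentials):
# download_titanic("data")
```

Downloading the three CSV files by hand from the competition page and placing them in `data/` gives the same result.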
## 2. read data
```{r}
list.files("data")
```
```{r}
train <- read.csv("data/train.csv")
test <- read.csv("data/test.csv")
gender_submission <- read.csv("data/gender_submission.csv")
```
## 3. data dictionary


```{r}
glimpse(train)
```
```{r}
glimpse(test)
```
```{r}
glimpse(gender_submission)
```
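The columns shown by `glimpse(train)` can be summarised in a small lookup table. The sketch below builds one as a plain data frame; the definitions paraphrase the data description on the Kaggle competition page, and the entries for `PassengerId` and `Name` are my own wording rather than Kaggle's.

```r
# Data dictionary for the 12 columns of train.csv.
# Definitions paraphrase the Kaggle competition page; PassengerId and Name
# descriptions are assumptions based on the column contents.
data_dictionary <- data.frame(
  variable = c(
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"
  ),
  definition = c(
    "Passenger identifier", "Survival", "Ticket class", "Passenger name",
    "Sex", "Age in years", "Number of siblings/spouses aboard",
    "Number of parents/children aboard", "Ticket number", "Passenger fare",
    "Cabin number", "Port of embarkation"
  ),
  key = c(
    "", "0 = No, 1 = Yes", "1 = 1st, 2 = 2nd, 3 = 3rd", "",
    "", "", "", "", "", "", "",
    "C = Cherbourg, Q = Queenstown, S = Southampton"
  )
)

data_dictionary
```

`Survived` is the target and appears only in `train.csv`, which is why `test.csv` has 11 columns instead of 12.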
## 4. data cleaning
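The `glimpse()` output shows two obvious issues: `Cabin` holds empty strings rather than `NA`, and `Age` has missing values. A minimal cleaning sketch follows. The helper name `clean_titanic()` and the choice of median imputation are assumptions for illustration, not the author's method.

```r
library(dplyr)

# Hypothetical helper (not from the original post)
clean_titanic <- function(df) {
  df %>%
    mutate(
      # read.csv() keeps missing text fields as "", so convert them to NA
      across(where(is.character), ~ na_if(.x, "")),
      # Treat class, sex, and port as categorical variables
      Pclass = factor(Pclass, levels = c(1, 2, 3)),
      Sex = factor(Sex, levels = c("female", "male")),
      Embarked = factor(Embarked, levels = c("C", "Q", "S")),
      # Placeholder imputation: fill missing ages with the median age
      Age = coalesce(Age, median(Age, na.rm = TRUE))
    )
}

# First six passengers as shown by glimpse(train), used as a small check
sample_rows <- data.frame(
  Pclass = c(3L, 1L, 3L, 1L, 3L, 3L),
  Sex = c("male", "female", "female", "female", "male", "male"),
  Age = c(22, 38, 26, 35, 35, NA),
  Cabin = c("", "C85", "", "C123", "", ""),
  Embarked = c("S", "C", "S", "S", "S", "Q")
)

cleaned <- clean_titanic(sample_rows)
cleaned
```

On the real data the call would be `clean_titanic(train)`. When cleaning `test`, it is usually safer to reuse the median computed on `train` rather than recomputing it.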
## 6. model
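As a starting point, a baseline classifier can be assembled with tidymodels. The sketch below is one possible setup, not the author's model: the helper name `fit_baseline()`, the predictor set, and the choice of logistic regression with median imputation are all assumptions.

```r
library(tidymodels)

# Hypothetical helper (not from the original post): fit a baseline
# logistic regression on a data frame shaped like train.csv.
fit_baseline <- function(train_df) {
  model_df <- train_df %>%
    mutate(
      # Classification models in parsnip need a factor outcome
      Survived = factor(Survived, levels = c(0, 1)),
      Sex = factor(Sex)
    )

  # Preprocessing: impute missing numeric values, then dummy-encode factors
  rec <- recipe(
    Survived ~ Pclass + Sex + Age + SibSp + Parch + Fare,
    data = model_df
  ) %>%
    step_impute_median(Age, Fare) %>%
    step_dummy(all_nominal_predictors())

  spec <- logistic_reg() %>%
    set_engine("glm")

  workflow() %>%
    add_recipe(rec) %>%
    add_model(spec) %>%
    fit(data = model_df)
}

# Usage with the data read earlier:
# baseline_fit <- fit_baseline(train)
# submission <- tibble(
#   PassengerId = test$PassengerId,
#   Survived = predict(baseline_fit, new_data = test)$.pred_class
# )
```

The commented usage produces a table with the same two columns as `gender_submission.csv` (`PassengerId`, `Survived`). `Fare` is imputed defensively in case the test set contains missing fares.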
## Reference
- <https://www.kaggle.com/competitions/titanic/overview>
- <https://github.com/Kaggle/kaggle-api>
- <https://github.com/mkearney/kaggler>