R - Applying logistic regression to the Titanic dataset


I have the famous Titanic data set from Kaggle's website. I want to predict the survival of passengers using logistic regression, using the glm() function in R. I first divide the data frame (total rows = 891) into 2 data frames, i.e. train (rows 1 to 800) and test (rows 801 to 891). The code is as follows:

```r
data <- read.csv("train.csv", stringsAsFactors = FALSE)
names(data)
#  [1] "passengerid" "survived"    "pclass"      "name"        "sex"         "age"         "sibsp"
#  [8] "parch"       "ticket"      "fare"        "cabin"       "embarked"

# Replacing NA values in the age column with the mean of the non-NA values of age.
data$age[is.na(data$age)] <- mean(data$age, na.rm = TRUE)

# Converting sex to binary values: 1 for males, 0 for females.
sexcode <- ifelse(data$sex == "male", 1, 0)

# Dividing the data into train and test data frames
train <- data[1:800, ]
test <- data[801:891, ]

# Setting up the model using glm()
model <- glm(survived ~ sexcode[1:800] + age + pclass + fare,
             family = binomial(link = 'logit'), data = train,
             control = list(maxit = 50))

# Creating a new data frame for prediction
newtest <- data.frame(sexcode[801:891], test$age, test$pclass, test$fare)
prediction <- predict(model, newdata = newtest, type = 'response')
```

When I run the last line of code

```r
prediction <- predict(model, newdata = newtest, type = 'response')
```

I get the following error:

```
Error in eval(expr, envir, enclos) : object 'age' not found
```

Can someone please explain what the problem is? I have checked the newtest variable, and there doesn't seem to be a problem with it.

Here is the link to the Titanic data set: https://www.kaggle.com/c/titanic/download/train.csv

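First, you should add sexcode directly to the data frame, so that the model formula refers to a column instead of a global vector indexed with [1:800]:

```r
data$sexcode <- ifelse(data$sex == "male", 1, 0)
```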
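Why this matters can be sketched with a small toy example (hypothetical data, not the Titanic file, and fitted with lm() for brevity; glm() builds its model frame from the formula and newdata in the same way). A formula term such as sexcode[1:6] is not looked up per row in newdata: it is evaluated against the global vector and always yields the same fixed number of values, so predicting on a data frame with a different number of rows is expected to fail. Storing the values as a column avoids this.

```r
# Hypothetical toy data (not Titanic); lm() used for brevity
train <- data.frame(y = c(1, 3, 2, 5, 4, 6), age = c(10, 20, 30, 40, 50, 60))
sexcode <- c(0, 1, 0, 1, 0, 1, 1, 0)   # global vector with 8 entries

# Pattern from the question: an indexed global vector inside the formula
fit_global <- lm(y ~ sexcode[1:6] + age, data = train)
newdata <- data.frame(age = c(15, 25))  # 2 rows, but sexcode[1:6] still gives 6 values
res <- tryCatch(predict(fit_global, newdata = newdata), error = function(e) e)
print(conditionMessage(res))  # expected to be an error, since the variable lengths differ

# Same information stored as a column of the data frame
train$sexcode <- sexcode[1:6]
fit_column <- lm(y ~ sexcode + age, data = train)
pred <- predict(fit_column, newdata = data.frame(sexcode = c(1, 0), age = c(15, 25)))
print(pred)  # one prediction per row of newdata
```

In the question's code, the same mechanism means sexcode[1:800] would keep producing 800 values even if the other columns of newtest were named correctly, while test has only 91 rows.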

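Then, as pointed out in the comments, the problem is the column names of the newtest data frame. Because you built it manually with data.frame() from unnamed expressions, its columns are named after those expressions (test.age, test.pclass, test.fare, ...), so predict() cannot find a column called age. (The term sexcode[1:800] is still found in the global environment, which is why age is the first variable reported missing.) You can use the test data frame directly instead, since it already has the same column names as train.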
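A minimal sketch, using a hypothetical two-row stand-in for test, shows how data.frame() names unnamed arguments. Naming the arguments explicitly, as in the second call, would be an alternative way to build a prediction data frame whose columns match the formula; passing test directly is simpler.

```r
# Hypothetical two-row stand-in for the test data frame
test <- data.frame(age = c(22, 38), pclass = c(3, 1), fare = c(7.25, 71.28))
sexcode <- c(1, 0)

# Unnamed arguments: column names are built from the expressions themselves
newtest <- data.frame(sexcode[1:2], test$age, test$pclass, test$fare)
print(names(newtest))
# "sexcode.1.2." "test.age" "test.pclass" "test.fare"

# Named arguments give the column names the model formula expects
newtest_named <- data.frame(sexcode = sexcode[1:2], age = test$age,
                            pclass = test$pclass, fare = test$fare)
print(names(newtest_named))
# "sexcode" "age" "pclass" "fare"
```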

So here is the full working code:

```r
data <- read.csv("train.csv", stringsAsFactors = FALSE)
data$age[is.na(data$age)] <- mean(data$age, na.rm = TRUE)
data$sexcode <- ifelse(data$sex == "male", 1, 0)

train <- data[1:800, ]
test <- data[801:891, ]

model <- glm(survived ~ sexcode + age + pclass + fare,
             family = binomial(link = 'logit'), data = train,
             control = list(maxit = 50))

prediction <- predict(model, newdata = test, type = 'response')
```
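Since the code above needs train.csv, here is a self-contained sketch that reproduces both the failing and the working call on synthetic data. The data and the coefficients used to simulate survival are made up for illustration only; they are not the Kaggle file, and only the lowercase column names follow the question.

```r
# Synthetic stand-in for the Titanic columns used in the question (hypothetical data)
set.seed(42)
n <- 200
toy <- data.frame(
  sexcode = rbinom(n, 1, 0.6),                # 1 = male, 0 = female
  age     = round(runif(n, 1, 70)),
  pclass  = sample(1:3, n, replace = TRUE),
  fare    = round(runif(n, 5, 100), 2)
)
# Simulated outcome from made-up coefficients on the logit scale
lp <- 1.5 - 2.5 * toy$sexcode - 0.02 * toy$age - 0.6 * toy$pclass + 0.005 * toy$fare
toy$survived <- rbinom(n, 1, plogis(lp))

train <- toy[1:160, ]
test  <- toy[161:200, ]

model <- glm(survived ~ sexcode + age + pclass + fare,
             family = binomial(link = "logit"), data = train)

# Works: test has columns named sexcode, age, pclass and fare
pred <- predict(model, newdata = test, type = "response")

# Fails: data.frame() names these columns test.sexcode, test.age, ...
bad <- data.frame(test$sexcode, test$age, test$pclass, test$fare)
bad_result <- tryCatch(predict(model, newdata = bad, type = "response"),
                       error = function(e) e)
print(conditionMessage(bad_result))  # an "object ... not found" style error
```

With type = "response", each element of pred is a predicted survival probability for one row of test.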
