R - Applying logistic regression to the Titanic dataset
I have the famous Titanic data set from Kaggle's website. I want to predict the survival of passengers using logistic regression. I am using the glm() function in R. I first divide the data frame (total rows = 891) into 2 data frames, i.e. train (from row 1 to 800) and test (from row 801 to 891). The code is as follows:
```r
data <- read.csv("train.csv", stringsAsFactors = FALSE)
names(data)
# [1] "PassengerId" "Survived" "Pclass" "Name" "Sex" "Age" "SibSp"
# [8] "Parch" "Ticket" "Fare" "Cabin" "Embarked"

# Replacing NA values in the Age column with the mean of the non-NA values of Age.
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)

# Converting Sex into binary values: 1 for males and 0 for females.
sexcode <- ifelse(data$Sex == "male", 1, 0)

# Dividing the data into train and test data frames
train <- data[1:800,]
test <- data[801:891,]

# Setting up the model using glm()
model <- glm(Survived ~ sexcode[1:800] + Age + Pclass + Fare, family = binomial(link = 'logit'), data = train, control = list(maxit = 50))

# Creating a data frame for prediction
newtest <- data.frame(sexcode[801:891], test$Age, test$Pclass, test$Fare)
prediction <- predict(model, newdata = newtest, type = 'response')
```
When I run the last line of the code

    prediction <- predict(model, newdata = newtest, type = 'response')

I get the following error:

    Error in eval(expr, envir, enclos) : object 'Age' not found

Can anyone please explain what the problem is? I have checked the newtest variable, and there doesn't seem to be any problem with it.
Here is the link to the Titanic data set: https://www.kaggle.com/c/titanic/download/train.csv
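First, you should add sexcode directly to the data frame:

    data$sexcode <- ifelse(data$Sex == "male", 1, 0)

Then, as I commented, you have a problem with the column names in the newtest data frame because you created it manually. predict() matches the columns of newdata to the variables in the model formula by name, and the manually built columns do not carry those names. You can use the test data frame directly.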
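To see why the manually built data frame fails, here is a minimal sketch on a made-up toy data frame. The values and the names `toy`, `fit`, `bad`, and `good` are illustrative assumptions, not taken from the Titanic file, and only a single predictor is used to keep it short.

```r
# Toy stand-in for the Titanic data (hypothetical values, not the real file)
toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0),
  Age      = c(22, 38, 26, 35, 54, 2)
)

# Fit a small logistic regression on the toy data
fit <- glm(Survived ~ Age, family = binomial(link = "logit"), data = toy)

# Built from an unnamed expression: R derives the column name from the
# expression itself, so the column is not called "Age"
bad <- data.frame(toy$Age[4:6])

# Built with an explicit name: the column is called "Age"
good <- data.frame(Age = toy$Age[4:6])

print(names(bad))
print(names(good))

# predict() looks variables up by name, so `bad` triggers an error
# (captured here as a message) while `good` returns probabilities
bad_result  <- tryCatch(predict(fit, newdata = bad, type = "response"),
                        error = function(e) conditionMessage(e))
good_result <- predict(fit, newdata = good, type = "response")
print(bad_result)
print(good_result)
```

Subsetting the original data frame, as with `test`, keeps the column names intact, which is why passing it to predict() avoids the issue.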
So here is the full working code:

    data <- read.csv("train.csv", stringsAsFactors = FALSE)
    data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
    data$sexcode <- ifelse(data$Sex == "male", 1, 0)
    train <- data[1:800,]
    test <- data[801:891,]
    model <- glm(Survived ~ sexcode + Age + Pclass + Fare, family = binomial(link = 'logit'), data = train, control = list(maxit = 50))
    prediction <- predict(model, newdata = test, type = 'response')
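Since `type = 'response'` returns probabilities rather than 0/1 labels, a possible next step is to apply a cutoff and compare the result with the known outcomes. The sketch below assumes a cutoff of 0.5; the helper name `classification_accuracy` and the sample vectors are made up for illustration and are not part of the original code.

```r
# Hypothetical helper: share of correct 0/1 predictions at a given cutoff
classification_accuracy <- function(prob, actual, cutoff = 0.5) {
  # Label a case as 1 when its predicted probability exceeds the cutoff
  predicted <- ifelse(prob > cutoff, 1, 0)
  mean(predicted == actual)
}

# Made-up probabilities and outcomes, only to show the call
prob   <- c(0.91, 0.12, 0.65, 0.40)
actual <- c(1, 0, 0, 0)
print(classification_accuracy(prob, actual))
```

With the working code, the same helper could be called on `prediction` and the survival column of `test`. The 0.5 cutoff is a common default, not a requirement.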