statistics - R logistic regression and model.matrix
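I am new to R and am trying to understand a solution to a logistic regression problem. What has been done so far is to remove the unused variables and split the data into train and test datasets. I am trying to understand the part that talks about model.matrix. I am just getting into R and statistics, and I am not sure what model.matrix is or what contrasts are. Here is the code: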
    ## create design matrix; indicators for categorical variables (factors)
    xdel <- model.matrix(delay ~ ., data = datafd_new)[, -1]
    xtrain <- xdel[train, ]
    xnew <- xdel[-train, ]
    ytrain <- del$delay[train]
    ynew <- del$delay[-train]
    m1 <- glm(delay ~ ., family = binomial, data = data.frame(delay = ytrain, xtrain))
    summary(m1)
Can anyone please tell me what model.matrix is used for? Why can't we directly create dummy variables for the categorical variables and put them in glm? I am confused.
Marius' comment explains how. The code below gives an example (which I felt might be helpful, since the poster still seems confused).
    # Create an example dataset. 'catvar' represents a categorical variable despite being coded with numbers.
    x = data.frame("catvar" = sample(c(1, 2, 3), 100, replace = TRUE),
                   "numvar" = rnorm(100),
                   "y" = sample(c(0, 1), 100, replace = TRUE))

    # Check whether your categorical variables are coded correctly. (They'll be 'factor' if so.)
    sapply(x, class)
    # catvar is coded as 'numeric', which is wrong.

    # Tell R that catvar is categorical. If your categorical variables are already classed as factors, you can skip this step.
    x$catvar = factor(x$catvar)
    sapply(x, class)  # check that all variables are now coded correctly

    # Fit the model to the data frame (i.e. without needing to convert x to a model matrix)
    fit = glm(y ~ numvar + catvar, data = x, family = "binomial")
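To see what model.matrix is actually doing, it may help to look at the matrix it returns and to compare the two ways of fitting. The following is only a minimal sketch on made-up data: the data frame `d`, its column names, and the seed are assumptions for illustration, not the asker's dataset. It shows that model.matrix expands a factor into 0/1 indicator columns (with the first level as the baseline under R's default treatment contrasts), and that passing those columns to glm by hand should give the same coefficients as letting glm handle the factor itself.

```r
# Hypothetical toy data (not the asker's dataset); all names are made up.
set.seed(1)
d <- data.frame(
  catvar = factor(sample(c("a", "b", "c"), 100, replace = TRUE)),
  numvar = rnorm(100),
  y      = sample(c(0, 1), 100, replace = TRUE)
)

# model.matrix() turns the formula into a numeric design matrix:
# an intercept column, numvar, and 0/1 indicator columns for catvar.
# Under the default treatment contrasts the first level ("a") is the
# baseline, so only indicators for "b" and "c" are created.
X <- model.matrix(y ~ numvar + catvar, data = d)
print(colnames(X))
print(head(X))

# Fit 1: glm() builds the same design matrix internally from the formula.
fit_formula <- glm(y ~ numvar + catvar, data = d, family = binomial)

# Fit 2: drop the intercept column (this is what [, -1] does in the question)
# and pass the indicator columns explicitly; glm() adds the intercept back.
fit_matrix <- glm(y ~ ., data = data.frame(y = d$y, X[, -1]), family = binomial)

# Compare the coefficient estimates of the two fits.
print(all.equal(unname(coef(fit_formula)), unname(coef(fit_matrix))))
```

So creating the dummy variables yourself is possible, but for glm it is usually unnecessary as long as the categorical columns are factors. Building the matrix explicitly is mainly useful when you want to inspect the encoding or pass the predictors to a function that only accepts a numeric matrix.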