statistics - R logistic regression model.matrix


I am new to R and am trying to understand a solution to a logistic regression problem. What I have done so far is remove the unused variables and split the data into train and test datasets. I am trying to understand the part that uses model.matrix. I am just getting into R and statistics, and I am not sure what model.matrix and contrasts do. Here is the code:

## create design matrix; indicators for categorical variables (factors)
xdel <- model.matrix(delay ~ ., data = datafd_new)[, -1]
xtrain <- xdel[train, ]
xnew <- xdel[-train, ]
ytrain <- del$delay[train]
ynew <- del$delay[-train]
m1 <- glm(delay ~ ., family = binomial, data = data.frame(delay = ytrain, xtrain))
summary(m1)

Can anyone please tell me what model.matrix is used for? Why can't I directly create dummy variables for the categorical variables and put them in glm? I am confused about the purpose of model.matrix.

Marius' comment explains how to do this; the code below gives an example (which I felt would be helpful, since the poster still seems confused).

# Create an example dataset. 'catvar' represents a categorical variable despite being coded as numbers.
x = data.frame("catvar" = sample(c(1, 2, 3), 100, replace = TRUE),
               "numvar" = rnorm(100),
               "y" = sample(c(0, 1), 100, replace = TRUE))

# Check whether your categorical variables are coded correctly. (They'll be 'factor' if so.)
sapply(x, class) # catvar is coded as 'numeric', which is wrong.

# Tell R that catvar is categorical. If your categorical variables are already classed as factors, you can skip this step.
x$catvar = factor(x$catvar)
sapply(x, class) # check that the variables are now coded correctly

# Fit the model to the data frame (i.e. without needing to convert x to a model matrix)
fit = glm(y ~ numvar + catvar, data = x, family = "binomial")
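To see what model.matrix itself produces, here is a minimal sketch on made-up data. The data frame `x`, the seed, and the names `X`, `fit_formula` and `fit_matrix` are illustrative assumptions, not the poster's dataset. It shows that model.matrix expands a factor into 0/1 indicator columns (using R's default treatment contrasts, where the first level is the baseline), and that glm fitted on those columns should give the same coefficients as glm fitted on the data frame with the factor left as a factor.

```r
# Hypothetical toy data (not the poster's dataset).
set.seed(1)
x <- data.frame(
  catvar = factor(sample(c(1, 2, 3), 100, replace = TRUE)),
  numvar = rnorm(100),
  y      = sample(c(0, 1), 100, replace = TRUE)
)

# model.matrix() builds the design matrix: the factor becomes indicator
# columns. Level 1 is the baseline under the default treatment contrasts,
# so only catvar2 and catvar3 appear. [, -1] drops the intercept column,
# as in the question's code.
X <- model.matrix(y ~ ., data = x)[, -1]
print(colnames(X))  # "catvar2" "catvar3" "numvar"
print(head(X, 3))

# Fit 1: let glm() build the design matrix from the data frame itself.
fit_formula <- glm(y ~ numvar + catvar, data = x, family = binomial)

# Fit 2: hand glm() the pre-built indicator columns, as in the question.
fit_matrix <- glm(y ~ ., data = data.frame(y = x$y, X), family = binomial)

# Compare coefficients by name, since the column order differs.
same <- all.equal(coef(fit_formula),
                  coef(fit_matrix)[names(coef(fit_formula))],
                  tolerance = 1e-6)
print(same)
```

In other words, glm already calls the same machinery internally when given a formula and a data frame, so building the matrix by hand is mostly useful when a function needs a plain numeric matrix rather than a formula.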
