jmv

Linear Regression

Linear regression is used to explore the relationship between a continuous dependent variable, and one or more continuous and/or categorical explanatory variables. Other statistical methods, such as ANOVA and ANCOVA, are in reality just forms of linear regression.

Example usage

data('Prestige', package='carData')

linReg(data = Prestige, dep = income,
       covs = vars(education, prestige, women),
       blocks = list(list('education', 'prestige', 'women')))

#
#  LINEAR REGRESSION
#
#  Model Fit Measures
#  ───────────────────────────
#    Model    R        R²
#  ───────────────────────────
#        1    0.802    0.643
#  ───────────────────────────
#
#
#  MODEL SPECIFIC RESULTS
#
#  MODEL 1
#
#
#  Model Coefficients
#  ────────────────────────────────────────────────────────
#    Predictor    Estimate    SE         t         p
#  ────────────────────────────────────────────────────────
#    Intercept      -253.8    1086.16    -0.234     0.816
#    women           -50.9       8.56    -5.948    < .001
#    prestige        141.4      29.91     4.729    < .001
#    education       177.2     187.63     0.944     0.347
#  ────────────────────────────────────────────────────────
#

Arguments

data the data as a data frame
dep the dependent variable from data, variable must be numeric
covs the covariates from data
factors the fixed factors from data
blocks a list containing vectors of strings that name the predictors that are added to the model. The elements are added to the model according to their order in the list
refLevels a list of lists specifying reference levels of the dependent variable and all the factors
intercept 'refLevel' (default) or 'grandMean', coding of the intercept. Either creates contrast so that the intercept represents the reference level or the grand mean
r TRUE (default) or FALSE, provide the statistical measure R for the models
r2 TRUE (default) or FALSE, provide the statistical measure R-squared for the models
r2Adj TRUE or FALSE (default), provide the statistical measure adjusted R-squared for the models
aic TRUE or FALSE (default), provide Aikaike's Information Criterion (AIC) for the models
bic TRUE or FALSE (default), provide Bayesian Information Criterion (BIC) for the models
rmse TRUE or FALSE (default), provide RMSE for the models
modelTest TRUE (default) or FALSE, provide the model comparison between the models and the NULL model
anova TRUE or FALSE (default), provide the omnibus ANOVA test for the predictors
ci TRUE or FALSE (default), provide a confidence interval for the model coefficients
ciWidth a number between 50 and 99.9 (default: 95) specifying the confidence interval width
stdEst TRUE or FALSE (default), provide a standardized estimate for the model coefficients
ciStdEst TRUE or FALSE (default), provide a confidence interval for the model coefficient standardized estimates
ciWidthStdEst a number between 50 and 99.9 (default: 95) specifying the confidence interval width
norm TRUE or FALSE (default), perform a Shapiro-Wilk test on the residuals
qqPlot TRUE or FALSE (default), provide a Q-Q plot of residuals
resPlots TRUE or FALSE (default), provide residual plots where the dependent variable and each covariate is plotted against the standardized residuals.
durbin TRUE or FALSE (default), provide results of the Durbin- Watson test for autocorrelation
collin TRUE or FALSE (default), provide VIF and tolerence collinearity statistics
cooks TRUE or FALSE (default), provide summary statistics for the Cook's distance
emMeans a formula containing the terms to estimate marginal means for, supports up to three variables per term
ciEmm TRUE (default) or FALSE, provide a confidence interval for the estimated marginal means
ciWidthEmm a number between 50 and 99.9 (default: 95) specifying the confidence interval width for the estimated marginal means
emmPlots TRUE (default) or FALSE, provide estimated marginal means plots
emmTables TRUE or FALSE (default), provide estimated marginal means tables
emmWeights TRUE (default) or FALSE, weigh each cell equally or weigh them according to the cell frequency

Returns

A results object containing:

results$modelFit a table
results$modelComp a table
results$models an array of groups

Tables can be converted to data frames with asDF or as.data.frame(). For example:

results$modelFit$asDF

as.data.frame(results$modelFit)

Elements in arrays can be accessed with [[n]]. For example:

results$models[[1]] # accesses the first element