Extension and Customization
```r
library(modelsummary)
```
Unsupported models: modelsummary_list
The simplest way to summarize an unsupported model is to create a `modelsummary_list` object. This approach is super flexible, but it requires manual intervention, and it can become tedious if you need to summarize many models. The next section shows how to add formal support for an unsupported model type.
A `modelsummary_list` is a list with two elements that conform to the `broom` package specification: `tidy` and `glance`. `tidy` is a data.frame with at least three columns: `term`, `estimate`, and `std.error`. `glance` is a data.frame with a single row, and each of its columns will be displayed at the bottom of the table in the goodness-of-fit section. Finally, we wrap those two elements in a list and assign it the `modelsummary_list` class:
```r
ti <- data.frame(
  term = c("coef1", "coef2", "coef3"),
  estimate = 1:3,
  std.error = c(pi, exp(1), sqrt(2)))

gl <- data.frame(
  stat1 = "blah",
  stat2 = "blah blah")

mod <- list(
  tidy = ti,
  glance = gl)
class(mod) <- "modelsummary_list"

modelsummary(mod)
```
|          | (1)       |
|----------|-----------|
| coef1    | 1.000     |
|          | (3.142)   |
| coef2    | 2.000     |
|          | (2.718)   |
| coef3    | 3.000     |
|          | (1.414)   |
| stat1    | blah      |
| stat2    | blah blah |
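Because a `modelsummary_list` behaves like any other supported model, several of them can be combined in a named list and displayed side by side. A minimal sketch, simply reusing the `mod` object built above:

```r
# sketch: several hand-built models in one table
modelsummary(list("Model A" = mod, "Model B" = mod))
```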
Unsupported models: `glance` and `tidy`

`modelsummary` relies on two functions from the `broom` package to extract model information: `tidy` and `glance`. If `broom` doesn’t support the type of model you are trying to summarize, `modelsummary` won’t support it out of the box. Thankfully, it is extremely easy to add support for most models using custom methods.
For example, models produced by the `MCMCglmm` package are not currently supported by `broom`. To add support, you simply need to create a `tidy` and a `glance` method:
```r
# load packages and data
library(modelsummary)
library(MCMCglmm)
data(PlodiaPO)

# add custom functions to extract estimates (tidy) and goodness-of-fit (glance) information
tidy.MCMCglmm <- function(x, ...) {
  s <- summary(x, ...)
  ret <- data.frame(
    term = row.names(s$solutions),
    estimate = s$solutions[, 1],
    conf.low = s$solutions[, 2],
    conf.high = s$solutions[, 3])
  ret
}

glance.MCMCglmm <- function(x, ...) {
  ret <- data.frame(
    dic = x$DIC,
    n = nrow(x$X))
  ret
}

# estimate a simple model
model <- MCMCglmm(PO ~ 1 + plate, random = ~ FSfamily, data = PlodiaPO, verbose = FALSE, pr = TRUE)

# summarize the model
modelsummary(model, statistic = 'conf.int')
```
Three important things to note.

First, the methods are named `tidy.MCMCglmm` and `glance.MCMCglmm` because the model object I am trying to summarize is of class `MCMCglmm`. You can find the class of a model by running `class(model)`.

Second, both of the methods include the ellipsis `...` argument.

Third, in the example above we used the `statistic = 'conf.int'` argument. This is because the `tidy` method produces `conf.low` and `conf.high` columns. In most cases, users will define a `std.error` column in their custom `tidy` methods, so the `statistic` argument will need to be adjusted.
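For instance, here is a rough sketch of an alternative `tidy.MCMCglmm` that reports a `std.error` column instead of intervals, assuming the posterior draws for the fixed effects are stored in `x$Sol` under the same names as the rows of `summary(x)$solutions`:

```r
# sketch: a tidy method that reports std.error
# (posterior standard deviations are used as a rough stand-in for standard errors)
tidy.MCMCglmm <- function(x, ...) {
  s <- summary(x, ...)
  draws <- as.matrix(x$Sol)[, row.names(s$solutions), drop = FALSE]
  data.frame(
    term = row.names(s$solutions),
    estimate = s$solutions[, 1],
    std.error = apply(draws, 2, sd))
}

# with a std.error column available, the default statistic needs no adjustment
modelsummary(model)
```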
If you create new `tidy` and `glance` methods, please consider contributing them to `broom` so that the rest of the community can benefit from your work: https://github.com/tidymodels/broom
Modifying information: `tidy_custom` and `glance_custom`
Users may want to include more information than is made available by the default extractor function. For example, models produced by the `MASS::polr` function do not produce p values by default, which means that we cannot use the `stars = TRUE` argument in `modelsummary`. However, it is possible to extract this information by using the `lmtest::coeftest` function. To include such custom information, we will define new `glance_custom` and `tidy_custom` methods.

We begin by estimating a model with `MASS::polr`:
```r
library(MASS)

mod_ordinal <- polr(as.ordered(gear) ~ mpg + drat, data = mtcars)

get_estimates(mod_ordinal)
```
```
  term     estimate  std.error conf.level   conf.low  conf.high   statistic df.error      p.value component s.value group
1  3|4 13.962948761 4.04107300       0.95  5.6851860 22.2407116  3.45525774       28 0.0017706303     alpha     9.1
2  4|5 16.898937342 4.39497069       0.95  7.8962480 25.9016267  3.84506258       28 0.0006356348     alpha    10.6
3  mpg -0.008646682 0.09034201       0.95 -0.1916706  0.1708667 -0.09571053       28 0.9244322098      beta     0.1
4 drat  3.949431923 1.30665144       0.95  1.6191505  6.8457246  3.02255965       28 0.0053120619      beta     7.6
```
The `get_estimates` function shows that our default extractor does not produce a `p.value` column. As a result, setting `stars = TRUE` in `modelsummary` will produce an error.
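You can confirm this by wrapping the call in `try()`; without a `p.value` column, the stars request cannot be honored:

```r
# this call errors at this point, because no p.value column is available
try(modelsummary(mod_ordinal, stars = TRUE))
```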
We know that `MASS::polr` produces an object of class `polr`:

```r
class(mod_ordinal)
```
```
[1] "polr"
```
To extract more (custom) information from a model of this class, we thus define a method called `tidy_custom.polr` which returns a data.frame with two columns: `term` and `p.value`:
```r
tidy_custom.polr <- function(x, ...) {
  s <- lmtest::coeftest(x)
  out <- data.frame(
    term = row.names(s),
    p.value = s[, "Pr(>|t|)"])
  out
}
```
When this method is defined, `modelsummary` can automatically extract p values from all models of this class, and will now work properly with `stars = TRUE`:

```r
modelsummary(mod_ordinal, stars = TRUE)
```
|          | (1)       |
|----------|-----------|
| 3\|4     | 13.963**  |
|          | (4.041)   |
| 4\|5     | 16.899*** |
|          | (4.395)   |
| mpg      | -0.009    |
|          | (0.090)   |
| drat     | 3.949**   |
|          | (1.307)   |
| Num.Obs. | 32        |
| AIC      | 51.1      |
| BIC      | 57.0      |
| RMSE     | 3.44      |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
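The same mechanism works for the goodness-of-fit section. As a sketch, a hypothetical `glance_custom.polr` method could append an extra row at the bottom of the table; here we report the link function stored in the `method` element of `polr` objects (the "Link" label is just an illustration):

```r
# sketch: add a custom goodness-of-fit row for polr models
glance_custom.polr <- function(x, ...) {
  data.frame("Link" = x$method)
}

modelsummary(mod_ordinal, stars = TRUE)
```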
New information: `tidy_custom` and `glance_custom`
Sometimes users will want to include information that is not supplied by the `tidy` and `glance` extractor functions. A pretty easy way to include extra information is to define new `glance_custom` and `tidy_custom` methods. To illustrate, we estimate two linear regression models using the `lm` function:
```r
library(modelsummary)

mod <- list()
mod[[1]] <- lm(hp ~ mpg + drat, mtcars)
mod[[2]] <- lm(wt ~ mpg + drat + am, mtcars)
```
In `R`, the `lm` function produces models of class “lm”:

```r
class(mod[[1]])
```
```
[1] "lm"
```
Let’s say you would like to print the dependent variable for each model of this particular class. All you need to do is define a new method called `glance_custom.lm`. This method should return a data.frame (or tibble) with 1 row, and 1 column per piece of information you want to display. For example:
```r
glance_custom.lm <- function(x, ...) {
  dv <- as.character(formula(x)[2])
  out <- data.frame("DV" = dv)
  return(out)
}
```
Now, let’s customize the body of the table. The `vcov` argument already allows users to customize uncertainty estimates. But imagine you want to override the coefficient estimates of your “lm” models. Easy! All you need to do is define a `tidy_custom.lm` method which returns a data.frame (or tibble) with one column called “term” and one column called “estimate”.

Here, we’ll replace the estimates with up/down-pointing triangles that represent their signs:
```r
tidy_custom.lm <- function(x, ...) {
  s <- summary(x)$coefficients
  out <- data.frame(
    term = row.names(s),
    estimate = ifelse(s[, 1] > 0, '▲', '▼'))
  return(out)
}
```
After you define the `glance_custom` and `tidy_custom` methods, `modelsummary` will automatically display your customized model information:

```r
modelsummary(mod)
```
|             | (1)      | (2)      |
|-------------|----------|----------|
| (Intercept) | ▲        | ▲        |
|             | (55.415) | (0.728)  |
| mpg         | ▼        | ▼        |
|             | (1.792)  | (0.019)  |
| drat        | ▲        | ▼        |
|             | (20.198) | (0.245)  |
| am          |          | ▼        |
|             |          | (0.240)  |
| Num.Obs.    | 32       | 32       |
| R2          | 0.614    | 0.803    |
| R2 Adj.     | 0.588    | 0.782    |
| AIC         | 337.9    | 46.4     |
| BIC         | 343.7    | 53.7     |
| Log.Lik.    | -164.940 | -18.201  |
| F           | 23.100   | 38.066   |
| RMSE        | 41.91    | 0.43     |
| DV          | hp       | wt       |
Note that you can define a `std.error` column in `tidy_custom.lm` to replace the uncertainty estimates instead of the coefficients.
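For instance, here is a sketch of that variant, which keeps the coefficients as reported by `lm` but swaps in coarsely rounded standard errors (purely to illustrate the mechanics):

```r
# sketch: override the uncertainty estimates instead of the coefficients
tidy_custom.lm <- function(x, ...) {
  s <- summary(x)$coefficients
  data.frame(
    term = row.names(s),
    std.error = round(s[, 2], 1))  # coarsely rounded standard errors, for illustration only
}
```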
Customization: New model class
An even more fundamental way to customize the output would be to completely bypass `modelsummary`’s extractor functions by assigning a new class name to your model. For example,
```r
# estimate a linear model
mod_custom <- lm(hp ~ mpg + drat, mtcars)

# assign it a new class
class(mod_custom) <- "custom"

# define tidy and glance methods
tidy.custom <- function(x, ...) {
  data.frame(
    term = names(coef(x)),
    estimate = letters[1:length(coef(x))],
    std.error = seq_along(coef(x))
  )
}

glance.custom <- function(x, ...) {
  data.frame(
    "Model" = "Custom",
    "nobs" = stats:::nobs.lm(x)
  )
}

# summarize
modelsummary(mod_custom)
```
|             | (1)     |
|-------------|---------|
| (Intercept) | a       |
|             | (1.000) |
| mpg         | b       |
|             | (2.000) |
| drat        | c       |
|             | (3.000) |
| Num.Obs.    | 32      |
| Model       | Custom  |
Warning: When defining new `tidy` and `glance` methods, it is important to include an ellipsis argument (`...`).

Note that in the `glance.custom()` method, we called `stats:::nobs.lm()` instead of the default `stats::nobs()` method, because the latter does not know where to dispatch models of our new “custom” class. Being more explicit solves the problem.
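Another way to solve the same dispatch problem, shown here only as a sketch, is to register an `nobs` method for the new class so that the generic `stats::nobs()` works again:

```r
# sketch: give the "custom" class its own nobs method
nobs.custom <- function(object, ...) stats:::nobs.lm(object)

glance.custom <- function(x, ...) {
  data.frame(
    "Model" = "Custom",
    "nobs" = nobs(x)  # now dispatches to nobs.custom
  )
}
```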
An alternative would be to set a new class that inherits from the previous one, and to use a global option to set `broom` as the default extractor function (otherwise `modelsummary` will use its standard `lm` extractors by inheritance):

```r
options(modelsummary_get = "broom")

class(mod_custom) <- c("custom", "lm")
```
Customization: `modelsummary_list`

Another flexible way to customize model output is to use `output = "modelsummary_list"`. With this output option, `modelsummary()` returns a list with two elements: `tidy` contains parameter estimates, standard errors, etc., and `glance` contains model statistics such as the AIC. For example,
```r
mod <- lm(hp ~ mpg + drat, mtcars)
mod_list <- modelsummary(mod, output = "modelsummary_list")

mod_list$tidy
```
```
         term   estimate std.error  statistic df.error      p.value s.value group conf.low conf.high
1 (Intercept) 278.515455 55.414866  5.0260061       29 2.359726e-05    15.4             NA        NA
2         mpg  -9.985499  1.791837 -5.5727709       29 5.172030e-06    17.6             NA        NA
3        drat  19.125752 20.197756  0.9469246       29 3.515013e-01     1.5             NA        NA
```
```r
mod_list$glance
```
```
       aic      bic r.squared adj.r.squared     rmse nobs        F    logLik
1 337.8809 343.7438 0.6143611     0.5877653 41.90687   32 23.09994 -164.9404
```
Both `tidy` and `glance` can now be customized, and the updated model can be passed back to `modelsummary` using `modelsummary(mod_list)`. All information that is displayed in the table is contained in `mod_list`, so this pattern allows for very flexible adjustments of output tables.
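As a small sketch of what such adjustments can look like, one could relabel a term and drop a goodness-of-fit statistic before printing (the "Constant" label is arbitrary):

```r
# sketch: edit the list directly before printing
mod_list$tidy$term[mod_list$tidy$term == "(Intercept)"] <- "Constant"
mod_list$glance$logLik <- NULL  # drop the log-likelihood from the bottom of the table
modelsummary(mod_list)
```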
A useful example for this pattern concerns mixed models using `lme4`. Assume we want to compare the effect of using different degrees-of-freedom adjustments on the significance of the coefficients. The models have identical parameter estimates, standard errors, and model fit statistics; we only want to change the p-values. We use the `parameters` package to compute the adjusted p-values.
library("lme4")
<- lmer(mpg ~ drat + (1 | am), data = mtcars)
mod <- modelsummary(mod, output = "modelsummary_list", effects = "fixed")
mod_list # create a copy, where we'll change the p-values
<- as.list(mod_list)
mod_list_kenward $tidy$p.value <- parameters::p_value_kenward(mod)$p
mod_list_kenward
modelsummary(list("Wald" = mod_list, "Kenward" = mod_list_kenward),
statistic = "{std.error} ({p.value}) {stars}")
|             | Wald               | Kenward         |
|-------------|--------------------|-----------------|
| (Intercept) | -5.159             | -5.159          |
|             | 6.409 (0.428)      | 6.409 (0.680)   |
| drat        | 7.045              | 7.045           |
|             | 1.736 (<0.001) *** | 1.736 (0.086) + |
| Num.Obs.    | 32                 | 32              |
| R2 Marg.    | 0.402              | 0.402           |
| R2 Cond.    | 0.440              | 0.440           |
| AIC         | 188.7              | 188.7           |
| BIC         | 194.6              | 194.6           |
| ICC         | 0.1                | 0.1             |
| RMSE        | 4.28               | 4.28            |