Summary tables using 2-sided formulae: crosstabs, frequencies, table 1s and more.
datasummary can use any summary function which produces one numeric or character value per variable. The examples section of this documentation shows how to define custom summary functions.
modelsummary also supplies several shortcut summary functions which can be used in datasummary() formulas: Min, Max, Mean, Median, Var, SD, NPercent, NUnique, Ncol, P0, P25, P50, P75, P100.
See the Details and Examples sections below, and the vignettes on the modelsummary website:
https://modelsummary.com/
https://modelsummary.com/articles/datasummary.html
Usage
datasummary(
formula,
data,
output = "default",
fmt = 2,
title = NULL,
notes = NULL,
align = NULL,
add_columns = NULL,
add_rows = NULL,
sparse_header = TRUE,
escape = TRUE,
...
)
Arguments
- formula
A two-sided formula to describe the table: rows ~ columns. See the Examples section for a mini-tutorial and the Details section for more resources. Grouping/nesting variables can appear on both sides of the formula, but all summary functions must be on one side.
- data
A data.frame (or tibble)
- output
filename or object type (character string)
Supported filename extensions: .docx, .html, .tex, .md, .txt, .csv, .xlsx, .png, .jpg
Supported object types: "default", "html", "markdown", "latex", "latex_tabular", "data.frame", "gt", "kableExtra", "huxtable", "flextable", "DT", "jupyter". The "modelsummary_list" value produces a lightweight object which can be saved and fed back to the modelsummary function.
The "default" output format can be set to "kableExtra", "gt", "flextable", "huxtable", "DT", or "markdown".
If the user does not choose a default value, the packages listed above are tried in sequence.
Session-specific configuration:
options("modelsummary_factory_default" = "gt")
Persistent configuration:
config_modelsummary(output = "markdown")
Warning: Users should not supply a file name to the output argument if they intend to customize the table with external packages. See the 'Details' section.
LaTeX compilation requires the booktabs and siunitx packages, but siunitx can be disabled or replaced with global options. See the 'Details' section.
- fmt
how to format numeric values: integer, user-supplied function, or modelsummary function.
Integer: Number of decimal digits
User-supplied functions: Any function which accepts a numeric vector and returns a character vector of the same length.
modelsummary functions:
fmt = fmt_significant(2): Two significant digits (at the term-level)
fmt = fmt_sprintf("%.3f"): See ?sprintf
fmt = fmt_identity(): unformatted raw values
- title
string
- notes
list or vector of notes to append to the bottom of the table.
- align
A string with a number of characters equal to the number of columns in the table (e.g., align = "lcc"). Valid characters: l, c, r, d.
"l": left-aligned column
"c": centered column
"r": right-aligned column
"d": dot-aligned column. For LaTeX/PDF output, this option requires at least version 3.0.25 of the siunitx LaTeX package. These commands must appear in the LaTeX preamble (they are added automatically when compiling Rmarkdown documents to PDF):
\usepackage{booktabs}
\usepackage{siunitx}
\newcolumntype{d}{S[ input-open-uncertainty=, input-close-uncertainty=, parse-numbers = false, table-align-text-pre=false, table-align-text-post=false ]}
- add_columns
a data.frame (or tibble) with the same number of rows as your main table.
- add_rows
a data.frame (or tibble) with the same number of columns as your main table. By default, rows are appended to the bottom of the table. You can define a "position" attribute of integers to set the row positions. See Examples section below.
- sparse_header
TRUE or FALSE. TRUE eliminates column headers which have a unique label across all columns, except for the row immediately above the data. FALSE keeps all headers. The order in which terms are entered in the formula determines the order in which headers appear. For example, x~mean*z will print the mean-related header above the z-related header.
- escape
boolean. TRUE escapes or substitutes LaTeX/HTML characters which could prevent the file from compiling/displaying. This setting does not affect captions or notes.
- ...
all other arguments are passed through to the table-making functions kableExtra::kbl, gt::gt, DT::datatable, etc. depending on the output argument. This allows users to pass arguments directly to datasummary in order to affect the behavior of other functions behind the scenes.
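As a minimal sketch of the user-supplied function form of the fmt argument described above: the helper name one_decimal is hypothetical and not part of modelsummary, and output = "data.frame" is used only so the result can be inspected without a table-making package.

```r
library(modelsummary)

# Hypothetical formatter (not part of modelsummary): accepts a numeric
# vector and returns a character vector of the same length.
one_decimal <- function(x) sprintf("%.1f", x)

# Pass the function to `fmt`; every numeric cell is formatted with it.
tab <- datasummary(
  mpg + hp ~ mean + sd,
  data = mtcars,
  output = "data.frame",
  fmt = one_decimal
)
print(tab)
```

The same table could be produced with fmt = 1; a function is mainly useful when the formatting rule cannot be expressed as a number of digits.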
Details
Visit the 'modelsummary' website for more usage examples: https://modelsummary.com
The 'datasummary' function is a thin wrapper around the 'tabular' function from the 'tables' package. More details about table-making formulas can be found in the 'tables' package documentation: ?tables::tabular
Hierarchical or "nested" column labels are only available for these output formats: kableExtra, gt, html, rtf, and LaTeX. When saving tables to other formats, nested labels will be combined to a "flat" header.
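To see how nested labels are combined into a flat header, one option is to request a data.frame, which is not among the formats listed above as supporting nested labels. This is only an illustrative sketch; the exact text of the combined column names is not guaranteed here, so the code simply prints them.

```r
library(modelsummary)

# Nest two statistics inside each level of "cyl". With a data.frame
# output there is no room for a two-level header, so the nested labels
# are combined into single column names.
flat <- datasummary(
  mpg + hp ~ Factor(cyl) * (mean + sd),
  data = mtcars,
  output = "data.frame"
)

# Inspect the flattened header
colnames(flat)
```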
Global Options
The behavior of modelsummary can be modified by setting global options. For example:
options(modelsummary_model_labels = "roman")
The rest of this section describes each of the options above.
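Because global options persist for the rest of the R session, it can be convenient to change them temporarily and restore the previous state afterwards. The sketch below relies only on base R's options() and getOption(); the variable name old is arbitrary.

```r
# options() returns the previous values of the options it changes,
# so they can be restored later.
old <- options(modelsummary_model_labels = "roman")

# The new value is active for any tables built at this point
getOption("modelsummary_model_labels")

# Restore the earlier state (including "unset", if it was unset before)
options(old)
```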
Model labels: default column names
These global options change the style of the default column headers:
options(modelsummary_model_labels = "roman")
options(modelsummary_panel_labels = "roman")
The supported styles are: "model", "panel", "arabic", "letters", "roman", "(arabic)", "(letters)", "(roman)"
The panel-specific option is only used when shape="rbind".
Table-making packages
modelsummary supports 5 table-making packages: kableExtra, gt, flextable, huxtable, and DT. Some of these packages have overlapping functionalities. For example, 3 of those packages can export to LaTeX. To change the default backend used for a specific file format, you can use the options function:
options(modelsummary_factory_html = 'kableExtra')
options(modelsummary_factory_latex = 'gt')
options(modelsummary_factory_word = 'huxtable')
options(modelsummary_factory_png = 'gt')
Table themes
Change the look of tables in an automated and replicable way, using the modelsummary theming functionality. See the vignette: https://modelsummary.com/articles/appearance.html
modelsummary_theme_gt
modelsummary_theme_kableExtra
modelsummary_theme_huxtable
modelsummary_theme_flextable
modelsummary_theme_dataframe
Model extraction functions
modelsummary can use two sets of packages to extract information from statistical models: the easystats family (performance and parameters) and broom. By default, it uses easystats first and then falls back on broom in case of failure. You can change the order of priorities or include goodness-of-fit extracted by both packages by setting:
options(modelsummary_get = "broom")
options(modelsummary_get = "easystats")
options(modelsummary_get = "all")
Formatting numeric entries
By default, LaTeX tables enclose all numeric entries in the \num{} command from the siunitx package. To prevent this behavior, or to enclose numbers in dollar signs (for LaTeX math mode), users can call:
options(modelsummary_format_numeric_latex = "plain")
options(modelsummary_format_numeric_latex = "mathmode")
A similar option can be used to display numerical entries using MathJax in HTML tables:
options(modelsummary_format_numeric_html = "mathjax")
Examples
library(modelsummary)
# The left-hand side of the formula describes rows, and the right-hand side
# describes columns. This table uses the "mpg" variable as a row and the "mean"
# function as a column:
datasummary(mpg ~ mean, data = mtcars)
# This table uses the "mean" function as a row and the "mpg" variable as a column:
datasummary(mean ~ mpg, data = mtcars)
# Display several variables or functions of the data using the "+"
# concatenation operator. This table has 2 rows and 2 columns:
datasummary(hp + mpg ~ mean + sd, data = mtcars)
# Nest variables or statistics inside a "factor" variable using the "*" nesting
# operator. This table shows the mean of "hp" and "mpg" for each value of
# "cyl":
mtcars$cyl <- as.factor(mtcars$cyl)
datasummary(hp + mpg ~ cyl * mean, data = mtcars)
# If you don't want to convert your original data
# to factors, you can use the 'Factor()'
# function inside 'datasummary' to obtain an identical result:
datasummary(hp + mpg ~ Factor(cyl) * mean, data = mtcars)
# You can nest several variables or statistics inside a factor by using
# parentheses. This table shows the mean and the standard deviation for each
# subset of "cyl":
datasummary(hp + mpg ~ cyl * (mean + sd), data = mtcars)
# Summarize all numeric variables with 'All()'
datasummary(All(mtcars) ~ mean + sd, data = mtcars)
# Define custom summary statistics. Your custom function should accept a vector
# of numeric values and return a single numeric or string value:
minmax <- function(x) sprintf("[%.2f, %.2f]", min(x), max(x))
mean_na <- function(x) mean(x, na.rm = TRUE)
datasummary(hp + mpg ~ minmax + mean_na, data = mtcars)
# To handle missing values, you can pass arguments to your functions using
# '*Arguments()'
datasummary(hp + mpg ~ mean * Arguments(na.rm = TRUE), data = mtcars)
# For convenience, 'modelsummary' supplies several shortcut functions
# with the argument `na.rm=TRUE` by default: Mean, Median, Min, Max, SD, Var,
# P0, P25, P50, P75, P100, NUnique, Histogram
#datasummary(hp + mpg ~ Mean + SD + Histogram, data = mtcars)
# These functions also accept a 'fmt' argument which allows you to
# round/format the results
datasummary(hp + mpg ~ Mean * Arguments(fmt = "%.3f") + SD * Arguments(fmt = "%.1f"), data = mtcars)
# Save your tables to a variety of output formats:
f <- hp + mpg ~ Mean + SD
#datasummary(f, data = mtcars, output = 'table.html')
#datasummary(f, data = mtcars, output = 'table.tex')
#datasummary(f, data = mtcars, output = 'table.md')
#datasummary(f, data = mtcars, output = 'table.docx')
#datasummary(f, data = mtcars, output = 'table.pptx')
#datasummary(f, data = mtcars, output = 'table.jpg')
#datasummary(f, data = mtcars, output = 'table.png')
# Display human-readable code
#datasummary(f, data = mtcars, output = 'html')
#datasummary(f, data = mtcars, output = 'markdown')
#datasummary(f, data = mtcars, output = 'latex')
# Return a table object to customize using a table-making package
#datasummary(f, data = mtcars, output = 'gt')
#datasummary(f, data = mtcars, output = 'kableExtra')
#datasummary(f, data = mtcars, output = 'flextable')
#datasummary(f, data = mtcars, output = 'huxtable')
# add_rows
new_rows <- data.frame(a = 1:2, b = 2:3, c = 4:5)
attr(new_rows, 'position') <- c(1, 3)
datasummary(mpg + hp ~ mean + sd, data = mtcars, add_rows = new_rows)
References
Arel-Bundock V (2022). “modelsummary: Data and Model Summaries in R.” Journal of Statistical Software, 103(1), 1-23. doi:10.18637/jss.v103.i01