Cross tabulations for categorical variables

Description

Convenience function to tabulate counts, cell percentages, and row/column percentages for categorical variables. See the Details section for a description of the internal design. For more complex cross tabulations, use datasummary directly. See also the Examples section below and the vignettes on the modelsummary website:

  • https://modelsummary.com/

  • https://modelsummary.com/articles/datasummary.html

Usage

datasummary_crosstab(
  formula,
  statistic = 1 ~ 1 + N + Percent("row"),
  data,
  output = getOption("modelsummary_output", default = "default"),
  fmt = 1,
  title = getOption("modelsummary_title", default = NULL),
  notes = getOption("modelsummary_notes", default = NULL),
  align = getOption("modelsummary_align", default = NULL),
  add_columns = getOption("modelsummary_add_columns", default = NULL),
  add_rows = getOption("modelsummary_add_rows", default = NULL),
  sparse_header = getOption("modelsummary_sparse_header", default = TRUE),
  escape = getOption("modelsummary_escape", default = TRUE),
  ...
)

Arguments

formula A two-sided formula to describe the table: rows ~ columns, where rows and columns are variables in the data. Rows and columns may contain interactions, e.g., var1 * var2 ~ var3.
statistic A formula of the form 1 ~ 1 + N + Percent("row"). The left-hand side may only be empty or contain a 1 to include row totals. The right-hand side may contain: 1 for column totals, N for counts, Percent() for cell percentages, Percent("row") for row percentages, Percent("col") for column percentages.
data A data.frame (or tibble)
output

filename or object type (character string)

  • Supported filename extensions: .docx, .html, .tex, .md, .txt, .csv, .xlsx, .png, .jpg

  • Supported object types: "default", "html", "markdown", "latex", "latex_tabular", "typst", "data.frame", "tinytable", "gt", "kableExtra", "huxtable", "flextable", "DT", "jupyter". The "modelsummary_list" value produces a lightweight object which can be saved and fed back to the modelsummary function.

  • The "default" output format can be set to "tinytable", "kableExtra", "gt", "flextable", "huxtable", "DT", or "markdown"

    • If the user does not choose a default value, the packages listed above are tried in sequence.

    • Session-specific configuration: options("modelsummary_factory_default" = "gt")

    • Persistent configuration: config_modelsummary(output = "markdown")

  • Warning: Users should not supply a file name to the output argument if they intend to customize the table with external packages. See the ‘Details’ section.

  • LaTeX compilation requires the booktabs and siunitx packages, but siunitx can be disabled or replaced with global options. See the ‘Details’ section.

fmt

how to format numeric values: integer, user-supplied function, or modelsummary function.

  • Integer: Number of decimal digits

  • User-supplied functions:

    • Any function which accepts a numeric vector and returns a character vector of the same length.

  • modelsummary functions:

    • fmt = fmt_significant(2): Two significant digits (at the term-level)

    • fmt = fmt_sprintf("%.3f"): See ?sprintf

    • fmt = fmt_identity(): unformatted raw values

title string. Cross-reference labels should be added with Quarto or Rmarkdown chunk options when applicable. When saving standalone LaTeX files, users can add a label such as \label{tab:mytable} directly to the title string, while also specifying escape=FALSE.
notes list or vector of notes to append to the bottom of the table.
align

A string with a number of characters equal to the number of columns in the table (e.g., align = "lcc"). Valid characters: l, c, r, d.

  • "l": left-aligned column

  • "c": centered column

  • "r": right-aligned column

  • "d": dot-aligned column. For LaTeX/PDF output, this option requires at least version 3.0.25 of the siunitx LaTeX package. See the LaTeX preamble help section below for commands to insert in your LaTeX preamble.

add_columns a data.frame (or tibble) with the same number of rows as your main table.
add_rows a data.frame (or tibble) with the same number of columns as your main table. By default, rows are appended to the bottom of the table. You can define a "position" attribute of integers to set the row positions. See Examples section below.
sparse_header TRUE or FALSE. TRUE eliminates column headers which have a unique label across all columns, except for the row immediately above the data. FALSE keeps all headers. The order in which terms are entered in the formula determines the order in which headers appear. For example, x ~ mean * z will print the mean-related header above the z-related header.
escape boolean. TRUE escapes or substitutes LaTeX/HTML characters which could prevent the file from compiling/displaying. TRUE escapes all cells, captions, and notes. Users can have more fine-grained control by setting escape=FALSE and using an external command such as: modelsummary(model, "latex") |> tinytable::format_tt(j = 1:5, escape = TRUE)
... all other arguments are passed through to the table-making functions tinytable::tt, kableExtra::kbl, gt::gt, DT::datatable, etc. depending on the output argument. This allows users to pass arguments directly to datasummary in order to affect the behavior of other functions behind the scenes.
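
As an illustration of how several of these arguments combine, the following sketch requests counts and column percentages with totals, two decimal digits, and a title and a note, and returns a markdown table. It assumes the modelsummary package is installed; the title and note strings are made up for the example.

```r
library(modelsummary)

# Counts and column percentages, with row and column totals.
# The title and note text are illustrative only.
tab <- datasummary_crosstab(
  cyl ~ gear,
  statistic = 1 ~ 1 + N + Percent("col"),
  data = mtcars,
  output = "markdown",
  fmt = 2,
  title = "Cylinders by gears in mtcars",
  notes = "Percentages are computed within each column."
)
tab
```

Switching output to a file name such as "table.docx" would write the table to disk instead of returning it.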

Details

datasummary_crosstab is a wrapper around the datasummary function. This wrapper works by creating a customized formula and by feeding it to datasummary. The customized formula comes in two parts.

First, we take a two-sided formula supplied by the formula argument. All variables of that formula are wrapped in a Factor() call to ensure that the variables are treated as categorical.

Second, the statistic argument gives a two-sided formula which specifies the statistics to include in the table. datasummary_crosstab modifies this formula automatically to include "clean" labels.

Finally, the formula and statistic formulas are combined into a single formula which is fed directly to the datasummary function to produce the table.

Variables in formula are automatically wrapped in Factor().
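
The wrapping described above can be approximated by hand. The call below is only a rough sketch of the kind of formula that is passed to datasummary, written under the assumption that Factor(), N, and Percent() are available after loading modelsummary. The formula built internally also adds clean labels and totals, and may differ in its details.

```r
library(modelsummary)

# Hand-written approximation of the wrapper's work:
# variables are wrapped in Factor(), and the statistics
# (counts and row percentages) are nested within the rows.
tab <- datasummary(
  Factor(cyl) * (N + Percent("row")) ~ Factor(gear),
  data = mtcars
)
tab
```

Writing the formula directly like this is the route to take when a cross tabulation needs statistics or layouts that datasummary_crosstab does not offer.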

Version 2.0.0, kableExtra, and tinytable

Since version 2.0.0, modelsummary uses tinytable as its default table-drawing backend. Learn more at: https://vincentarelbundock.github.io/tinytable/

Revert to kableExtra for one session:

options(modelsummary_factory_default = 'kableExtra')
options(modelsummary_factory_latex = 'kableExtra')
options(modelsummary_factory_html = 'kableExtra')

Global Options

The behavior of modelsummary can be modified by setting global options. In particular, most of the arguments for most of the package's functions can be set using global options. For example:

  • options(modelsummary_output = "modelsummary_list")

  • options(modelsummary_statistic = '({conf.low}, {conf.high})')

  • options(modelsummary_stars = TRUE)
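
A global option can also be set temporarily and then restored. The sketch below relies on the modelsummary_output option shown in the Usage section and on the base R behavior that options() returns the previous values; it assumes the modelsummary package is installed.

```r
library(modelsummary)

# Set the default output format and keep the previous value.
old <- options(modelsummary_output = "markdown")

# This call now uses the markdown format without an explicit output argument.
datasummary_crosstab(cyl ~ gear, data = mtcars)

# Restore the previous setting.
options(old)
```

Restoring the old value keeps the change from leaking into the rest of the session.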

Options not specific to given arguments are listed below.

Model labels: default column names

These global options change the style of the default column headers:

  • options(modelsummary_model_labels = "roman")

  • options(modelsummary_panel_labels = "roman")

The supported styles are: "model", "panel", "arabic", "letters", "roman", "(arabic)", "(letters)", "(roman)"

The panel-specific option is only used when shape = "rbind"

Table-making packages

modelsummary supports 6 table-making packages: tinytable, kableExtra, gt, flextable, huxtable, and DT. Some of these packages have overlapping functionalities. To change the default backend used for a specific file format, you can use the options function:

options(modelsummary_factory_html = 'kableExtra')
options(modelsummary_factory_word = 'huxtable')
options(modelsummary_factory_png = 'gt')
options(modelsummary_factory_latex = 'gt')
options(modelsummary_factory_latex_tabular = 'kableExtra')

Table themes

Change the look of tables in an automated and replicable way, using the modelsummary theming functionality. See the vignette: https://modelsummary.com/articles/appearance.html

  • modelsummary_theme_gt

  • modelsummary_theme_kableExtra

  • modelsummary_theme_huxtable

  • modelsummary_theme_flextable

  • modelsummary_theme_dataframe

Model extraction functions

modelsummary can use two sets of packages to extract information from statistical models: the easystats family (performance and parameters) and broom. By default, it uses easystats first and then falls back on broom in case of failure. You can change the order of priorities or include goodness-of-fit extracted by both packages by setting:

options(modelsummary_get = "easystats")

options(modelsummary_get = "broom")

options(modelsummary_get = "all")

Formatting numeric entries

By default, LaTeX tables enclose all numeric entries in the \num{} command from the siunitx package. To prevent this behavior, or to enclose numbers in dollar signs (for LaTeX math mode), users can call:

options(modelsummary_format_numeric_latex = "plain")

options(modelsummary_format_numeric_latex = "mathmode")

A similar option can be used to display numerical entries using MathJax in HTML tables:

options(modelsummary_format_numeric_html = "mathjax")

LaTeX preamble

When creating LaTeX via the tinytable backend (default in version 2.0.0 and later), it is useful to include the following commands in the LaTeX preamble of your documents. Note that they are added automatically when compiling Rmarkdown or Quarto documents (except when the modelsummary() calls are cached).

\usepackage{tabularray}
\usepackage{float}
\usepackage{graphicx}
\usepackage[normalem]{ulem}
\UseTblrLibrary{booktabs}
\UseTblrLibrary{siunitx}
\newcommand{\tinytableTabularrayUnderline}[1]{\underline{#1}}
\newcommand{\tinytableTabularrayStrikeout}[1]{\sout{#1}}
\NewTableCommand{\tinytableDefineColor}[3]{\definecolor{#1}{#2}{#3}}

Examples

library(modelsummary)

# crosstab of two variables, showing counts, row percentages, and row/column totals
datasummary_crosstab(cyl ~ gear, data = mtcars)
cyl           3      4      5      All
4    N        1      8      2      11
     % row    9.1    72.7   18.2   100.0
6    N        2      4      1      7
     % row    28.6   57.1   14.3   100.0
8    N        12     0      2      14
     % row    85.7   0.0    14.3   100.0
All  N        15     12     5      32
     % row    46.9   37.5   15.6   100.0

# crosstab of two variables, showing counts only and no totals
datasummary_crosstab(cyl ~ gear, statistic = ~ N, data = mtcars)
cyl           3      4      5
4    N        1      8      2
6    N        2      4      1
8    N        12     0      2

# crosstab of three variables
datasummary_crosstab(am * cyl ~ gear, data = mtcars)
am   cyl           3      4      5      All
0    4    N        1      2      0      3
          % row    33.3   66.7   0.0    100.0
     6    N        2      2      0      4
          % row    50.0   50.0   0.0    100.0
     8    N        12     0      0      12
          % row    100.0  0.0    0.0    100.0
1    4    N        0      6      2      8
          % row    0.0    75.0   25.0   100.0
     6    N        0      2      1      3
          % row    0.0    66.7   33.3   100.0
     8    N        0      0      2      2
          % row    0.0    0.0    100.0  100.0
All       N        15     12     5      32
          % row    46.9   37.5   15.6   100.0

# crosstab with two variables and column percentages
datasummary_crosstab(am ~ gear, statistic = ~ Percent("col"), data = mtcars)
am            3      4      5
0    % col    100.0  33.3   0.0
1    % col    0.0    66.7   100.0
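
The counts and percentages shown in these examples can be cross-checked with base R alone. This is a minimal sketch using table(), addmargins(), and prop.table(); it does not reproduce the table layout, only the underlying numbers, and the object names are chosen for the illustration.

```r
# Counts of cyl by gear, as in the first example.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Counts with row and column totals (labelled "Sum" rather than "All").
addmargins(counts)

# Row percentages: each row sums to 100.
row_pct <- round(100 * prop.table(counts, margin = 1), 1)
row_pct

# Column percentages of am by gear, as in the last example.
am_gear <- table(am = mtcars$am, gear = mtcars$gear)
col_pct <- round(100 * prop.table(am_gear, margin = 2), 1)
col_pct
```

Here margin = 1 divides each cell by its row total and margin = 2 by its column total, which mirrors Percent("row") and Percent("col").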