Descriptive Summary Table for Study Characteristics (User-Friendly)

Creates a clean, publication-ready summary table using `gtsummary::tbl_summary()`. Designed for beginner analysts, this function applies sensible defaults and flexible options to display categorical and continuous variables with or without stratification. It supports one-line summaries of dichotomous variables, handles missing data gracefully, and includes an optional "Overall" column for comparison.

Usage

descriptive_table(
  data,
  exposures,
  by = NULL,
  percent = c("column", "row"),
  digits = 1,
  show_missing = c("ifany", "no"),
  show_dichotomous = c("all_levels", "single_row"),
  show_overall = c("no", "first", "last"),
  statistic = NULL,
  value = NULL
)

Arguments

data: A data frame containing your study dataset.
exposures: A character vector specifying the variable names (columns) in `data` that should be included in the summary table. These can be categorical or continuous.
by: Optional. A single character string giving the name of a grouping variable (e.g., outcome). If supplied, the table will show stratified summaries by this variable.
percent: Character. Either `"column"` (default) or `"row"`. If `by` is not specified, `"column"` is used and `"row"` is ignored.
  - `"column"` calculates percentages within each group defined by `by` (i.e., denominator = column total).
  - `"row"` calculates percentages across `by` groups (i.e., denominator = row total).
digits: Integer. Controls how many decimal places are shown for percentages and means. Defaults to 1.
show_missing: Character. One of `"ifany"` (default) or `"no"`.
  - `"ifany"` shows missing value counts only when missing values exist.
  - `"no"` hides missing counts entirely.
show_dichotomous: Character. One of `"all_levels"` (default) or `"single_row"`.
  - `"all_levels"` displays all levels of binary (dichotomous) variables.
  - `"single_row"` shows only one row (typically "Yes", "Present", or a user-defined level), making the table more compact.
show_overall: Character. One of `"no"` (default), `"first"`, or `"last"`. If `by` is supplied:
  - `"first"` includes a column for overall summaries before the stratified columns.
  - `"last"` includes the overall column at the end.
  - `"no"` disables the overall column.
statistic: Optional named vector of summary types for specific variables. For example, use `statistic = c(age = "mean", bmi = "median")` to override default summaries. Accepted values: `"mean"`, `"median"`, `"mode"`, `"count"`.
value: Optional. A list of formulas specifying which level of a binary variable to show when `show_dichotomous = "single_row"`. For example, `value = list(sex ~ "Female")` will report only the "Female" row.
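
The difference between `percent = "column"` and `percent = "row"` comes down to which total is used as the denominator. The base R sketch below reproduces that arithmetic by hand, using the non-missing `bmi` counts from the example output on this page. It does not call `descriptive_table()`; the object names `counts`, `col_pct`, and `row_pct` are illustrative only.

```r
# Non-missing bmi counts by diabetes status, taken from the example output
# (column "0" = no diabetes, column "1" = diabetes).
counts <- matrix(
  c(4, 95, 139, 253,   # diabetes = 0
    0,  7,  40, 219),  # diabetes = 1
  ncol = 2,
  dimnames = list(
    bmi = c("Underweight", "Normal", "Overweight", "Obese"),
    diabetes = c("0", "1")
  )
)

# "column": each count divided by its column total (within each `by` group)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)

# "row": each count divided by its row total (across `by` groups)
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

col_pct["Normal", ]  # 19.3 and 2.6, matching the example table
row_pct["Normal", ]  # 93.1 and 6.9
```

Column percentages answer "what share of each outcome group falls in this category?", while row percentages answer "what share of this category falls in each outcome group?". Missing values are excluded from the denominators here, which is why 95 out of 491 (not 500) gives 19.3%.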

Value

A `gtsummary::tbl_summary` object with additional class `"descriptive_table"`. Can be printed, customized, merged, or exported.
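
Because the return value is a `tbl_summary` object, the usual gtsummary tools should apply to it. The sketch below illustrates this with a table built directly by `gtsummary::tbl_summary()` on the `trial` dataset that ships with gtsummary, standing in for the output of `descriptive_table()`. It assumes the extra `"descriptive_table"` class does not change how these functions behave.

```r
library(gtsummary)

# Stand-in for a descriptive_table() result: a plain tbl_summary object
tbl <- tbl_summary(trial, include = c(age, grade), by = trt)

# Customize: rename the label column header
tbl_custom <- modify_header(tbl, label ~ "**Characteristic**")

# Export: convert to a gt table for HTML output, or to a tibble for
# further processing or writing to CSV
gt_tbl <- as_gt(tbl_custom)
df <- as_tibble(tbl_custom)

print(df)
```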

Examples

# \donttest{
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  library(dplyr)

  # Recode the outcome to 0/1 and bin body mass index (`mass`) into categories
  pima <- PimaIndiansDiabetes2 |>
    mutate(
      diabetes = ifelse(diabetes == "pos", 1, 0),
      bmi = cut(
        mass,
        breaks = c(-Inf, 18.5, 24.9, 29.9, Inf),
        labels = c("Underweight", "Normal", "Overweight", "Obese")
      )
    )

  # Summarise age and BMI category, stratified by diabetes status
  descriptive_table(pima, exposures = c("age", "bmi"), by = "diabetes")
}
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union


Characteristic    0, N = 500¹    1, N = 268¹
age               31.2 (11.7)    37.1 (11.0)
bmi
    Underweight   4 (0.8%)       0 (0.0%)
    Normal        95 (19.3%)     7 (2.6%)
    Overweight    139 (28.3%)    40 (15.0%)
    Obese         253 (51.5%)    219 (82.3%)
    Unknown       9              2
¹ Mean (SD); n (%)
# }
