formatstats

library(formatstats)

One of the most useful features of the {papaja} package is the apa_print() function, which takes a statistical object and formats the output to print the statistical information inline in R Markdown documents. The apa_print() function is an easy way to extract and format this statistical information for documents following APA style. However, APA style has some rather strange quirks, and users may want some flexibility in how their statistics are formatted. Moreover, apa_print() uses LaTeX syntax, which works great for PDFs but generates images for mathematical symbols when outputting to Word documents.

The formatstats package uses APA style as the default, but allows more flexible formatting, such as including the leading 0 before values that cannot exceed 1 (e.g., p-values and correlation coefficients). All functions accept a type argument that specifies either "md" for Markdown (default) or "latex" for LaTeX. This package can format statistical objects, statistical values, and numbers more generally.

Formatting statistical values

Central tendency and error

Data vectors

Often, we need to include simple descriptive statistics in our documents, such as measures of central tendency and error. formatstats includes a suite of functions that can calculate different summary measures of central tendency (mean or median) and error (confidence interval, standard error, standard deviation, interquartile range) from a numeric data vector. With the base function format_summary(), you can specify central tendency with the summary argument and error with the error argument. For instance, format_summary(vec, summary = "mean", error = "se") calculates mean and standard error.

formatstats includes a number of wrapper functions that cover common combinations of central tendency and error, such as format_meanci() and format_medianiqr().

If you don’t want to include error, use format_mean() or format_median().

Code	Output
`format_summary(mtcars$mpg, error = "ci")`	M = 20.1, 95% CI [17.9, 22.3]
`format_meanci(mtcars$mpg)`	M = 20.1, 95% CI [17.9, 22.3]
`format_medianiqr(mtcars$mpg)`	Mdn = 19.2 (IQR = 7.4)
`format_mean(mtcars$mpg)`	M = 20.1
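
To make the numbers in the table above less of a black box, here is a minimal base-R sketch of the arithmetic behind a mean with a 95% confidence interval and a median with an interquartile range. It assumes a t-based interval, which matches the outputs shown above. It is not the package's source code, and the variable names are only illustrative.

```r
# Base-R sketch of the summaries shown above; not the package's implementation.
x <- mtcars$mpg
n <- length(x)
m <- mean(x)
se <- sd(x) / sqrt(n)                  # standard error of the mean
margin <- qt(0.975, df = n - 1) * se   # half-width of a t-based 95% CI

mean_ci <- sprintf("M = %.1f, 95%% CI [%.1f, %.1f]", m, m - margin, m + margin)
print(mean_ci)
# "M = 20.1, 95% CI [17.9, 22.3]"

# Median and interquartile range for the same vector
median_iqr <- sprintf("Mdn = %.1f (IQR = %.1f)", median(x), IQR(x))
print(median_iqr)
# "Mdn = 19.2 (IQR = 7.4)"
```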

Pre-calculated summaries

In addition to calculating values directly from the vectors, these functions can format already-calculated measures. So if you already have your mean and error calculated, just pass the vector of central tendency, lower error limit, and upper error limit to the values argument to format them. For instance, format_meanci(values = c(12.5, 11.2, 13.7)) produces M = 12.5, 95% CI [11.2, 13.7]. Make sure you pass the values in this order, as the function checks whether the second value is less than or equal to the first and the third is greater than or equal to the first.
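
The ordering rule can be sketched in a few lines of base R. The helper name check_summary_values() is hypothetical and only illustrates the kind of check described here. The package's own validation and error message may differ.

```r
# Hypothetical helper illustrating the ordering rule; not part of formatstats.
check_summary_values <- function(values) {
  stopifnot(is.numeric(values), length(values) == 3)
  # Lower limit must not exceed the central value; upper limit must not fall below it
  if (values[2] > values[1] || values[3] < values[1]) {
    stop("Expected c(central tendency, lower limit, upper limit).")
  }
  sprintf("M = %.1f, 95%% CI [%.1f, %.1f]", values[1], values[2], values[3])
}

check_summary_values(c(12.5, 11.2, 13.7))
# "M = 12.5, 95% CI [11.2, 13.7]"

# Passing the lower limit first fails the check:
# check_summary_values(c(11.2, 12.5, 13.7))
```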

Formatting output

These functions can control many aspects of formatting for the values and labels of summary statistics. Digits after the decimal are controlled with digits (default is 1). The tendlabel argument defines whether the default abbreviation is used (“M” or “Mdn”), the full word (“Mean” or “Median”), or no label is provided. Each of these can be italicized or not with the italics argument, subscripts can be included with the subscript argument, and units added with the units argument.

Code	Output
`format_mean(mtcars$mpg)`	M = 20.1
`format_mean(mtcars$mpg, tendlabel = "word")`	Mean = 20.1
`format_mean(mtcars$mpg, tendlabel = "none")`	20.1
`format_mean(mtcars$mpg, italics = FALSE)`	M = 20.1
`format_mean(mtcars$mpg, subscript = "A")`	M_A = 20.1
`format_mean(mtcars$mpg, units = "m")`	M = 20.1 m

Error can be displayed in a number of different ways. Setting the display argument to "limits" (default) includes upper and lower limits in brackets. If intervals rather than limits are preferred, they can be appended after the mean/median with ± using "pm" or in parentheses with "par". Error is not displayed if display = "none". The presence of the error label is controlled by the logical argument errorlabel. When set to FALSE, no error label is included. For confidence intervals, the cilevel argument takes a numeric scalar from 0-1 to define the confidence level.

Code	Output
`format_meanci(mtcars$mpg)`	M = 20.1, 95% CI [17.9, 22.3]
`format_meanci(mtcars$mpg, display = "pm")`	M = 20.1 ± 2.2
`format_meanci(mtcars$mpg, display = "par")`	M = 20.1 (95% CI = 2.2)
`format_meanci(mtcars$mpg, display = "none")`	M = 20.1
`format_meanci(mtcars$mpg, errorlabel = FALSE)`	M = 20.1, [17.9, 22.3]
`format_meanci(mtcars$mpg, cilevel = 0.90)`	M = 20.1, 90% CI [18.3, 21.9]
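
The relationship between limits, intervals, and the confidence level can be sketched in base R. The helper ci_margin() is hypothetical and assumes a t-based interval, which is consistent with the outputs in the table above. It is not taken from the package.

```r
# Hypothetical helper: half-width of a t-based confidence interval.
ci_margin <- function(x, cilevel = 0.95) {
  n <- length(x)
  qt(1 - (1 - cilevel) / 2, df = n - 1) * sd(x) / sqrt(n)
}

m <- mean(mtcars$mpg)
margin95 <- ci_margin(mtcars$mpg)                  # about 2.2
margin90 <- ci_margin(mtcars$mpg, cilevel = 0.90)  # about 1.8, a narrower interval

# Interval style (like display = "pm")
sprintf("M = %.1f \u00b1 %.1f", m, margin95)

# Limits style with a 90% level
sprintf("M = %.1f, 90%% CI [%.1f, %.1f]", m, m - margin90, m + margin90)
```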

P-values

P-values are pretty easy to format with format_p(). The digits argument controls the number of digits after the decimal, and if the p-value is smaller than the smallest value that can be displayed with that many digits (e.g., .001 with three digits), it is reported as p < that value. Unfortunately, APA style involves lopping off the leading zero in p-values, but setting pzero = TRUE turns off this silly setting. The p-value label is controlled by label, where the user can specify the exact label text. By default, this is a lowercase, italicized p. Italics can be removed with italics = FALSE. P-value labels can be omitted by setting label = "".

Code	Output
`format_p(0.001)`	p = .001
`format_p(0.001, digits = 2)`	p < .01
`format_p(0.321, digits = 2)`	p = .32
`format_p(0.001, pzero = TRUE)`	p = 0.001
`format_p(0.001, label = "P")`	P = .001
`format_p(0.001, italics = FALSE)`	p = .001
`format_p(0.001, label = "")`	.001
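
The threshold and leading-zero logic can be sketched in base R. The function sketch_p() is a hypothetical stand-in written for illustration. It ignores labels and italics and is not the package's implementation.

```r
# Hypothetical sketch of p-value formatting; not the formatstats source.
sketch_p <- function(p, digits = 3, pzero = FALSE) {
  cutoff <- 10^(-digits)   # smallest value displayable with this many digits
  if (p < cutoff) {
    value <- formatC(cutoff, digits = digits, format = "f")
    operator <- "<"
  } else {
    value <- formatC(p, digits = digits, format = "f")
    operator <- "="
  }
  if (!pzero) value <- sub("^0", "", value)   # APA style: drop the leading zero
  paste("p", operator, value)
}

sketch_p(0.0004)               # "p < .001"
sketch_p(0.001, digits = 2)    # "p < .01"
sketch_p(0.321, digits = 2)    # "p = .32"
sketch_p(0.0123, pzero = TRUE) # "p = 0.012"
```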

Bayes factors

The format_bf() function can either format a vector of numeric values or extract and format Bayes factors from a BFBayesFactor object from the {BayesFactor} package. Bayes factors are not as standardized in how they are formatted. One issue is that Bayes factors can be referenced from either the alternative hypothesis (H₁) or the null hypothesis (H₀). Also, as a ratio, digits after the decimal are more important below 1 than above 1.

To respond to the digits issue, the format_bf() function controls digits for Bayes factors greater than 1 (digits1) separately from those less than 1 (digits2). In fact, the defaults are different for these two arguments. Further, Bayes factors can be very large or very small when evidence strongly favors one hypothesis over another. Therefore, the cutoff argument sets a threshold above which (or below 1/cutoff) the returned value is truncated (e.g., BF > 1000).

Code	Output
`format_bf(4321)`	BF₁₀ = 4.3×10³
`format_bf(4321, digits1 = 2)`	BF₁₀ = 4.32×10³
`format_bf(4321, cutoff = 1000)`	BF₁₀ > 1000
`format_bf(0.04321)`	BF₁₀ = 0.04
`format_bf(0.04321, digits2 = 3)`	BF₁₀ = 0.043
`format_bf(0.04321, cutoff = 10)`	BF₁₀ < 0.1
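
The digit and cutoff rules can be sketched in base R. The function sketch_bf() is hypothetical, and its defaults of one digit above 1 and two digits below 1 are assumptions inferred from the outputs in the table above. The sketch omits the subscript, italics, and scientific notation that the real function produces.

```r
# Hypothetical sketch of the cutoff and digits rules; not the formatstats source.
sketch_bf <- function(bf, digits1 = 1, digits2 = 2, cutoff = NULL) {
  if (!is.null(cutoff)) {
    if (bf > cutoff) return(paste("BF >", cutoff))
    if (bf < 1 / cutoff) return(paste("BF <", 1 / cutoff))
  }
  # Separate precision for values at or above 1 versus below 1
  digits <- if (bf >= 1) digits1 else digits2
  paste("BF =", formatC(bf, digits = digits, format = "f"))
}

sketch_bf(4321, cutoff = 1000)   # "BF > 1000"
sketch_bf(0.04321, cutoff = 10)  # "BF < 0.1"
sketch_bf(0.04321)               # "BF = 0.04"
sketch_bf(12.345)                # "BF = 12.3"
```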

The default label for Bayes factors is BF₁₀. The text of the label can be changed with the label argument, where setting label = "" omits the label. Italics can be removed with italics = FALSE, and the subscript can be set to 01 (subscript = "01") or removed (subscript = "").

Code	Output
`format_bf(4321)`	BF₁₀ = 4.3×10³
`format_bf(4321, italics = FALSE)`	BF₁₀ = 4.3×10³
`format_bf(4321, subscript = "")`	BF = 4.3×10³
`format_bf(4321, label = "Bayes factor", italics = FALSE, subscript = "")`	Bayes factor = 4.3×10³
`format_bf(4321, label = "")`	4.3×10³

Formatting statistical objects

Running a statistical test in R typically returns a list with lots of information about the test, often including things like test statistics and p-values. The aim of this set of functions is to extract and format statistics for a suite of commonly used statistical objects (correlations and t-tests).

Correlations

The format_corr() function inputs objects returned by the cor.test() function and detects whether the object is from a Pearson, Kendall, or Spearman correlation. It then reports and formats the appropriate correlation coefficient and p-value.

Let’s start by creating a few different correlations.

mpg_disp_corr_pearson <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")
mpg_disp_corr_spearman <- cor.test(mtcars$mpg, mtcars$disp, method = "spearman", exact = FALSE)
mpg_disp_corr_kendall <- cor.test(mtcars$mpg, mtcars$disp, method = "kendall", exact = FALSE)

For Pearson correlations, we get the correlation coefficient and the confidence intervals. Since Spearman and Kendall correlations are non-parametric, confidence intervals are not returned. Confidence intervals can be omitted from Pearson correlations by setting ci = FALSE.

Code	Output
`format_corr(mpg_disp_corr_pearson)`	r = -.85, 95% CI [-0.92, -0.71], p < .001
`format_corr(mpg_disp_corr_pearson, ci = FALSE)`	r = -.85, p < .001
`format_corr(mpg_disp_corr_spearman)`	ρ = -.91, p < .001
`format_corr(mpg_disp_corr_kendall)`	τ = -.77, p < .001
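
For readers curious where these values come from, the sketch below pulls them by hand out of the objects that cor.test() returns. It uses only base R. How format_corr() detects the method internally is not documented here, so checking the name of the estimate is just one plausible approach.

```r
# Base-R sketch: inspecting the htest objects returned by cor.test().
pearson <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")
spearman <- cor.test(mtcars$mpg, mtcars$disp, method = "spearman", exact = FALSE)

# The name of the estimate identifies the type of correlation
names(pearson$estimate)     # "cor"
names(spearman$estimate)    # "rho"

# Only the Pearson object carries a confidence interval
is.null(spearman$conf.int)  # TRUE

pearson_text <- sprintf(
  "r = %.2f, 95%% CI [%.2f, %.2f]",
  pearson$estimate, pearson$conf.int[1], pearson$conf.int[2]
)
print(pearson_text)
# "r = -0.85, 95% CI [-0.92, -0.71]"
```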

Format the number of digits of coefficients with digits and digits of p-values with pdigits. Include the leading zeros for coefficients and p-values with pzero = TRUE. Remove italics with italics = FALSE.

Code	Output
`format_corr(mpg_disp_corr_pearson)`	r = -.85, 95% CI [-0.92, -0.71], p < .001
`format_corr(mpg_disp_corr_pearson, digits = 1, pdigits = 2)`	r = -.8, 95% CI [-0.9, -0.7], p < .01
`format_corr(mpg_disp_corr_pearson, pzero = TRUE)`	r = -0.85, 95% CI [-0.92, -0.71], p < 0.001
`format_corr(mpg_disp_corr_pearson, italics = FALSE)`	r = -.85, 95% CI [-0.92, -0.71], p < .001
`format_corr(mpg_disp_corr_spearman, italics = FALSE)`	ρ = -.91, p < .001
`format_corr(mpg_disp_corr_kendall, italics = FALSE)`	τ = -.77, p < .001

T-tests

The format_ttest() function inputs objects returned by the t.test() or wilcox.test() functions and detects whether the object is from a Student’s t-test or a Wilcoxon test, including one-sample, independent-sample, and paired-sample versions. It then reports and formats the mean value (or mean difference), confidence intervals for mean value/difference, appropriate test statistic, degrees of freedom (for parametric tests), and p-value.

Let’s start by creating a few different t-tests and Wilcoxon tests.

mpg_disp_ttest_gear_carb <- t.test(mtcars$gear, mtcars$carb)
mpg_disp_ttest_gear_carb_paired <- t.test(mtcars$gear, mtcars$carb, paired = TRUE)
mpg_disp_ttest_gear_carb_onesample <- t.test(mtcars$gear, mu = 4)
mpg_disp_wtest_gear_carb <- wilcox.test(mtcars$gear, mtcars$carb, exact = FALSE)
mpg_disp_wtest_gear_carb_paired <- wilcox.test(mtcars$gear, mtcars$carb, paired = TRUE, exact = FALSE)
mpg_disp_wtest_gear_carb_onesample <- wilcox.test(mtcars$gear, mu = 4, exact = FALSE)

For Student’s t-tests, we get the mean value or difference and the confidence intervals. Means and confidence intervals can be omitted by setting full = FALSE. For Wilcoxon tests, only the test statistic (W or V) and p-value are reported.

Code	Output
`format_ttest(mpg_disp_ttest_gear_carb)`	M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008
`format_ttest(mpg_disp_ttest_gear_carb_paired)`	M = 0.9, 95% CI [0.3, 1.4], t(31) = 3.1, p = .004
`format_ttest(mpg_disp_ttest_gear_carb_onesample)`	M = 3.7, 95% CI [3.4, 4.0], t(31) = -2.4, p = .023
`format_ttest(mpg_disp_ttest_gear_carb_onesample, full = FALSE)`	t(31) = -2.4, p = .023
`format_ttest(mpg_disp_wtest_gear_carb)`	W = 727.5, p = .003
`format_ttest(mpg_disp_wtest_gear_carb_paired)`	V = 267.0, p = .004
`format_ttest(mpg_disp_wtest_gear_carb_onesample)`	V = 52.5, p = .027
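
The pieces reported above live in the objects that t.test() and wilcox.test() return, and the base-R sketch below extracts them by hand. It is only an illustration of the underlying objects, not of how format_ttest() is written.

```r
# Base-R sketch: inspecting the htest objects returned by t.test() and wilcox.test().
ttest <- t.test(mtcars$gear, mtcars$carb)
wtest <- wilcox.test(mtcars$gear, mtcars$carb, exact = FALSE)

# The name of the statistic distinguishes the tests
names(ttest$statistic)     # "t"
names(wtest$statistic)     # "W"

# Wilcoxon tests have no degrees of freedom
is.null(wtest$parameter)   # TRUE

# By default t.test() applies the Welch correction, hence the fractional df
ttest_text <- sprintf("t(%.1f) = %.1f", ttest$parameter, ttest$statistic)
print(ttest_text)
# "t(43.4) = 2.8"
```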

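As with format_corr(), format the number of digits of statistics with digits and digits of p-values with pdigits. Include the leading zero for p-values with pzero = TRUE. Remove italics with italics = FALSE.

Code	Output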
`format_ttest(mpg_disp_ttest_gear_carb)`	M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008
`format_ttest(mpg_disp_ttest_gear_carb, digits = 2, pdigits = 2)`	M = 0.88, 95% CI [0.24, 1.51], t(43.40) = 2.79, p < .01
`format_ttest(mpg_disp_ttest_gear_carb, pzero = TRUE)`	M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = 0.008
`format_ttest(mpg_disp_ttest_gear_carb, italics = FALSE)`	M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008
`format_ttest(mpg_disp_wtest_gear_carb)`	W = 727.5, p = .003
`format_ttest(mpg_disp_wtest_gear_carb, italics = FALSE)`	W = 727.5, p = .003

Formatting numbers

In addition to formatting specific statistics, this package can format numbers more generally. The format_num() function controls the number of digits after the decimal with digits and the presence of the leading zero with pzero.

Code	Output
`format_num(0.1234)`	0.1
`format_num(0.1234, digits = 2)`	0.12
`format_num(0.1234, pzero = FALSE)`	.1
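
The same behavior can be approximated in base R with formatC() and a regular expression. The function sketch_num() is a hypothetical illustration, not the package's code.

```r
# Hypothetical sketch of fixed-digit formatting with an optional leading zero.
sketch_num <- function(x, digits = 1, pzero = TRUE) {
  out <- formatC(x, digits = digits, format = "f")
  # Drop the zero before the decimal point, keeping any minus sign
  if (!pzero) out <- sub("^(-?)0\\.", "\\1.", out)
  out
}

sketch_num(0.1234)                             # "0.1"
sketch_num(0.1234, digits = 2)                 # "0.12"
sketch_num(0.1234, pzero = FALSE)              # ".1"
sketch_num(-0.85, digits = 2, pzero = FALSE)   # "-.85"
```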

For large or small values, using scientific notation may be a more useful way to format the numbers. The format_scientific() function converts to scientific notation, again offering control of the number of digits as well as whether output type is Markdown or LaTeX.

Code	Output
`format_scientific(1234)`	1.2×10³
`format_scientific(0.0000001234)`	1.2×10⁻⁷
`format_scientific(0.0000001234, digits = 2)`	1.23×10⁻⁷
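
The conversion itself is a short calculation: take the base-10 exponent, then divide it out to get the mantissa. The function sketch_sci() below is a hypothetical base-R illustration that writes the exponent with a caret instead of Markdown or LaTeX superscripts. It does not handle zero or mantissas that round up to 10.

```r
# Hypothetical sketch of scientific notation; not the formatstats source.
sketch_sci <- function(x, digits = 1) {
  exponent <- floor(log10(abs(x)))   # power of 10 of the leading digit
  mantissa <- x / 10^exponent        # value scaled to between 1 and 10
  paste0(formatC(mantissa, digits = digits, format = "f"), "\u00d710^", exponent)
}

sketch_sci(1234)                      # "1.2×10^3"
sketch_sci(0.0000001234)              # "1.2×10^-7"
sketch_sci(0.0000001234, digits = 2)  # "1.23×10^-7"
```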