One of the many useful features of the {papaja} package is the
apa_print() function, which takes a statistical object and formats the
output to print the statistical information inline in R Markdown
documents. The apa_print() function is an easy way to extract and format
this statistical information for documents following APA style. However,
APA style has some rather strange quirks, and users may want some
flexibility in how their statistics are formatted. Moreover, apa_print()
uses LaTeX syntax, which works great for PDFs but generates images for
mathematical symbols when outputting to Word documents.
The cocoon package uses APA style as the default, but allows more
flexible formatting such as including the leading 0 before numbers with
maximum values of 1. All functions accept a type argument that specifies
either "md" for Markdown (default) or "latex" for LaTeX. This package can
format statistical objects, statistical values, and numbers more
generally.
Formatting statistical objects
Running a statistical test in R typically returns a list with lots of
information about the test, often including things like statistical test
values and p-values. The aim of the format_stats() function is to
extract and format statistics for a suite of commonly used statistical
objects (correlations, t-tests, and Bayes factors).
Correlations
The format_stats() function accepts objects returned by the cor.test()
or correlation::correlation() function and detects whether the object is
from a Pearson, Kendall, or Spearman correlation. It then reports and
formats the appropriate correlation coefficient and p-value.
Let’s start by creating a few different correlations.
mpg_disp_corr_pearson <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")
mpg_disp_corr_spearman <- cor.test(mtcars$mpg, mtcars$disp, method = "spearman", exact = FALSE)
mpg_disp_corr_kendall <- cor.test(mtcars$mpg, mtcars$disp, method = "kendall", exact = FALSE)
For Pearson correlations, we get the correlation coefficient and the
confidence intervals. Since Spearman and Kendall correlations are
non-parametric, confidence intervals are not returned. Confidence
intervals can be omitted from Pearson correlations by setting
full = FALSE.
Code | Output |
---|---|
format_stats(mpg_disp_corr_pearson) | r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_stats(mpg_disp_corr_pearson, full = FALSE) | r = -.85, p < .001 |
format_stats(mpg_disp_corr_spearman) | ρ = -.91, p < .001 |
format_stats(mpg_disp_corr_kendall) | τ = -.77, p < .001 |
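To see what format_stats() has to work with, it can help to look inside
the test objects themselves. The base R sketch below is not cocoon's
internal code; it only shows that the name of the estimate distinguishes
the three methods and that only the Pearson object carries a confidence
interval. The variable names are chosen here for illustration.

```r
# Inspect the objects returned by cor.test() (base R only)
pearson <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")
spearman <- cor.test(mtcars$mpg, mtcars$disp, method = "spearman", exact = FALSE)
kendall <- cor.test(mtcars$mpg, mtcars$disp, method = "kendall", exact = FALSE)

# The estimate is named after the coefficient: "cor", "rho", or "tau"
estimate_names <- c(
  names(pearson$estimate),
  names(spearman$estimate),
  names(kendall$estimate)
)
print(estimate_names)

# Only the Pearson object includes a confidence interval
has_ci <- c(
  pearson = !is.null(pearson$conf.int),
  spearman = !is.null(spearman$conf.int),
  kendall = !is.null(kendall$conf.int)
)
print(has_ci)
```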
Set the number of digits for coefficients with digits and for p-values
with pdigits. Include the leading zeros for coefficients and p-values
with pzero = TRUE. Remove italics with italics = FALSE.
Code | Output |
---|---|
format_stats(mpg_disp_corr_pearson) | r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_stats(mpg_disp_corr_pearson, digits = 1, pdigits = 2) | r = -.8, 95% CI [-0.9, -0.7], p < .01 |
format_stats(mpg_disp_corr_pearson, pzero = TRUE) | r = -0.85, 95% CI [-0.92, -0.71], p < 0.001 |
format_stats(mpg_disp_corr_pearson, italics = FALSE) | r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_stats(mpg_disp_corr_spearman, italics = FALSE) | ρ = -.91, p < .001 |
format_stats(mpg_disp_corr_kendall, italics = FALSE) | τ = -.77, p < .001 |
T-tests
The format_stats() function also accepts objects returned by the
t.test() or wilcox.test() functions and detects whether the object is
from a t-test (Student's or Welch's) or a Wilcoxon test, including
one-sample, independent-sample, and paired-sample versions. It then
reports and formats the mean value (or mean difference), confidence
intervals for mean value/difference, appropriate test statistic, degrees
of freedom (for parametric tests), and p-value.
Let's start by creating a few different t-tests:
mpg_disp_ttest_gear_carb <- t.test(mtcars$gear, mtcars$carb)
mpg_disp_ttest_gear_carb_paired <- t.test(mtcars$gear, mtcars$carb, paired = TRUE)
mpg_disp_ttest_gear_carb_onesample <- t.test(mtcars$gear, mu = 4)
mpg_disp_wtest_gear_carb <- wilcox.test(mtcars$gear, mtcars$carb, exact = FALSE)
mpg_disp_wtest_gear_carb_paired <- wilcox.test(mtcars$gear, mtcars$carb, paired = TRUE, exact = FALSE)
mpg_disp_wtest_gear_carb_onesample <- wilcox.test(mtcars$gear, mu = 4, exact = FALSE)
For t-tests, we get the mean value or difference and the confidence
intervals. Means and confidence intervals can be omitted by setting
full = FALSE.
Code | Output |
---|---|
format_stats(mpg_disp_ttest_gear_carb) | M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_stats(mpg_disp_ttest_gear_carb_paired) | M = 0.9, 95% CI [0.3, 1.4], t(31) = 3.1, p = .004 |
format_stats(mpg_disp_ttest_gear_carb_onesample) | M = 3.7, 95% CI [3.4, 4.0], t(31) = -2.4, p = .023 |
format_stats(mpg_disp_ttest_gear_carb_onesample, full = FALSE) | t(31) = -2.4, p = .023 |
format_stats(mpg_disp_wtest_gear_carb) | W = 727.5, p = .003 |
format_stats(mpg_disp_wtest_gear_carb_paired) | V = 267.0, p = .004 |
format_stats(mpg_disp_wtest_gear_carb_onesample) | V = 52.5, p = .027 |
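The pieces reported in this table all live inside the test objects. The
base R sketch below (again, not cocoon's own code) pulls them out by
hand. Note that t.test() with two samples runs Welch's test by default,
which is why the degrees of freedom are fractional, and that
wilcox.test() objects carry no degrees of freedom at all.

```r
# Two-sample t-test: base R defaults to Welch's unequal-variance test
welch <- t.test(mtcars$gear, mtcars$carb)
print(welch$method)

# The estimate holds both group means; their difference is the reported M
mean_diff <- unname(welch$estimate[1] - welch$estimate[2])
df <- unname(welch$parameter)       # fractional degrees of freedom
ci <- as.numeric(welch$conf.int)    # CI for the mean difference
print(round(c(mean_diff = mean_diff, df = df, lower = ci[1], upper = ci[2]), 1))

# Wilcoxon test: the statistic is named "W" and there is no df element
wilcox <- wilcox.test(mtcars$gear, mtcars$carb, exact = FALSE)
print(names(wilcox$statistic))
print(is.null(wilcox$parameter))
```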
Set the number of digits for coefficients with digits and for p-values
with pdigits. Include the leading zeros for coefficients and p-values
with pzero = TRUE. Remove italics with italics = FALSE.
Code | Output |
---|---|
format_stats(mpg_disp_ttest_gear_carb) | M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_stats(mpg_disp_ttest_gear_carb, digits = 2, pdigits = 2) | M = 0.88, 95% CI [0.24, 1.51], t(43.40) = 2.79, p < .01 |
format_stats(mpg_disp_ttest_gear_carb, pzero = TRUE) | M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = 0.008 |
format_stats(mpg_disp_ttest_gear_carb, italics = FALSE) | M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_stats(mpg_disp_wtest_gear_carb) | W = 727.5, p = .003 |
format_stats(mpg_disp_wtest_gear_carb, italics = FALSE) | W = 727.5, p = .003 |
Bayes factors
The format_stats() function can also extract and format Bayes factors
from a BFBayesFactor object from the {BayesFactor} package. Bayes
factors are not as standardized in how they are formatted. One issue is
that Bayes factors can be referenced from either the alternative
hypothesis (H1) or the null hypothesis (H0). Also, as a ratio, digits
after the decimal are more important below 1 than above 1.
To respond to the digits issue, the format_stats() function controls
digits for Bayes factors greater than 1 (digits1) separately from those
less than 1 (digits2). In fact, the defaults are different for these two
arguments. Further, Bayes factors can be very large or very small when
evidence strongly favors one hypothesis over another. Therefore, the
cutoff argument sets a threshold above which (or below 1/cutoff) the
returned value is truncated (e.g., BF > 1000).
bf_corr <- BayesFactor::correlationBF(mtcars$mpg, mtcars$disp)
bf_ttest <- BayesFactor::ttestBF(mtcars$vs, mtcars$am)
bf_lm <- BayesFactor::lmBF(mpg ~ am, data = mtcars)
Code | Output |
---|---|
format_stats(bf_lm) | BF10 = 87.3 |
format_stats(bf_lm, digits1 = 2) | BF10 = 87.30 |
format_stats(bf_corr) | BF10 = 2.7×10⁶ |
format_stats(bf_corr, cutoff = 1000) | BF10 > 1000 |
format_stats(bf_ttest) | BF10 = 0.26 |
format_stats(bf_ttest, digits2 = 3) | BF10 = 0.262 |
format_stats(bf_ttest, cutoff = 3) | BF10 < 0.33 |
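The interplay of digits1, digits2, and cutoff can be sketched in a few
lines of base R. The helper below is hypothetical and is not how cocoon
is implemented; its default digits (1 and 2) are assumptions inferred
from the table, and it skips the scientific notation that very large
values receive when no cutoff is set.

```r
# Hypothetical helper illustrating the digits and cutoff logic
sketch_bf <- function(bf, digits1 = 1, digits2 = 2, cutoff = NULL) {
  # Truncate values beyond the cutoff in either direction
  if (!is.null(cutoff) && bf > cutoff) {
    return(paste0("BF10 > ", cutoff))
  }
  if (!is.null(cutoff) && bf < 1 / cutoff) {
    return(paste0("BF10 < ", formatC(1 / cutoff, digits = digits2, format = "f")))
  }
  # digits1 applies at or above 1, digits2 below 1
  digits <- if (bf >= 1) digits1 else digits2
  paste0("BF10 = ", formatC(bf, digits = digits, format = "f"))
}

print(sketch_bf(87.3))
print(sketch_bf(4321, cutoff = 1000))
print(sketch_bf(0.262))
print(sketch_bf(0.262, cutoff = 3))
```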
The default label for Bayes factors is BF10. The text of the label can
be changed with the label argument, where setting label = "" omits the
label. Italics can be removed with italics = FALSE, and the subscript
can be set to 01 (subscript = "01") or removed (subscript = "").
Code | Output |
---|---|
format_stats(bf_lm) | BF10 = 87.3 |
format_stats(bf_lm, italics = FALSE) | BF10 = 87.3 |
format_stats(bf_lm, subscript = "") | BF = 87.3 |
format_stats(bf_lm, label = "Bayes factor", italics = FALSE, subscript = "") | Bayes factor = 87.3 |
format_stats(bf_lm, label = "") | 87.3 |
Formatting statistical values
Central tendency and error
Data vectors
Often, we need to include simple descriptive statistics in our
documents, such as measures of central tendency and error. cocoon
includes a suite of functions that can calculate different summary
measures of central tendency (mean or median) and error (confidence
interval, standard error, standard deviation, interquartile range) from
a numeric data vector. With the base function format_summary(), you can
specify central tendency with the summary argument and error with the
error argument. For instance,
format_summary(vec, summary = "mean", error = "se") calculates mean and
standard error.
cocoon includes a number of wrapper functions that cover common
combinations of central tendency and error, such as format_meanci() and
format_medianiqr(). And if you don't want to include error, use
format_mean() or format_median().
Code | Output |
---|---|
format_summary(mtcars$mpg, error = "ci") | M = 20.1, 95% CI [17.9, 22.3] |
format_meanci(mtcars$mpg) | M = 20.1, 95% CI [17.9, 22.3] |
format_medianiqr(mtcars$mpg) | Mdn = 19.2 (IQR = 7.4) |
format_mean(mtcars$mpg) | M = 20.1 |
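For reference, the summary measures named above can all be computed
directly in base R. This sketch only reproduces the raw numbers behind
the table; the formatting itself is what the cocoon functions add.

```r
# Raw summary measures for mtcars$mpg, computed with base R
x <- mtcars$mpg

tendency <- c(mean = mean(x), median = median(x))
error <- c(
  sd = sd(x),                      # standard deviation
  se = sd(x) / sqrt(length(x)),    # standard error of the mean
  iqr = IQR(x)                     # interquartile range
)

print(round(tendency, 1))
print(round(error, 1))
```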
Pre-calculated summaries
In addition to calculating values directly from the vectors, these
functions can format already-calculated measures. So if you already have
your mean and error calculated, just pass a vector of central tendency,
lower error limit, and upper error limit to the values argument to
format them. For instance, format_meanci(values = c(12.5, 11.2, 13.7))
produces M = 12.5, 95% CI [11.2, 13.7]. Make sure you pass the values in
this order, as the function checks whether the second value is less than
or equal to the first and the third is greater than or equal to the
first.
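The ordering check described here can be pictured with a small base R
function. The helper name and its error handling are hypothetical; the
sketch only mirrors the rule stated above (lower limit at or below the
central value, upper limit at or above it).

```r
# Hypothetical validator for a (central value, lower, upper) vector
sketch_check_values <- function(values) {
  stopifnot(
    is.numeric(values),
    length(values) == 3,
    values[2] <= values[1],  # lower limit must not exceed the central value
    values[3] >= values[1]   # upper limit must not fall below it
  )
  invisible(values)
}

# Correct order passes silently
ok <- sketch_check_values(c(12.5, 11.2, 13.7))

# Passing the lower limit first triggers an error, caught here for display
bad <- tryCatch(
  sketch_check_values(c(11.2, 12.5, 13.7)),
  error = function(e) "error"
)
print(bad)
```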
Formatting output
These functions can control many aspects of formatting for the values
and labels of summary statistics. Digits after the decimal are
controlled with digits (default is 1). The tendlabel argument defines
whether the default abbreviation is used ("M" or "Mdn"), the full word
("Mean" or "Median"), or no label is provided. Each of these can be
italicized or not with the italics argument, subscripts can be included
with the subscript argument, and units added with the units argument.
Code | Output |
---|---|
format_mean(mtcars$mpg) | M = 20.1 |
format_mean(mtcars$mpg, tendlabel = "word") | Mean = 20.1 |
format_mean(mtcars$mpg, tendlabel = "none") | 20.1 |
format_mean(mtcars$mpg, italics = FALSE) | M = 20.1 |
format_mean(mtcars$mpg, subscript = "A") | MA = 20.1 |
format_mean(mtcars$mpg, units = "m") | M = 20.1 m |
Error can be displayed in a number of different ways. Setting the
display argument to "limits" (default) includes upper and lower limits
in brackets. If intervals rather than limits are preferred, they can be
appended after the mean/median with ± using "pm" or in parentheses with
"par". Error is not displayed if display = "none". The presence of the
error label is controlled by the logical argument errorlabel. When set
to FALSE, no error label is included. For confidence intervals, the
cilevel argument takes a numeric scalar between 0 and 1 to define the
confidence level.
Code | Output |
---|---|
format_meanci(mtcars$mpg) | M = 20.1, 95% CI [17.9, 22.3] |
format_meanci(mtcars$mpg, display = "pm") | M = 20.1 ± 2.2 |
format_meanci(mtcars$mpg, display = "par") | M = 20.1 (95% CI = 2.2) |
format_meanci(mtcars$mpg, display = "none") | M = 20.1 |
format_meanci(mtcars$mpg, errorlabel = FALSE) | M = 20.1, [17.9, 22.3] |
format_meanci(mtcars$mpg, cilevel = 0.90) | M = 20.1, 90% CI [18.3, 21.9] |
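The difference between limits and intervals is easier to see with the
numbers written out. The sketch below assumes a t-based confidence
interval for the mean, which reproduces the values in the table; whether
cocoon computes it exactly this way is an assumption, and the helper
name is hypothetical.

```r
# Hypothetical helper: t-based confidence interval for a mean
sketch_ci <- function(x, cilevel = 0.95) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  # Half-width of the interval (what "pm" and "par" would display)
  margin <- qt(1 - (1 - cilevel) / 2, df = n - 1) * se
  c(
    mean = mean(x),
    lower = mean(x) - margin,  # limits (what "limits" would display)
    upper = mean(x) + margin,
    margin = margin
  )
}

ci95 <- sketch_ci(mtcars$mpg)
ci90 <- sketch_ci(mtcars$mpg, cilevel = 0.90)
print(round(ci95, 1))
print(round(ci90, 1))
```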
P-values
P-values are pretty easy to format with format_p(). The digits argument
controls the number of digits after the decimal, and if the p-value is
smaller than the smallest value that can be shown with that many digits,
p < is used (e.g., p < .001). Unfortunately, APA style involves lopping
off the leading zero in p-values, but setting pzero = TRUE turns off
this silly setting. The p-value label is controlled by label, where the
user can specify the exact label text. By default, this is a lowercase,
italicized p. Italics can be removed with italics = FALSE. P-value
labels can be omitted by setting label = "".
Code | Output |
---|---|
format_p(0.001) | p = .001 |
format_p(0.001, digits = 2) | p < .01 |
format_p(0.321, digits = 2) | p = .32 |
format_p(0.001, pzero = TRUE) | p = 0.001 |
format_p(0.001, label = "P") | P = .001 |
format_p(0.001, italics = FALSE) | p = .001 |
format_p(0.001, label = "") | .001 |
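The threshold and leading-zero rules can be sketched in base R as
follows. This is a hypothetical helper for illustration only, not
cocoon's implementation, and it ignores italics and custom labels.

```r
# Hypothetical helper illustrating the "p <" threshold and leading zero
sketch_format_p <- function(p, digits = 3, pzero = FALSE) {
  cutoff <- 10^(-digits)  # smallest value displayable with these digits
  if (p < cutoff) {
    operator <- "<"
    value <- formatC(cutoff, digits = digits, format = "f")
  } else {
    operator <- "="
    value <- formatC(p, digits = digits, format = "f")
  }
  # APA style drops the leading zero unless pzero = TRUE
  if (!pzero) value <- sub("^0\\.", ".", value)
  paste("p", operator, value)
}

print(sketch_format_p(0.0004))
print(sketch_format_p(0.321, digits = 2))
print(sketch_format_p(0.008, pzero = TRUE))
```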
Bayes factors
Though the format_stats() function extracts and formats Bayes factors
from the {BayesFactor} package, sometimes you may have Bayes factors
from other sources. The format_bf() function formats Bayes factors from
numeric values (either single scalar elements or vectors).
Code | Output |
---|---|
format_bf(4321) | BF10 = 4.3×10³ |
format_bf(4321, digits1 = 2) | BF10 = 4.32×10³ |
format_bf(4321, cutoff = 1000) | BF10 > 1000 |
format_bf(0.04321) | BF10 = 0.04 |
format_bf(0.04321, digits2 = 3) | BF10 = 0.043 |
format_bf(0.04321, cutoff = 10) | BF10 < 0.10 |
format_bf(4321, italics = FALSE) | BF10 = 4.3×10³ |
format_bf(4321, subscript = "") | BF = 4.3×10³ |
format_bf(4321, label = "Bayes factor", italics = FALSE, subscript = "") | Bayes factor = 4.3×10³ |
format_bf(4321, label = "") | 4.3×10³ |
format_bf(c(4321, 0.04321)) | BF10 = 4.3×10³, BF10 = 0.04 |
Formatting numbers
In addition to formatting specific statistics, this package can format
numbers more generally. The format_num() function controls the number of
digits after the decimal with digits and the presence of the leading
zero with pzero.
Code | Output |
---|---|
format_num(0.1234) | 0.1 |
format_num(0.1234, digits = 2) | 0.12 |
format_num(0.1234, pzero = FALSE) | .1 |
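As a rough picture of what these two arguments do, the base R sketch
below rounds with formatC() and strips the leading zero with a regular
expression. The helper is hypothetical and makes no claim about how
format_num() is written internally.

```r
# Hypothetical helper: fixed digits plus optional leading-zero removal
sketch_format_num <- function(x, digits = 1, pzero = TRUE) {
  out <- formatC(x, digits = digits, format = "f")
  # Drop the zero before the decimal point, keeping any minus sign
  if (!pzero) out <- sub("^(-?)0\\.", "\\1.", out)
  out
}

print(sketch_format_num(0.1234))
print(sketch_format_num(0.1234, digits = 2))
print(sketch_format_num(0.1234, pzero = FALSE))
print(sketch_format_num(-0.8475, digits = 2, pzero = FALSE))
```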
For large or small values, using scientific notation may be a more
useful way to format the numbers. The format_scientific() function
converts to scientific notation, again offering control of the number of
digits (digits) as well as whether the output type is Markdown or LaTeX
(type).
Code | Output |
---|---|
format_scientific(1234) | 1.2×10³ |
format_scientific(0.0000001234) | 1.2×10⁻⁷ |
format_scientific(0.0000001234, digits = 2) | 1.23×10⁻⁷ |
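The conversion itself amounts to splitting a number into a mantissa and
a power of ten. The base R sketch below shows one way to do that and to
write the result with Pandoc's ^...^ superscript syntax; the helper name
and the exact Markdown string are assumptions for illustration, not
cocoon's documented output. It also does not handle zero or mantissas
that round up to 10.

```r
# Hypothetical helper: mantissa/exponent split with Markdown superscript
sketch_scientific <- function(x, digits = 1) {
  exponent <- floor(log10(abs(x)))  # power of ten of the leading digit
  mantissa <- x / 10^exponent       # value scaled into [1, 10)
  paste0(
    formatC(mantissa, digits = digits, format = "f"),
    "\u00d710^", exponent, "^"      # multiplication sign, then superscript
  )
}

print(sketch_scientific(1234))
print(sketch_scientific(0.0000001234))
print(sketch_scientific(0.0000001234, digits = 2))
```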