One of the most useful features of the {papaja}
package
is the apa_print()
function, which takes a statistical object and formats the output to
print the statistical information inline in R Markdown documents. The
apa_print()
function is an easy way to extract and format
this statistical information for documents following APA style. However,
APA style has some rather strange quirks, and users may want some
flexibility in how their statistics are formatted. Moreover,
apa_print()
uses LaTeX syntax, which works great for PDFs
but generates images for mathematical symbols when outputting to Word
documents.
The formatstats package uses APA style as the default,
but allows more flexible formatting such as including the leading 0
before numbers with maximum values of 1. All functions accept a
type
argument that specifies either "md"
for
Markdown (default) or "latex"
for LaTeX. This package can
format statistical objects, statistical values, and numbers more
generally.
Formatting statistical values
Central tendency and error
Data vectors
Often, we need to include simple descriptive statistics in our
documents, such as measures of central tendency and error.
formatstats includes a suite of functions that can
calculate different summary measures of central tendency (mean or
median) and error (confidence interval, standard error, standard
deviation, interquartile range) from a numeric data vector. With the
base function format_summary()
, you can specify central
tendency with the summary
argument and error with the
error
argument. For instance,
format_summary(vec, summary = "mean", error = "se")
calculates mean and standard error.
formatstats includes a number of wrapper functions that cover common measures of central tendency and error including:
And if you don’t want to include error, use
format_mean()
or format_median()
.
Code | Output |
---|---|
format_summary(mtcars$mpg, error = "ci") |
M = 20.1, 95% CI [17.9, 22.3] |
format_meanci(mtcars$mpg) |
M = 20.1, 95% CI [17.9, 22.3] |
format_medianiqr(mtcars$mpg) |
Mdn = 19.2 (IQR = 7.4) |
format_mean(mtcars$mpg) |
M = 20.1 |
Pre-calculated summaries
In addition to calculating values directly from the vectors, these
functions can format already-calculated measures. So if you already have
your mean and error calculated, just pass the vector of central
tendency, lower error limit, and upper error limit to the
values
argument to format them. For instance,
format_meanci(values = c(12.5, 11.2, 13.7))
produces
M = 12.5, 95% CI [11.2, 13.7]. Make sure you pass the arguments
in this order, as the function checks whether the send argument is less
than or equal to the first and the third is greater than or equal to the
first.
Formatting output
These functions can control many aspects of formatting for the values
and labels of summary statistics. Digits after the decimal are
controlled with digits
(default is 1). The
tendlabel
argument defines whether the default abbreviation
is used (“M” or “Mdn”), the full word (“Mean” or “Median”), or no label
is provided. Each of these can be italicized or not with the
italics
argument, subscripts can be included with the
subscript
argument, and units added with the
units
argument.
Code | Output |
---|---|
format_mean(mtcars$mpg) |
M = 20.1 |
format_mean(mtcars$mpg, tendlabel = "word") |
Mean = 20.1 |
format_mean(mtcars$mpg, tendlabel = "none") |
20.1 |
format_mean(mtcars$mpg, italics = FALSE) |
M = 20.1 |
format_mean(mtcars$mpg, subscript = "A") |
MA = 20.1 |
format_mean(mtcars$mpg, units = "m") |
M = 20.1 m |
Error can be displayed in a number of different ways. Setting the
display
argument to "limits"
(default)
includes upper and lower limits in brackets. If intervals rather than
limits are preferred, they can be appended after the mean/median with ±
using "pm"
or in parentheses with "par"
. Error
is not displayed if display = "none"
. The presence of the
error label is controlled by the logical argument
errorlabel
. When set to FALSE
, no error label
is included. For confidence intervals, the cilevel
argument
takes a numeric scalar from 0-1 to define the confidence level.
Code | Output |
---|---|
format_meanci(mtcars$mpg) |
M = 20.1, 95% CI [17.9, 22.3] |
format_meanci(mtcars$mpg, display = "pm") |
M = 20.1 ± 2.2 |
format_meanci(mtcars$mpg, display = "par") |
M = 20.1 (95% CI = 2.2) |
format_meanci(mtcars$mpg, display = "none") |
M = 20.1 |
format_meanci(mtcars$mpg, errorlabel = FALSE) |
M = 20.1, [17.9, 22.3] |
format_meanci(mtcars$mpg, cilevel = 0.90) |
M = 20.1, 90% CI [18.3, 21.9] |
P-values
P-values are pretty easy to format with format_p()
. The
digits
argument controls the number of digits after the
decimal, and if the value is lower, p <
is used.
Unfortunately, APA style involves lopping off the leading zero in
p-values, but setting pzero = TRUE
turns off this silly
setting. The p-value label is controlled by label
, where
the user can specify the exact label text. By default, this is a lower
case, italicized p. Non-italicized can be defined with
italics = FALSE
. P-value labels can be omitted by setting
label = ""
.
Code | Output |
---|---|
format_p(0.001) |
p = .001 |
format_p(0.001, digits = 2) |
p < .01 |
format_p(0.321, digits = 2) |
p = .32 |
format_p(0.001, pzero = TRUE) |
p = 0.001 |
format_p(0.001, label = "P") |
P = .001 |
format_p(0.001, italics = FALSE) |
p = .001 |
format_p(0.001, label = "") |
.001 |
Bayes factors
The format_bf()
function can either format a vector of
numeric values or extract and format Bayes factors from a
BFBayesFactor
object from the {BayesFactor}
package. Bayes factors are not as standardized in how they are
formatted. One issue is that Bayes factors can be referenced from either
the alternative hypothesis (H1) or the null hypothesis
(H0). Also, as a ratio, digits after the decimal are more
important below 1 than above 1.
To respond to the digits issue, the format_bf()
function
controls digits for Bayes factor less than 1 (digits1
)
separately from those greater than 1 (digits2
). In fact,
the defaults are different for these two arguments. Further, Bayes
factors can be very large or very small when evidence strongly favors
one hypothesis over another. Therefore, the cutoff
argument
set a threshold above which (or below 1/cutoff) the returned value is
truncated (e.g., BF > 1000).
Code | Output |
---|---|
format_bf(4321) |
BF10 = 4.3×103 |
format_bf(4321, digits1 = 2) |
BF10 = 4.32×103 |
format_bf(4321, cutoff = 1000) |
BF10 > 1000 |
format_bf(0.04321) |
BF10 = 0.04 |
format_bf(0.04321, digits2 = 3) |
BF10 = 0.043 |
format_bf(0.04321, cutoff = 10) |
BF10 < 0.1 |
The default label for Bayes factors is BF10. The
text of the label can be changed with the label
argument,
where setting label = ""
omits the label. Italics can be
removed with italics = FALSE
, and the subscript can be set
to 01 (subscript = "01"
) or removed
(subscript = ""
).
Code | Output |
---|---|
format_bf(4321) |
BF10 = 4.3×103 |
format_bf(4321, italics = FALSE) |
BF10 = 4.3×103 |
format_bf(4321, subscript = "") |
BF = 4.3×103 |
format_bf(4321, label = "Bayes factor", italics = FALSE, subscript = "") |
Bayes factor = 4.3×103 |
format_bf(4321, label = "") |
4.3×103 |
Formatting statistical objects
Running a statistical test in R typically returns a list with lots of information about the test, often including things like statistical test values and p-values. The aim of this set of functions is to extract and format statistics for a suite of commonly used statistical objects (correlation and t-tests).
Correlations
The format_corr()
function inputs objects returned by
the cor.test()
function and detects whether the object is
from a Pearson, Kendall, or Spearman correlation. It then reports and
formats the appropriate correlation coefficient and p-value.
Let’s start by creating a few different correlations.
mpg_disp_corr_pearson <- cor.test(mtcars$mpg, mtcars$disp, method = "pearson")
mpg_disp_corr_spearman <- cor.test(mtcars$mpg, mtcars$disp, method = "spearman", exact = FALSE)
mpg_disp_corr_kendall <- cor.test(mtcars$mpg, mtcars$disp, method = "kendall", exact = FALSE)
For Pearson correlations, we get the correlation coefficient and the
confidence intervals. Since Spearman and Kendall correlations are
non-parametric, confidence intervals are not returned. Confidence
intervals can be omitted from Pearson correlations by setting
ci = FALSE
.
Code | Output |
---|---|
format_corr(mpg_disp_corr_pearson) |
r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_corr(mpg_disp_corr_pearson, ci = FALSE) |
r = -.85, p < .001 |
format_corr(mpg_disp_corr_spearman) |
ρ = -.91, p < .001 |
format_corr(mpg_disp_corr_kendall) |
τ = -.77, p < .001 |
Format the number of digits of coefficients with digits
and digits of p-values with pdigits
. Include the leading
zeros for coefficients and p-values with pzero = TRUE
.
Remove italics with italics = FALSE
.
Code | Output |
---|---|
format_corr(mpg_disp_corr_pearson) |
r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_corr(mpg_disp_corr_pearson, digits = 1, pdigits = 2) |
r = -.8, 95% CI [-0.9, -0.7], p < .01 |
format_corr(mpg_disp_corr_pearson, pzero = TRUE) |
r = -0.85, 95% CI [-0.92, -0.71], p < 0.001 |
format_corr(mpg_disp_corr_pearson, italics = FALSE) |
r = -.85, 95% CI [-0.92, -0.71], p < .001 |
format_corr(mpg_disp_corr_spearman, italics = FALSE) |
ρ = -.91, p < .001 |
format_corr(mpg_disp_corr_kendall, italics = FALSE) |
τ = -.77, p < .001 |
T-tests
The format_ttest()
function inputs objects returned by
the t.test()
or wilcox.test()
functions and
detects whether the object is from a Student’s or Wilcoxon t-test,
including one-sample, independent-sample, and paired-sample versions. It
then reports and formats the mean value (or mean difference), confidence
intervals for mean value/difference, appropriate test statistic, degrees
of freedom (for parametric tests), and p-value.
Let’s start by creating a few different t-tests
mpg_disp_ttest_gear_carb <- t.test(mtcars$gear, mtcars$carb)
mpg_disp_ttest_gear_carb_paired <- t.test(mtcars$gear, mtcars$carb, paired = TRUE)
mpg_disp_ttest_gear_carb_onesample <- t.test(mtcars$gear, mu = 4)
mpg_disp_wtest_gear_carb <- wilcox.test(mtcars$gear, mtcars$carb, exact = FALSE)
mpg_disp_wtest_gear_carb_paired <- wilcox.test(mtcars$gear, mtcars$carb, paired = TRUE, exact = FALSE)
mpg_disp_wtest_gear_carb_onesample <- wilcox.test(mtcars$gear, mu = 4, exact = FALSE)
For Student’s t-tests, we get the mean value or difference and the
confidence intervals. Means and confidence intervals can be omitted by
setting full = FALSE
.
Code | Output |
---|---|
format_ttest(mpg_disp_ttest_gear_carb) |
M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_ttest(mpg_disp_ttest_gear_carb_paired) |
M = 0.9, 95% CI [0.3, 1.4], t(31) = 3.1, p = .004 |
format_ttest(mpg_disp_ttest_gear_carb_onesample) |
M = 3.7, 95% CI [3.4, 4.0], t(31) = -2.4, p = .023 |
format_ttest(mpg_disp_ttest_gear_carb_onesample, full = FALSE) |
t(31) = -2.4, p = .023 |
format_ttest(mpg_disp_wtest_gear_carb) |
W = 727.5, p = .003 |
format_ttest(mpg_disp_wtest_gear_carb_paired) |
V = 267.0, p = .004 |
format_ttest(mpg_disp_wtest_gear_carb_onesample) |
V = 52.5, p = .027 |
Format the number of digits of coefficients with digits
and digits of p-values with pdigits
. Include the leading
zeros for coefficients and p-values with pzero = TRUE
.
Remove italics with italics = FALSE
.
Code | Output |
---|---|
format_ttest(mpg_disp_ttest_gear_carb) |
M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_ttest(mpg_disp_ttest_gear_carb, digits = 2, pdigits = 2) |
M = 0.88, 95% CI [0.24, 1.51], t(43.40) = 2.79, p < .01 |
format_ttest(mpg_disp_ttest_gear_carb, pzero = TRUE) |
M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = 0.008 |
format_ttest(mpg_disp_ttest_gear_carb, italics = FALSE) |
M = 0.9, 95% CI [0.2, 1.5], t(43.4) = 2.8, p = .008 |
format_ttest(mpg_disp_wtest_gear_carb) |
W = 727.5, p = .003 |
format_ttest(mpg_disp_wtest_gear_carb, italics = FALSE) |
W = 727.5, p = .003 |
Formatting numbers
In addition to formatting specific statistics, this package can
format numbers more generally. The format_num()
function
controls general formatting of numbers of digits with
digits
and the presence of the leading zero with
pzero
.
Code | Output |
---|---|
format_num(0.1234) |
0.1 |
format_num(0.1234, digits = 2) |
0.12 |
format_num(0.1234, pzero = FALSE) |
.1 |
For large or small values, using scientific notation may be a more
useful way to format the numbers. The format_scientific()
function converts to scientific notation, again offering control of the
number of digits
as well as whether output
type
is Markdown or LaTeX.
Code | Output |
---|---|
format_scientific(1234) |
1.2×103 |
format_scientific(0.0000001234) |
1.2×10-7 |
format_scientific(0.0000001234, digits = 2) |
1.23×10-7 |