Function to compute the proportions of a set of several binary variables or means or medians of a set of quantitative variables, based on complex survey data. It produces a list containing a table, including the confidence intervals of the indicators and a ready-to-be published ggplot graphic. Exporting the results to an Excell file is possible. The confidence intervals are taking into account the complex survey design.
Usage
many_val(
data,
list_vars,
type,
list_vars_lab = NULL,
facet = NULL,
filter_exp = NULL,
...,
na.rm.facet = T,
na.vars = "rm",
prop_method = "beta",
reorder = FALSE,
show_ci = T,
show_n = FALSE,
show_value = TRUE,
show_labs = TRUE,
scale = NULL,
digits = 0,
unit = NULL,
dec = NULL,
col = NULL,
pal = "OBSS_alt3",
direction = 1,
desaturate = 0,
lighten = 0,
darken = 0,
dodge = 0.9,
font = "Roboto",
wrap_width_y = 25,
title = NULL,
subtitle = NULL,
xlab = NULL,
ylab = NULL,
caption = NULL,
lang = "fr",
theme = "fonctionr",
coef_font = 1,
export_path = NULL
)
many_prop(..., type = "prop")
many_median(..., type = "median")
many_mean(..., type = "mean")Arguments
- data
A dataframe or an object from the survey package or an object from the srvyr package.
- list_vars
A vector containing the names of the dummy/quantitative variables on which to compute the proportions/means/medians.
- type
"prop" to compute proportions ; "mean" to compute means ; "median" to compute medians.
- list_vars_lab
A vector containing the labels of the dummy/quantitative variables to be displayed on the graphic and in the table of result. Default uses the variable names in list_vars.
- facet
A variable defining the faceting group.
- filter_exp
An expression filtering the data, preserving the design. Notice that filter_exp works as srvyr::filter() : it excludes observations for which filter_exp results into NA. It is often the case when NA is present on one of the filter variables.
- ...
All options possible in as_survey_design in srvyr package.
- na.rm.facet
TRUE if you want to remove observations with NA on the facet variable. FALSE if you want to create a facet with the NA values for the facet variable. Default is TRUE.
- na.vars
The treatment of NA values in variables (list_vars). "rm" removes NA seperately in each individual variable, "rm.all" removes every individual that has at least one NA in one variable. Default is "rm".
- prop_method
Type of proportion method used to compute confidence intervals. See survey::svyciprop() for details. Default is beta method. This argument is only used in case of type = "prop".
- reorder
TRUE if you want to reorder the variables according to the proportions/means/medians. Default is FALSE.
- show_ci
TRUE if you want to show the error bars on the graphic. FALSE if you don't want to show the error bars. Default is TRUE.
- show_n
TRUE if you want to show on the graphic the number of observations in the sample for each variable. The number can varie if na.vars = "rm". FALSE if you do not want to show this number. Default is FALSE.
- show_value
TRUE if you want to show the proportions/means/median for each variable on the graphic. FALSE if you do not want to show the proportions/means/medians. Default is TRUE.
- show_labs
TRUE if you want to show axes labels. FALSE if you do not want to show any labels on axes. Default is TRUE.
- scale
Denominator of the proportions. Default is 100 to interpret numbers as percentages. This argument is only used in case of type = "prop".
- digits
Number of decimal places displayed on the values labels on the graphic. Default is 0.
- unit
Unit displayed on the graphic. Default is percent for type = "prop" and no unit for type = "mean" or "median".
- dec
Decimal mark displayed on the graphic. Default depends on lang: "," for fr and nl ; "." for en.
- col
Color of the bars if the user wants a monocolor graph. col must be a R color or an hexadecimal color code. As pal has a priority over col, if the user wants to use col, he must not use simultaneously the pal argument (even pal = NULL).
- pal
Colors of the bars if the user wants the bars to have different colors. pal must be vector of R colors or hexadecimal colors or a palette from packages MetBrewer or PrettyCols or a palette from fonctionr. pal has a priority over col.
- direction
Direction of the palette color. Default is 1. The opposite direction is -1.
- desaturate
Numeric specifying the amount of desaturation where 1 corresponds to complete desaturation (no colors, grey layers only), 0 to no desaturation, and values in between to partial desaturation. Default is 0. It affects only the palette (pal) and not the monocolor (col). See colorspace::desaturate for details. If desaturate and lighten/darken arguments are used, lighten/darken is applied in a second time (i.e. on the color transformed by desaturate).
- lighten
Numeric specifying the amount of lightening. Negative numbers cause darkening. Value shoud be ranged between -1 (black) and 1 (white). Default is 0. It affects only the palette (pal) and not the monocolor (col). See colorspace::lighten for details. If both argument ligthen and darken are used (not advised), darken is applied in a second time (i.e. on the color transformed by lighten).
- darken
Numeric specifying the amount of lightening. Negative numbers cause lightening. Value shoud be ranged between -1 (white) and 1 (black). Default is 0. It affects only the palette (pal) and not the monocolor (col). See colorspace::darken for details. If both argument ligthen and darken are used (not advised), darken is applied in a second time (i.e. on the color transformed by lighten).
- dodge
Width of the bars. Default is 0.9 to let a small space between bars. A value of 1 leads to no space betweens bars. Values higher than 1 are not advised because they cause an overlaping of the bars.
- font
Font used in the graphic. See load_and_active_fonts() for available fonts. Default is "Roboto".
- wrap_width_y
Number of characters before going to the line for the labels of the groups. Default is 25.
- title
Title of the graphic.
- subtitle
Subtitle of the graphic.
- xlab
X label on the graphic. As coord_flip() is used in the graphic, xlab refers to the x label on the graphic, after the coord_flip(), and not to the x variable in the data. Default (xlab = NULL) displays, for type = prop, "Proportion :" (if lang == "fr"), "Proportion:" (if lang == "en" ) or "Aandeel:" (if lang == "nl"), or, for type = "mean", "Moyenne :" (if lang == "fr"), "Mean:" (if lang == "en" ) or "Gemiddelde:" (if lang == "nl"), or, for type = "median", "Médiane :" (if lang == "fr"), "Median:" (if lang == "en" ) or "Mediaan:" (if lang == "nl"), followed by the labels of the variables (list_vars_lab). To show no X label, use xlab = "".
- ylab
Y label on the graphic. As coord_flip() is used in the graphic, ylab refers to the y label on the graphic, after the coord_flip(), and not to the y variable in the data. Default (ylab = NULL) displays no Y label.
- caption
Caption of the graphic.
- lang
Language of the indications on the graphic. Possibilities are "fr" (french), "nl" (dutch) and "en" (english). Default is "fr".
- theme
Theme of the graphic. Default is "fonctionr". "IWEPS" adds y axis lines and ticks. NULL uses the default grey ggplot2 theme.
- coef_font
A multiplier factor for font size of all fonts on the graphic. Default is 1. Usefull when exporting the graphic for a publication (e.g. in a Quarto document).
- export_path
Path to export the results in an xlsx file. The file includes two sheets: the table and the graphic.
Examples
# Loading of data
data(eusilc, package = "laeken")
# Recoding variables
eusilc$worker <- 0
eusilc$worker[eusilc$pl030 == "1"]<-1
eusilc$worker[eusilc$pl030 == "2"]<-1
eusilc$austrian<-0
eusilc$austrian[eusilc$pb220a == "AT"]<-1
# Computation, taking sample design into account
eusilc_many_prop <- many_prop(
eusilc,
list_vars = c(worker,austrian),
list_vars_lab = c("% of workers","% of Austrian"),
facet = rb090,
strata = db040,
ids = db030,
weight = rb050,
title = "Proportion of workers and Autrian according to gender",
subtitle = "Example with austrian SILC data from 'laeken' package"
)
#> Warning: NAs introduced by coercion
#> Warning: Active parameters in function r_options(): font, coef_font
#> Variables used: worker, austrian
#> Input: data.frame
#> Sampling design -> ids: db030, strata: db040, weights: rb050
#> Numbers of observation(s) removed by each filter (one after the other):
#> 0 observation(s) removed due to missing facet
#> Warning: With na.vars = 'rm', observations removed differ between variables
# Results in graph form
eusilc_many_prop$graph
# Results in table format
eusilc_many_prop$tab
#> # A tibble: 4 × 12
#> rb090 list_col prop prop_low prop_upp n_sample n_true_weighted
#> <fct> <fct> <dbl> <dbl> <dbl> <int> <dbl>
#> 1 male % of workers 0.495 0.484 0.506 7267 1969092.
#> 2 female % of workers 0.366 0.355 0.376 7560 1536897.
#> 3 male % of Austrian 0.739 0.728 0.750 7267 2942211.
#> 4 female % of Austrian 0.766 0.756 0.777 7560 3219916.
#> # ℹ 5 more variables: n_true_weighted_low <dbl>, n_true_weighted_upp <dbl>,
#> # n_tot_weighted <dbl>, n_tot_weighted_low <dbl>, n_tot_weighted_upp <dbl>