Skip to contents

Function describe the distribution of a discrete variable from complex survey data. It produces a table and a graphic.

Usage

distrib_discrete(
  data,
  quali_var,
  facet = NULL,
  filter_exp = NULL,
  ...,
  na.rm.facet = TRUE,
  na.rm.var = TRUE,
  probs = NULL,
  prop_method = "beta",
  reorder = FALSE,
  show_ci = TRUE,
  show_n = FALSE,
  show_value = TRUE,
  show_labs = TRUE,
  scale = 100,
  digits = 0,
  unit = "%",
  dec = ",",
  pal = "sienna2",
  dodge = 0.9,
  font = "Roboto",
  wrap_width_y = 25,
  title = NULL,
  subtitle = NULL,
  xlab = NULL,
  ylab = NULL,
  lang = "fr",
  caption = NULL,
  theme = NULL,
  export_path = NULL
)

distrib_d(...)

Arguments

data

A dataframe or an object from the survey package or an object from the srvyr package.

quali_var

The discrete variable that is described.

facet

A variable defining the faceting group.

filter_exp

An expression that filters the data, preserving the design.

...

All options possible in as_survey_design in srvyr package.

na.rm.facet

TRUE if you want to remove observations with NA on the facet variable. FALSE if you want to create a facet with the NA value for the facet variable. Default is TRUE.

na.rm.var

TRUE if you want to remove observations with NA on the discrete variable. FALSE if you want to create a modality with the NA value for the discrete variable. Default is TRUE.

probs

Vector of probabilities for H0 of the statistical test, in the correct order (will be rescaled to sum to 1). If probs = NULL, no statistical test is performed. Default is NULL.

prop_method

Type of proportion method to use to compute confidence intervals. See svyciprop in survey package for details. Default is the beta method.

reorder

TRUE if you want to reorder the categories according to their proportion. NA value, in case if na.rm.group = FALSE, is not included in the reorder. Default is FALSE.

show_ci

TRUE if you want to show the error bars on the graphic. FALSE if you do not want to show the error bars. Default is TRUE.

show_n

TRUE if you want to show on the graphic the number of individuals in the sample in each modality of quali_var. FALSE if you do not want to show this number. Default is FALSE.

show_value

TRUE if you want to show the proportion of each category on the graphic. FALSE if you do not want to show the proportion. Default is TRUE.

show_labs

TRUE if you want to show axes, titles and caption labels. FALSE if you do not want to show any label on axes and titles. Default is TRUE.

scale

Denominator of the proportion. Default is 100 to interprets numbers as percentages.

digits

Numbers of digits showed on the values labels on the graphic. Default is 0.

unit

Unit showed in the graphic. Default is percent.

dec

Decimal mark shown on the graphic. Default is ",".

pal

Colour of the bars. NA bar, in case if na.rm.group = FALSE, and total bar are always in grey.

dodge

Width of the bar, between 0 and 1. Default is 0.9.

font

Font used in the graphic. See load_and_active_fonts() for available fonts.

wrap_width_y

Number of characters before going to the line in the labels of the categories. Default is 25.

title

Title of the graphic.

subtitle

Subtitle of the graphic.

xlab

X label on the graphic. As coord_flip() is used in the graphic, xlab refers to the X label on the graphic, after the coord_flip(), and not to the x variable in the data. If xlab = NULL, X label on the graphic will be "Distribution (total : 100 percent)". To show no X label, use xlab = "".

ylab

Y label on the graphic. As coord_flip() is used in the graphic, ylab refers to the Y label on the graphic, after the coord_flip(), and not to the Y variable in the data. If ylab = NULL, Y label on the graphic will be quali_var. To show no Y label, use ylab = "".

lang

The language of the indications on the chart. Possibilities: "fr", "nl", "en". Default is "fr".

caption

Caption in the graphic.

theme

Theme od te graphic. IWEPS adds y axis lines and ticks.

export_path

Path to export the results in an xlsx file. The file includes three sheets : the table, the graphic and the statistical test (if probs is not NULL).

Value

A list that contains a table, a graphic and a statistical test

Examples

# Loading of data
data(eusilc, package = "laeken")

# Recoding eusilc$pl030 into eusilc$pl030_rec
eusilc$pl030_rec <- NA
eusilc$pl030_rec[eusilc$pl030 == "1"] <- "Working full time"
eusilc$pl030_rec[eusilc$pl030 == "2"] <- "Working part time"
eusilc$pl030_rec[eusilc$pl030 == "3"] <- "Unemployed"
eusilc$pl030_rec[eusilc$pl030 == "4"] <- "Student"
eusilc$pl030_rec[eusilc$pl030 == "5"] <- "Retired"
eusilc$pl030_rec[eusilc$pl030 == "6"] <- "Permanently disabled"
eusilc$pl030_rec[eusilc$pl030 == "7"] <- "Fulfilling domestic tasks"

# Computation, taking sample design into account
eusilc_dist_group_d <- distrib_d(
eusilc,
pl030_rec,
strata = db040,
ids = db030,
weight = rb050,
title = "Distribution of socio-economic status",
subtitle = "Example with austrian SILC data from 'laeken' package"
)
#> Input: data.frame
#> Sampling design -> ids:  db030, strata:  db040, weights:  rb050

# Results in graph form
eusilc_dist_group_d$graph


# Results in table format
eusilc_dist_group_d$tab
#> # A tibble: 7 × 8
#>   pl030_rec            prop prop_low prop_upp n_sample n_weighted n_weighted_low
#>   <fct>               <dbl>    <dbl>    <dbl>    <int>      <dbl>          <dbl>
#> 1 Fulfilling domest… 0.0948   0.0899   0.0998     1207    640311.        605978.
#> 2 Permanently disab… 0.0155   0.0129   0.0186      178    104930.         85796.
#> 3 Retired            0.267    0.258    0.277      3146   1806954.       1746273.
#> 4 Student            0.0586   0.0544   0.0630      736    395829.        365532.
#> 5 Unemployed         0.0449   0.0411   0.0489      518    303252.        276953.
#> 6 Working full time  0.425    0.416    0.434      5162   2869868.       2797833.
#> 7 Working part time  0.0941   0.0890   0.0995     1160    636121.        600709.
#> # ℹ 1 more variable: n_weighted_upp <dbl>