conmat-population
conmat-population.Rmd
library(conmat)
The main goal of conmat is to estimate contact rates between age groups. This means we require data describing the age distribution of the population. Effectively, this is data with one column describing age and one column describing population, like this:
library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#> ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
#> ✔ tibble 3.1.8 ✔ dplyr 1.0.10
#> ✔ tidyr 1.2.1 ✔ stringr 1.5.0
#> ✔ readr 2.1.3 ✔ forcats 0.5.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
dat_age <- tibble(
age = seq(0, 25, by = 5),
population = seq(1410, 1350, by = -12)
)
dat_age
#> # A tibble: 6 × 2
#> age population
#> <dbl> <dbl>
#> 1 0 1410
#> 2 5 1398
#> 3 10 1386
#> 4 15 1374
#> 5 20 1362
#> 6 25 1350
We use this kind of data frequently in conmat, and it means that your code might sometimes have lots of repetition like this:
calculation(
data,
age_col = age,
population_col = population
)
estimation(
data,
age_col = age,
population_col = population
)
While there isn’t anything particularly wrong about this, it does mean repetition in arguments, and the code could instead look like this:
calculation(data)
estimation(data)
We can achieve this by creating a special object that is a data frame that knows which columns represent age and population. This is a conmat_population object.
We can create one with as_conmat_population:
dat_age_pop <- as_conmat_population(
data = dat_age,
age = age,
population = population
)
dat_age_pop
#> # A tibble: 6 × 2 (conmat_population)
#> - age: age
#> - population: population
#> age population
#> <dbl> <dbl>
#> 1 0 1410
#> 2 5 1398
#> 3 10 1386
#> 4 15 1374
#> 5 20 1362
#> 6 25 1350
You can see when we print this out to the console that this class is noted in parentheses (conmat_population), and the age and population columns are noted.
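To make the idea of a data frame that "knows" its columns more concrete, here is a minimal base R sketch that stores the column names as attributes and adds an extra class. This is an illustration only, not how conmat implements conmat_population; the names new_population_sketch, age_label_sketch, and population_label_sketch are hypothetical.

```r
# Hypothetical constructor: NOT conmat's implementation, just an illustration
new_population_sketch <- function(data, age, population) {
  stopifnot(
    is.data.frame(data),
    age %in% names(data),
    population %in% names(data)
  )
  # record which column plays which role as attributes on the data frame
  attr(data, "age_label") <- age
  attr(data, "population_label") <- population
  # prepend a class so that methods can recognise the object
  class(data) <- c("population_sketch", class(data))
  data
}

# Hypothetical accessors that read the stored column names back
age_label_sketch <- function(x) attr(x, "age_label")
population_label_sketch <- function(x) attr(x, "population_label")

dat <- data.frame(
  age = seq(0, 25, by = 5),
  population = seq(1410, 1350, by = -12)
)

dat_pop <- new_population_sketch(dat, age = "age", population = "population")

inherits(dat_pop, "population_sketch")
#> [1] TRUE
inherits(dat_pop, "data.frame")
#> [1] TRUE
age_label_sketch(dat_pop)
#> [1] "age"
population_label_sketch(dat_pop)
#> [1] "population"
```

Because the object is still a data frame, existing data frame code keeps working, while functions that need the age or population column can look it up instead of asking for it as an argument.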
Accessing age and population information
If you want to access the age and population information, there are two main functions, age() and population(). These return symbols, which can be used in programming.
age(dat_age_pop)
#> age
population(dat_age_pop)
#> population
Alternatively, there are functions that return the column names as character strings:
age_label(dat_age_pop)
#> [1] "age"
population_label(dat_age_pop)
#> [1] "population"
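The difference between a symbol and a character string matters for how each is used. The base R sketch below assumes that a symbol such as the one printed by age() behaves like one built with as.name(); it uses a plain data frame rather than a conmat_population object so that it runs without conmat.

```r
dat <- data.frame(
  age = seq(0, 25, by = 5),
  population = seq(1410, 1350, by = -12)
)

# a symbol is an unevaluated name; as.name() builds one from a string
age_sym <- as.name("age")
age_chr <- "age"

is.symbol(age_sym)
#> [1] TRUE
is.character(age_chr)
#> [1] TRUE

# a string indexes the column directly
mean_from_label <- mean(dat[[age_chr]])

# a symbol has to be evaluated in the context of the data
mean_from_symbol <- mean(eval(age_sym, dat))

# the two forms can be converted into each other
identical(as.character(age_sym), age_chr)
#> [1] TRUE
c(mean_from_label, mean_from_symbol)
#> [1] 12.5 12.5
```

In short, the character form suits `[[` indexing, while the symbol form suits code that evaluates expressions inside the data.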
Brief example of using accessor functions
You could use the label functions to extract the values from the data and then summarise them, for example:
age_var <- age_label(dat_age_pop)
dat_age_pop[[age_var]]
#> [1] 0 5 10 15 20 25
mean(dat_age_pop[[age_var]])
#> [1] 12.5
sd(dat_age_pop[[age_var]])
#> [1] 9.354143
pop_var <- population_label(dat_age_pop)
dat_age_pop[[pop_var]]
#> [1] 1410 1398 1386 1374 1362 1350
mean(dat_age_pop[[pop_var]])
#> [1] 1380
sd(dat_age_pop[[pop_var]])
#> [1] 22.44994
You could then wrap this in a function if you like:
summary_pop <- function(data){
  # look up the column names recorded on the conmat_population object
  age_var <- age_label(data)
  pop_var <- population_label(data)
  # summarise the age column
  mean_age <- mean(data[[age_var]])
  sd_age <- sd(data[[age_var]])
  # summarise the population column
  mean_pop <- mean(data[[pop_var]])
  sd_pop <- sd(data[[pop_var]])
  tibble::tibble(
    mean_age,
    sd_age,
    mean_pop,
    sd_pop
  )
}
summary_pop(dat_age_pop)
#> # A tibble: 1 × 4
#> mean_age sd_age mean_pop sd_pop
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12.5 9.35 1380 22.4
However, if you would like to program with these variables, for example to write a function that uses dplyr functions like mutate and arrange, you need to get the symbols and then unquote them with !!, like so:
my_age_summary <- function(data){
age_col <- age(data)
data %>%
summarise(
mean_age = mean(!!age_col)
)
}
my_age_summary(dat_age_pop)
#> # A tibble: 1 × 1
#> mean_age
#> <dbl>
#> 1 12.5
And for a slightly more complex use case:
my_age_pop_summary <- function(data){
age_col <- age(data)
pop_col <- population(data)
data %>%
summarise(
across(c(!!age_col, !!pop_col),
c(mean = mean, sd = sd),
.names = "{.fn}_{.col}")
)
}
my_age_pop_summary(dat_age_pop)
#> # A tibble: 1 × 4
#> mean_age sd_age mean_population sd_population
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12.5 9.35 1380 22.4
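If you only have the column names as strings, such as those returned by age_label and population_label, dplyr also lets you refer to columns by name rather than by symbol. The sketch below is an alternative to the !! approach, not how conmat does it: it uses the .data pronoun and all_of() from dplyr, a plain data frame in place of a conmat_population object, and hypothetical function names.

```r
library(dplyr)

dat <- data.frame(
  age = seq(0, 25, by = 5),
  population = seq(1410, 1350, by = -12)
)

# Hypothetical helper; `age_var` stands in for the string from age_label()
my_age_summary_label <- function(data, age_var) {
  data %>%
    summarise(
      # .data[[...]] looks up a column by its character name
      mean_age = mean(.data[[age_var]])
    )
}

# Hypothetical helper; all_of() selects columns from a character vector
my_age_pop_summary_label <- function(data, age_var, pop_var) {
  data %>%
    summarise(
      across(
        all_of(c(age_var, pop_var)),
        list(mean = mean, sd = sd),
        .names = "{.fn}_{.col}"
      )
    )
}

res_age <- my_age_summary_label(dat, age_var = "age")
res_age$mean_age
#> [1] 12.5

res_both <- my_age_pop_summary_label(dat, "age", "population")
names(res_both)
#> [1] "mean_age"        "sd_age"          "mean_population" "sd_population"
```

Which form to use is largely a matter of taste: symbols with !! read naturally inside dplyr verbs, while strings are easier to store and pass around.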
An example use from the package
Internally within conmat we do some modelling work that requires us to know the midpoint of the ages, and a couple of other bits. Here’s an example of how we write that code now:
add_modelling_info <- function(data){
age_col <- age(data)
age_var <- age_label(data)
pop_col <- population(data)
diffs <- diff(data[[age_var]])
bin_widths <- c(diffs, diffs[length(diffs)])
data %>%
dplyr::arrange(
!!age_col
) %>%
dplyr::mutate(
# model based on bin midpoint
bin_width = bin_widths,
midpoint = !!age_col + bin_width / 2,
# scaling down the population appropriately
log_pop = log(!!pop_col / bin_width)
)
}
add_modelling_info(dat_age_pop)
#> # A tibble: 6 × 5 (conmat_population)
#> - age: age
#> - population: population
#> age population bin_width midpoint log_pop
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1410 5 2.5 5.64
#> 2 5 1398 5 7.5 5.63
#> 3 10 1386 5 12.5 5.62
#> 4 15 1374 5 17.5 5.62
#> 5 20 1362 5 22.5 5.61
#> 6 25 1350 5 27.5 5.60
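The arithmetic behind these columns can be checked by hand in base R. The sketch below is a hypothetical helper, not the package's internal function. It assumes each age value is the lower bound of its bin and that the last bin has the same width as the one before it. It also sorts by age before computing widths, which is a choice made here so that unsorted input still lines up row by row.

```r
# Hypothetical helper in base R; not the package's internal function
bin_info_sketch <- function(age, population) {
  # sort by age first so that widths line up with the right rows
  ord <- order(age)
  age <- age[ord]
  population <- population[ord]
  # the width of each bin is the gap to the next lower bound;
  # the last bin has no successor, so reuse the previous width
  diffs <- diff(age)
  bin_width <- c(diffs, diffs[length(diffs)])
  data.frame(
    age = age,
    population = population,
    bin_width = bin_width,
    midpoint = age + bin_width / 2,
    # population per year of age, on the log scale
    log_pop = log(population / bin_width)
  )
}

# deliberately shuffled input to show that row order does not matter here
shuffled <- c(3, 1, 6, 2, 5, 4)
ages <- seq(0, 25, by = 5)[shuffled]
pops <- seq(1410, 1350, by = -12)[shuffled]

info <- bin_info_sketch(ages, pops)
info$midpoint
#> [1]  2.5  7.5 12.5 17.5 22.5 27.5
round(info$log_pop, 2)
#> [1] 5.64 5.63 5.62 5.62 5.61 5.60
```

For the first row, for example, the bin width is 5, the midpoint is 0 + 5 / 2 = 2.5, and log_pop is log(1410 / 5), which is about 5.64.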