Conmat Population Data
conmat-population.Rmd
library(conmat)
The main goal of conmat is to estimate contact rates between age groups. This means we require data describing the age population distribution. Effectively this is data that has a column describing age, and a column describing population, like this:
library(tibble)
dat_age <- tibble(
age = seq(0, 25, by = 5),
population = seq(1410, 1350, by = -12)
)
dat_age
#> # A tibble: 6 × 2
#> age population
#> <dbl> <dbl>
#> 1 0 1410
#> 2 5 1398
#> 3 10 1386
#> 4 15 1374
#> 5 20 1362
#> 6 25 1350
We use this kind of data frequently in conmat, and it means that your code might sometimes have lots of repetition like this:
calculation(
data,
age_col = age,
population_col = population
)
estimation(
data,
age_col = age,
population_col = population
)
The issue with repeating arguments is that it is unnecessary and sometimes leads to forgetting to include them, or including them erroneously. The code could instead look like this:
calculation(data)
estimation(data)
We can achieve this by creating a special object that is a dataframe
that knows which columns represent age, and population. This is a
conmat_population
object.
We can create one with as_conmat_population
:
dat_age_pop <- as_conmat_population(
data = dat_age,
age = age,
population = population
)
dat_age_pop
#> # A tibble: 6 × 2 (conmat_population)
#> - age: age
#> - population: population
#> age population
#> <dbl> <dbl>
#> 1 0 1410
#> 2 5 1398
#> 3 10 1386
#> 4 15 1374
#> 5 20 1362
#> 6 25 1350
You can see when we print this out to the console that this class is
noted in parentheses (conmat_population
), and the columns
are noted.
Accessing age and population information
If you want to access the age and population information, there are 2 main functions:
These return symbols, which can be used in programming.
age(dat_age_pop)
#> age
population(dat_age_pop)
#> population
alternatively there are functions that return character information:
age_label(dat_age_pop)
#> [1] "age"
population_label(dat_age_pop)
#> [1] "population"
Brief example of using accessor functions
You could use this to extract out the values from the data and then summarise it, for example:
pop_var <- age_label(dat_age_pop)
dat_age_pop[[pop_var]]
#> [1] 0 5 10 15 20 25
mean(dat_age_pop[[pop_var]])
#> [1] 12.5
sd(dat_age_pop[[pop_var]])
#> [1] 9.354143
age_var <- population_label(dat_age_pop)
dat_age_pop[[age_var]]
#> [1] 1410 1398 1386 1374 1362 1350
mean(dat_age_pop[[age_var]])
#> [1] 1380
sd(dat_age_pop[[age_var]])
#> [1] 22.44994
You could then wrap this in a function if you like:
summary_pop <- function(data) {
dat_age_pop[[pop_var]]
mean_pop <- mean(dat_age_pop[[pop_var]])
sd_pop <- sd(dat_age_pop[[pop_var]])
age_var <- population_label(dat_age_pop)
dat_age_pop[[age_var]]
mean_age <- mean(dat_age_pop[[age_var]])
sd_age <- sd(dat_age_pop[[age_var]])
return(
tibble(
mean_pop,
sd_pop,
mean_age,
sd_age
)
)
}
summary_pop(dat_age_pop)
#> # A tibble: 1 × 4
#> mean_pop sd_pop mean_age sd_age
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12.5 9.35 1380 22.4
However if you would like to program with these variables, for
example write a function that uses functions like mutate
and arrange
, from dplyr
, you would need to get
the symbols and then evaluate them with !!
, like so:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
my_age_summary <- function(data) {
age_col <- age(data)
data %>%
summarise(
mean_age = mean(!!age_col)
)
}
my_age_summary(dat_age_pop)
#> # A tibble: 1 × 1
#> mean_age
#> <dbl>
#> 1 12.5
And for a slightly more complex use case
my_age_pop_summary <- function(data) {
age_col <- age(data)
pop_col <- population(data)
data %>%
summarise(
across(c(!!age_col, !!pop_col),
c(mean = mean, sd = sd),
.names = "{.fn}_{.col}"
)
)
}
my_age_pop_summary(dat_age_pop)
#> # A tibble: 1 × 4
#> mean_age sd_age mean_population sd_population
#> <dbl> <dbl> <dbl> <dbl>
#> 1 12.5 9.35 1380 22.4
An example use from the package
Internally within conmat we do some modelling work that requires us to know the midpoint of the ages, and a couple of other bits - here’s an example of how we write that code now:
add_modelling_info <- function(data) {
age_col <- age(data)
age_var <- age_label(data)
pop_col <- population(data)
diffs <- diff(data[[age_var]])
bin_widths <- c(diffs, diffs[length(diffs)])
data %>%
dplyr::arrange(
!!age_col
) %>%
dplyr::mutate(
# model based on bin midpoint
bin_width = bin_widths,
midpoint = !!age_col + bin_width / 2,
# scaling down the population appropriately
log_pop = log(!!pop_col / bin_width)
)
}
add_modelling_info(dat_age_pop)
#> # A tibble: 6 × 5 (conmat_population)
#> - age: age
#> - population: population
#> age population bin_width midpoint log_pop
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 1410 5 2.5 5.64
#> 2 5 1398 5 7.5 5.63
#> 3 10 1386 5 12.5 5.62
#> 4 15 1374 5 17.5 5.62
#> 5 20 1362 5 22.5 5.61
#> 6 25 1350 5 27.5 5.60