Add features required for modelling to the dataset — add_modelling_features

This function adds three main groups of features to the data. It is used internally in fit_single_contact_model() and predict_contacts_1y(). It requires columns named age_to and age_from. The three types of features it adds are described below:

Population distribution of contact ages, added by add_population_age_to(). This requires a column called age_to, representing the age of the person who had contact, and creates a column called pop_age_to. add_population_age_to() takes an extra population argument, which defaults to get_polymod_population(). It must be either a conmat_population object, which specifies the age and population characteristics, or a data frame with columns lower.age.limit and population.
School and work participation, added by add_school_work_participation(). This requires columns age_to and age_from, but will operate on any column starting with age. It adds the columns school_probability, work_probability, school_year_probability, and school_weighted_pop_fraction.
Offset, added by add_offset(). This requires the variables school_weighted_pop_fraction (from add_school_work_participation()) and pop_age_to (from add_population_age_to()). It adds two columns: log_contactable_population_school and log_contactable_population.
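Because each step depends on columns created by an earlier one, the order of the steps matters. The following is a minimal base R sketch of that dependency, using only the column requirements listed above. The names required_inputs and missing_inputs() are hypothetical helpers written for illustration and are not part of conmat.

```r
# Columns each step needs, as listed in the description above.
required_inputs <- list(
  add_population_age_to = "age_to",
  add_school_work_participation = c("age_from", "age_to"),
  add_offset = c("school_weighted_pop_fraction", "pop_age_to")
)

# Hypothetical helper (not part of conmat): return the required columns
# that are absent from `data` for a given step.
missing_inputs <- function(data, step) {
  setdiff(required_inputs[[step]], names(data))
}

# Raw contact data with only the two age columns.
example_df <- expand.grid(age_from = 10:15, age_to = 10:15)

# Nothing is missing for the first two steps.
missing_inputs(example_df, "add_population_age_to")
missing_inputs(example_df, "add_school_work_participation")

# add_offset() cannot run on the raw data: both of its inputs
# are created by the earlier steps.
missing_inputs(example_df, "add_offset")
```

On data that contains only age_from and age_to, the last call reports both offset inputs as missing, which is why the offset has to be added after the population and participation features.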

Usage

add_modelling_features(contact_data, ...)

Arguments

contact_data: contact data with columns age_to and age_from
...: extra arguments passed on to the population argument of add_population_age_to()
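As a hedged illustration of the dots, the sketch below passes the documented default, get_polymod_population(), explicitly as population. It assumes the installed version of conmat forwards population as described above, and it only calls conmat if the package is available. The object name features is chosen for illustration.

```r
# A 6 x 6 grid of ages 10 to 15, built with base R.
example_df <- expand.grid(age_from = 10:15, age_to = 10:15)

# Only run the conmat call if the package is installed.
if (requireNamespace("conmat", quietly = TRUE)) {
  # Pass the documented default explicitly; per the argument description,
  # `population` is forwarded to add_population_age_to().
  features <- conmat::add_modelling_features(
    example_df,
    population = conmat::get_polymod_population()
  )

  # List the columns added on top of age_from and age_to.
  print(setdiff(names(features), names(example_df)))
}
```

Any other conmat_population object, or a data frame with columns lower.age.limit and population, could be supplied in the same position.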

Value

A data frame containing the contents of contact_data plus 11 extra columns: pop_age_to, school_fraction_age_from, work_fraction_age_from, school_fraction_age_to, work_fraction_age_to, school_probability, work_probability, school_year_probability, school_weighted_pop_fraction, log_contactable_population_school, and log_contactable_population. The printed output in the Examples section also includes further columns, such as intergenerational, gam_age_offdiag, and gam_age_pmin, that are not in this list.

Examples

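library(conmat)
library(tidyr)

# Build every pairing of ages 10 to 15 for age_from and age_to
age_min <- 10
age_max <- 15
all_ages <- age_min:age_max
example_df <- expand_grid(
  age_from = all_ages,
  age_to = all_ages
)

# Add the modelling features to the 36-row grid
add_modelling_features(example_df)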
#> # A tibble: 36 × 20
#>    age_from age_to pop_age_to intergen…¹ gam_a…² gam_a…³ gam_a…⁴ gam_a…⁵ gam_a…⁶
#>       <int>  <int>      <dbl>      <int>   <int>   <dbl>   <int>   <int>   <int>
#>  1       10     10      0.161          0       0       0     100      20      10
#>  2       10     11      0.163          1       1       1     110      21      11
#>  3       10     12      0.165          2       2       4     120      22      12
#>  4       10     13      0.168          3       3       9     130      23      13
#>  5       10     14      0.170          4       4      16     140      24      14
#>  6       10     15      0.173          5       5      25     150      25      15
#>  7       11     10      0.161          1       1       1     110      21      11
#>  8       11     11      0.163          0       0       0     121      22      11
#>  9       11     12      0.165          1       1       1     132      23      12
#> 10       11     13      0.168          2       2       4     143      24      13
#> # … with 26 more rows, 11 more variables: gam_age_pmin <int>,
#> #   school_fraction_age_from <dbl>, work_fraction_age_from <dbl>,
#> #   school_fraction_age_to <dbl>, work_fraction_age_to <dbl>,
#> #   school_probability <dbl>, work_probability <dbl>,
#> #   school_year_probability <dbl>, school_weighted_pop_fraction <dbl>,
#> #   log_contactable_population_school <dbl>, log_contactable_population <dbl>,
#> #   and abbreviated variable names ¹intergenerational, ²gam_age_offdiag, …