Add features required for modelling to the dataset
This function adds three main groups of features to the data. It is used
internally in fit_single_contact_model() and predict_contacts_1y().
It requires columns named age_to and age_from. The three types of
features it adds are described below:
- Population distribution of contact ages, from the function add_population_age_to(). This requires a column called "age_to", representing the age of the person who had contact, and creates a column called pop_age_to. add_population_age_to() takes an extra argument, population, which defaults to get_polymod_population(). It must be either a conmat_population object, which specifies the age and population characteristics, or a data frame with columns lower.age.limit and population.
- School and work participation, from the function add_school_work_participation(). This requires columns age_to and age_from, but will operate on any column starting with age. It adds the columns school_probability, work_probability, school_year_probability, and school_weighted_pop_fraction.
- Offset, added to the data using add_offset(). This requires the variables school_weighted_pop_fraction (from add_school_work_participation()) and pop_age_to (from add_population_age_to()). It adds two columns: log_contactable_population_school and log_contactable_population.
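The three steps described above can be sketched by calling the helper functions one after another. This is only an illustration of the documented order of dependencies, not the internal definition of add_modelling_features(); it assumes the conmat and tidyr packages are installed and that each helper can be called with its default arguments. The names step_1, step_2, and step_3 are placeholders chosen for this example.

```r
library(conmat)
library(tidyr)

# Small grid of age pairs, as in the Examples section
example_df <- expand_grid(
  age_from = 10:15,
  age_to = 10:15
)

# Step 1: population distribution of contact ages (adds pop_age_to)
step_1 <- add_population_age_to(example_df)

# Step 2: school and work participation
# (adds school_weighted_pop_fraction, among other columns)
step_2 <- add_school_work_participation(step_1)

# Step 3: offsets, which need pop_age_to and school_weighted_pop_fraction
step_3 <- add_offset(step_2)

# Columns added on top of age_from and age_to
setdiff(names(step_3), names(example_df))
```

The example output on this page also shows further columns, such as intergenerational, which this sketch does not reproduce.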
Arguments
- contact_data
contact data with columns age_to and age_from
- ...
extra dots passed to the population argument of add_population_age_to()
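As a hedged illustration of the population argument, the sketch below passes a population explicitly by name. It assumes the conmat and tidyr packages are installed, and it simply supplies the documented default, get_polymod_population(), so the result should match a call without the argument. The names example_df and features are placeholders for this example.

```r
library(conmat)
library(tidyr)

example_df <- expand_grid(
  age_from = 10:15,
  age_to = 10:15
)

# Supply the population by name; here it is the documented default.
# A conmat_population object, or a data frame with columns
# lower.age.limit and population, could be supplied instead.
features <- add_modelling_features(
  example_df,
  population = get_polymod_population()
)

names(features)
```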
Value
data frame with 11 extra columns - the contents of contact_data,
plus: pop_age_to, school_fraction_age_from, work_fraction_age_from,
school_fraction_age_to, work_fraction_age_to, school_probability,
work_probability, school_year_probability, school_weighted_pop_fraction,
log_contactable_population_school, and log_contactable_population.
Examples
library(conmat)
library(tidyr)
age_min <- 10
age_max <- 15
all_ages <- age_min:age_max
# every combination of age_from and age_to between age_min and age_max
example_df <- expand_grid(
  age_from = all_ages,
  age_to = all_ages
)
add_modelling_features(example_df)
#> # A tibble: 36 × 20
#> age_from age_to pop_age_to intergen…¹ gam_a…² gam_a…³ gam_a…⁴ gam_a…⁵ gam_a…⁶
#> <int> <int> <dbl> <int> <int> <dbl> <int> <int> <int>
#> 1 10 10 0.161 0 0 0 100 20 10
#> 2 10 11 0.163 1 1 1 110 21 11
#> 3 10 12 0.165 2 2 4 120 22 12
#> 4 10 13 0.168 3 3 9 130 23 13
#> 5 10 14 0.170 4 4 16 140 24 14
#> 6 10 15 0.173 5 5 25 150 25 15
#> 7 11 10 0.161 1 1 1 110 21 11
#> 8 11 11 0.163 0 0 0 121 22 11
#> 9 11 12 0.165 1 1 1 132 23 12
#> 10 11 13 0.168 2 2 4 143 24 13
#> # … with 26 more rows, 11 more variables: gam_age_pmin <int>,
#> # school_fraction_age_from <dbl>, work_fraction_age_from <dbl>,
#> # school_fraction_age_to <dbl>, work_fraction_age_to <dbl>,
#> # school_probability <dbl>, work_probability <dbl>,
#> # school_year_probability <dbl>, school_weighted_pop_fraction <dbl>,
#> # log_contactable_population_school <dbl>, log_contactable_population <dbl>,
#> # and abbreviated variable names ¹intergenerational, ²gam_age_offdiag, …