Add features required for modelling to the dataset
This function adds three main groups of features to the data. It is used
internally in fit_single_contact_model() and predict_contacts_1y().
It requires columns named age_to and age_from. The three types of
features it adds are described below:
- Population distribution of contact ages, from the function add_population_age_to(). This requires a column called "age_to" representing the age of the person who had contact, and creates a column called pop_age_to. add_population_age_to() takes an extra argument, population, which defaults to get_polymod_population(). It must be either a conmat_population object, which specifies the age and population characteristics, or a data frame with columns lower.age.limit and population.
- School and work participation, from the function add_school_work_participation(). This requires columns age_to and age_from, but will operate on any column starting with age. It adds the columns school_probability, work_probability, school_year_probability, and school_weighted_pop_fraction.
- An offset, added using add_offset(). This requires the variables school_weighted_pop_fraction (from add_school_work_participation()) and pop_age_to (from add_population_age_to()). It adds two columns, log_contactable_population_school and log_contactable_population.
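The offset step can be pictured with a minimal base R sketch. It assumes, based only on the column names, that each offset is the natural logarithm of its input column; sketch_add_offset() is a hypothetical helper, not the conmat implementation, so check the source of add_offset() before relying on the exact formula.

```r
# Hypothetical sketch of the offset step; ASSUMES each offset is the
# natural log of its input column (not taken from conmat's source)
sketch_add_offset <- function(contact_data) {
  required <- c("school_weighted_pop_fraction", "pop_age_to")
  missing_cols <- setdiff(required, names(contact_data))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  contact_data$log_contactable_population_school <-
    log(contact_data$school_weighted_pop_fraction)
  contact_data$log_contactable_population <- log(contact_data$pop_age_to)
  contact_data
}

# Toy input with made-up values
toy <- data.frame(
  pop_age_to = c(0.2, 0.5),
  school_weighted_pop_fraction = c(0.1, 1)
)
sketch_add_offset(toy)
```

If these columns are used as offsets in a log-link model, as their names suggest, expected contacts scale in proportion to the contactable population. Note that log(0) is -Inf in R, so a sketch like this gives -Inf for any row with a zero input.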
Arguments
- contact_data
contact data with columns age_to and age_from
- ...
extra dots passed to the population argument of add_population_age_to()
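Passing population through the dots is ordinary R argument forwarding. The sketch below uses two hypothetical functions, sketch_features() and sketch_population_age_to(), standing in for add_modelling_features() and add_population_age_to(); the default population table and the exact-age lookup are simplifications made up for illustration.

```r
# Hypothetical inner function standing in for add_population_age_to();
# like the real function, `population` has a default value
sketch_population_age_to <- function(contact_data,
                                     population = data.frame(
                                       lower.age.limit = 0:100,
                                       population = 1
                                     )) {
  # Look up the population count for each contact age (exact match only)
  idx <- match(contact_data$age_to, population$lower.age.limit)
  contact_data$pop_count_age_to <- population$population[idx]
  contact_data
}

# Hypothetical outer function standing in for add_modelling_features():
# anything supplied in `...` is forwarded unchanged to the inner function
sketch_features <- function(contact_data, ...) {
  sketch_population_age_to(contact_data, ...)
}

contacts <- data.frame(age_from = c(10, 11), age_to = c(12, 13))

# Uses the inner function's default population table
sketch_features(contacts)

# Supplies a custom population table through `...`
custom_pop <- data.frame(lower.age.limit = 12:13, population = c(500, 700))
sketch_features(contacts, population = custom_pop)
```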
Value
data frame containing the contents of contact_data plus 11 extra columns:
pop_age_to, school_fraction_age_from, work_fraction_age_from,
school_fraction_age_to, work_fraction_age_to, school_probability,
work_probability, school_year_probability, school_weighted_pop_fraction,
log_contactable_population_school, and log_contactable_population.
Examples
library(conmat)
library(tidyr)
age_min <- 10
age_max <- 15
all_ages <- age_min:age_max
example_df <- expand_grid(
  age_from = all_ages,
  age_to = all_ages
)
add_modelling_features(example_df)
#> # A tibble: 36 × 20
#> age_from age_to pop_age_to intergen…¹ gam_a…² gam_a…³ gam_a…⁴ gam_a…⁵ gam_a…⁶
#> <int> <int> <dbl> <int> <int> <dbl> <int> <int> <int>
#> 1 10 10 0.161 0 0 0 100 20 10
#> 2 10 11 0.163 1 1 1 110 21 11
#> 3 10 12 0.165 2 2 4 120 22 12
#> 4 10 13 0.168 3 3 9 130 23 13
#> 5 10 14 0.170 4 4 16 140 24 14
#> 6 10 15 0.173 5 5 25 150 25 15
#> 7 11 10 0.161 1 1 1 110 21 11
#> 8 11 11 0.163 0 0 0 121 22 11
#> 9 11 12 0.165 1 1 1 132 23 12
#> 10 11 13 0.168 2 2 4 143 24 13
#> # … with 26 more rows, 11 more variables: gam_age_pmin <int>,
#> # school_fraction_age_from <dbl>, work_fraction_age_from <dbl>,
#> # school_fraction_age_to <dbl>, work_fraction_age_to <dbl>,
#> # school_probability <dbl>, work_probability <dbl>,
#> # school_year_probability <dbl>, school_weighted_pop_fraction <dbl>,
#> # log_contactable_population_school <dbl>, log_contactable_population <dbl>,
#> # and abbreviated variable names ¹intergenerational, ²gam_age_offdiag, …
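In the example output, the six pop_age_to values for age_to 10 to 15 (0.161, 0.163, 0.165, 0.168, 0.170, 0.173) sum to about 1. This is consistent with pop_age_to being the population at each contact age divided by the total over the contact ages present in the data. A minimal sketch of that normalisation follows, assuming single-year population counts; sketch_pop_age_to() and the counts are hypothetical and will not reproduce the numbers above.

```r
# Hypothetical sketch: share of the population at each contact age,
# normalised over the distinct ages that appear in `age_to`
sketch_pop_age_to <- function(age_to, ages, counts) {
  present <- sort(unique(age_to))
  # Population count for each distinct contact age present in the data
  present_counts <- counts[match(present, ages)]
  shares <- present_counts / sum(present_counts)
  # Map each row back to the share for its contact age
  shares[match(age_to, present)]
}

# Made-up single-year population counts for ages 10 to 15
ages <- 10:15
counts <- c(300, 305, 310, 315, 320, 325)

# Full grid of contacts, one row per (age_from, age_to) pair
grid <- expand.grid(age_to = ages, age_from = ages)
grid$pop_age_to <- sketch_pop_age_to(grid$age_to, ages, counts)

# One share per distinct contact age; these sum to 1 up to rounding
shares_by_age <- sketch_pop_age_to(ages, ages, counts)
sum(shares_by_age)
```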