Adds offset variables — add_offset

Mostly used internally in add_modelling_features(). Adds two offset variables to be used in fit_single_contact_model():

log_contactable_population_school, and
log_contactable_population.

These two variables require the variables school_weighted_pop_fraction (from add_school_work_participation()) and pop_age_to (from add_population_age_to()). This gives the school setting its own offset, separate from the one used for the other settings (home, work, and other). The school offset captures the cohorting of students within schools: it is the logarithm of a weighted combination of the contact population age distribution and the school year probability, calculated in add_school_work_participation(). See "Details" for more information.
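
The relationship between the input columns and the two offsets can be sketched in base R. This is a minimal illustration, not the package's implementation: it assumes each offset is simply the natural logarithm of its source column (this page states that for the school offset and only implies it for log_contactable_population), and the input values below are made up.

```r
# Hypothetical input: values are invented for illustration, not package output
contact_data <- data.frame(
  age_from = c(10, 10, 11),
  age_to = c(10, 11, 12),
  pop_age_to = c(0.161, 0.163, 0.165),
  school_weighted_pop_fraction = c(0.5, 0.3, 0.2)
)

# Assumed definition: each offset is the natural log of its source column
with_offsets <- transform(
  contact_data,
  log_contactable_population_school = log(school_weighted_pop_fraction),
  log_contactable_population = log(pop_age_to)
)

with_offsets
```

Because both offsets are logarithms, any zero in pop_age_to or school_weighted_pop_fraction would produce -Inf under this assumed definition, so it is worth checking those columns before fitting.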

Usage

add_offset(contact_data)

Arguments

contact_data: contact data - must contain columns age_to, age_from, pop_age_to (from add_population_age_to()), and school_weighted_pop_fraction (from add_school_work_participation()).

Value

data.frame of contact_data with two extra columns: log_contactable_population_school and log_contactable_population

Details

Why double offsets? Two offsets are specified: one in the model formula, and one in the "offset" argument of mgcv::bam. The two are added together when the model is first fit. The setting-specific offset from offset_variable, which is included in the GAM formula as ... + offset(log_contactable_population), is also used in prediction, whereas the other offset, passed to the GAM as the argument offset = log(participants), is only used when the model is initially fit. See fit_single_contact_model() for more detail.
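
The difference between the two kinds of offset can be checked on simulated data. The sketch below is not the model fitted by fit_single_contact_model(): the data, the single smooth s(age_to), and the helper pred() are assumptions made for illustration. It relies only on the behaviour described in the mgcv documentation, where an offset() term in the formula is used by predict(), while an offset supplied through the offset argument is ignored at prediction time.

```r
library(mgcv)

set.seed(1)
n <- 200

# Simulated toy data: column names echo this page, values are invented
toy <- data.frame(
  age_to = runif(n, 0, 80),
  participants = sample(5:50, n, replace = TRUE),
  log_contactable_population = log(runif(n, 0.5, 2))
)
toy$contacts <- rpois(
  n,
  lambda = toy$participants *
    exp(toy$log_contactable_population + sin(toy$age_to / 20))
)

# Offset 1 sits in the formula; offset 2 goes through the `offset` argument
fit <- bam(
  contacts ~ s(age_to) + offset(log_contactable_population),
  family = poisson(),
  offset = log(participants),
  data = toy
)

# Prediction data: vary one offset at a time
base <- data.frame(age_to = 30, participants = 1, log_contactable_population = 0)
more_population <- transform(base, log_contactable_population = log(2))
more_participants <- transform(base, participants = 100)

# Hypothetical helper: prediction on the link (log) scale
pred <- function(newdata) {
  as.numeric(predict(fit, newdata = newdata, type = "link"))
}

# The formula offset shifts the linear predictor by log(2) ...
pred(more_population) - pred(base)
# ... while the `offset` argument has no effect at prediction time
pred(more_participants) - pred(base)
```

In this setup, predictions are effectively per participant: the number of participants scales the expected count during fitting but drops out of predict(), whereas the contactable population continues to scale the predicted contacts.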

Author

Nick Golding

Examples

age_min <- 10
age_max <- 15
all_ages <- age_min:age_max
library(tidyr)
example_df <- expand_grid(
  age_from = all_ages,
  age_to = all_ages,
)
example_df %>%
  add_population_age_to() %>%
  add_school_work_participation() %>%
  add_offset()
#> # A tibble: 36 × 14
#>    age_from age_to pop_age_to intergen…¹ schoo…² work_…³ schoo…⁴ work_…⁵ schoo…⁶
#>       <int>  <int>      <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#>  1       10     10      0.161          0       1    0.05       1    0.05       1
#>  2       10     11      0.163          1       1    0.05       1    0.05       1
#>  3       10     12      0.165          2       1    0.05       1    0.2        1
#>  4       10     13      0.168          3       1    0.05       1    0.2        1
#>  5       10     14      0.170          4       1    0.05       1    0.2        1
#>  6       10     15      0.173          5       1    0.05       1    0.2        1
#>  7       11     10      0.161          1       1    0.05       1    0.05       1
#>  8       11     11      0.163          0       1    0.05       1    0.05       1
#>  9       11     12      0.165          1       1    0.05       1    0.2        1
#> 10       11     13      0.168          2       1    0.05       1    0.2        1
#> # … with 26 more rows, 5 more variables: work_probability <dbl>,
#> #   school_year_probability <dbl>, school_weighted_pop_fraction <dbl>,
#> #   log_contactable_population_school <dbl>, log_contactable_population <dbl>,
#> #   and abbreviated variable names ¹intergenerational,
#> #   ²school_fraction_age_from, ³work_fraction_age_from,
#> #   ⁴school_fraction_age_to, ⁵work_fraction_age_to, ⁶school_probability