
This function returns an interpolating function that estimates populations in 1-year age increments from coarser, age-binned distributions such as those produced by socialmixr::wpp_age().

Usage

get_age_population_function(data, ...)

# S3 method for conmat_population
get_age_population_function(data = population, ...)

# S3 method for data.frame
get_age_population_function(
  data = population,
  age_col = lower.age.limit,
  pop_col = population,
  ...
)

Arguments

data

dataset containing the population of each age or age group

...

extra arguments

age_col

bare (unquoted) name of the column holding the lower age limit of each age group

pop_col

bare (unquoted) name of the column holding the population of each age group

Value

An interpolating function that takes a vector of ages and returns the estimated population for each 1-year age

Details

The function first prepares the data for fitting a smoothing spline to ages below the maximum age. It sorts the data by the lower limit of each age group and takes the differences between successive lower limits as the bin widths. Half of each bin width is added to the lower limit to give the bin midpoint, and each group's population is scaled by its bin width. The maximum age (the largest lower age limit) is then identified, the data are split into the groups below and at the maximum age, and the total population of each part is recorded.

A cubic smoothing spline is then fitted to the groups below the maximum age, with the bin midpoints as the predictor and the log-scaled population as the response. The fitted spline is used to predict the population at each age from 0 to 200, and the predictions are adjusted by the ratio of the observed total population to the predicted total. This ratio is computed separately for ages below and at or above the maximum age, because the observed totals differ between the two parts.

The population at the maximum age is adjusted further so that it drops off smoothly, based on a set of weights. The final population is then linearly extrapolated over the years past the upper bound of the data: for ages above the maximum age, the population is a weighted population of the maximum age, with weights that depend on the number of years past the upper bound. Older ages receive lower weights and therefore lower populations.
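The steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not conmat's implementation: the function name `sketch_age_population_function`, the linear taper over `taper_years` years used as the weights above the maximum age, and the toy input data are all hypothetical. It uses `stats::smooth.spline()` for the cubic smoothing spline and `predict()` on the fit, which extrapolates linearly (here on the log scale) outside the range of the bin midpoints.

```r
# Hypothetical sketch of the interpolation steps described above.
# Assumptions (not conmat's internals): the function name, the 0:max_age
# grid argument, and the linear taper weights controlled by `taper_years`.
sketch_age_population_function <- function(lower_age, population,
                                           max_age = 200, taper_years = 25) {
  # arrange by the lower limit of each age group
  ord <- order(lower_age)
  lower_age <- lower_age[ord]
  population <- population[ord]

  # bin widths from differences of lower limits; the top bin has no width
  bin_width <- c(diff(lower_age), NA)
  midpoint <- lower_age + bin_width / 2
  pop_per_year <- population / bin_width

  # split into groups below / at the maximum age and record their totals
  max_bound <- max(lower_age)
  is_under <- lower_age < max_bound
  total_pop_under <- sum(population[is_under])
  total_pop_max <- sum(population[!is_under])

  # cubic smoothing spline: log per-year population on bin midpoints
  fit <- stats::smooth.spline(
    x = midpoint[is_under],
    y = log(pop_per_year[is_under])
  )

  # predict each single year of age on the grid
  ages <- 0:max_age
  pred <- exp(stats::predict(fit, x = ages)$y)

  under_grid <- ages < max_bound
  pop_1y <- numeric(length(ages))

  # rescale ages below the bound so they sum to the observed total
  pop_1y[under_grid] <- pred[under_grid] *
    total_pop_under / sum(pred[under_grid])

  # ages at/above the bound: weights shrink linearly with years past it
  # (hypothetical taper), then rescale to the observed top-group total
  years_past <- ages[!under_grid] - max_bound
  weights <- pmax(0, 1 - years_past / taper_years)
  over <- pred[!under_grid] * weights
  pop_1y[!under_grid] <- over * total_pop_max / sum(over)

  # the returned closure looks up 1-year ages; ages off the grid give NA
  function(age) pop_1y[match(age, ages)]
}

# made-up toy data: 5-year bins with an open-ended 75+ group
lower <- seq(0, 75, by = 5)
pop <- round(2e6 * exp(-lower / 60) * (1 + 0.1 * sin(lower / 10)))

f <- sketch_age_population_function(lower, pop)
f(0:4)
sum(f(0:74)) # equals sum(pop[lower < 75]) up to floating-point error
```

Because each part is rescaled to its own observed total, this sketch reproduces the totals below and at or above the maximum age, while individual 5-year bins are only approximately preserved.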

Examples

polymod_pop <- get_polymod_population()

polymod_pop
#> # A tibble: 21 × 2 (conmat_population)
#>  - age: lower.age.limit
#>  - population: population
#>    lower.age.limit population
#>              <int>      <dbl>
#>  1               0   1852682.
#>  2               5   1968449.
#>  3              10   2138897.
#>  4              15   2312032.
#>  5              20   2407486.
#>  6              25   2423602.
#>  7              30   2585137.
#>  8              35   2969393.
#>  9              40   3041663.
#> 10              45   2809154.
#> # … with 11 more rows

# These populations are binned in 5-year age groups. To estimate the
# population for a single year of age, first create the interpolating
# function:

age_pop_function <- get_age_population_function(
  data = polymod_pop
)
# Then pass it an age (in years) to get the estimated population at that age
age_pop_function(4)
#> [1] 375940

# Or pass a vector of ages to get estimates across an age range
age_pop_function(1:4)
#> [1] 360379.5 365489.5 370672.4 375940.0

# Summing the estimates for ages 0-4 gives a total close to, but not
# exactly equal to, the 0-4 population in the first row of the table:
head(polymod_pop, 1)
#> # A tibble: 1 × 2 (conmat_population)
#>  - age: lower.age.limit
#>  - population: population
#>   lower.age.limit population
#>             <int>      <dbl>
#> 1               0   1852682.
sum(age_pop_function(age = 0:4))
#> [1] 1827822

# Usage in dplyr
library(dplyr)
example_df <- slice_head(abs_education_state, n = 5)
example_df %>%
  mutate(population_est = age_pop_function(age))
#> # A tibble: 5 × 6
#>    year state aboriginal_and_torres_strait_islander_status   age n_ful…¹ popul…²
#>   <dbl> <chr> <chr>                                        <dbl>   <dbl>   <dbl>
#> 1  2006 ACT   Aboriginal and Torres Strait Islander            4       5 375940.
#> 2  2006 ACT   Non-Indigenous                                   4     109 375940.
#> 3  2006 NSW   Aboriginal and Torres Strait Islander            4     104 375940.
#> 4  2006 NSW   Non-Indigenous                                   4    1870 375940.
#> 5  2006 NT    Aboriginal and Torres Strait Islander            4     102 375940.
#> # … with abbreviated variable names ¹​n_full_and_part_time, ²​population_est