Add pressure testing functions#14
Conversation
| #' @export | ||
| gen_combined_df <- function(prev_dat, df2, interest_cols, key_cols) { | ||
| # TODO: input checks | ||
| # TODO: df2 needs a better name |
There was a problem hiding this comment.
df2 is essentially a cleaned version of of the touchstone_new latest report on packit where vaccine names are cleaned, subregions appended etc. happy to take suggestions on a better way to do it
There was a problem hiding this comment.
Thanks - could you suggest an informative name for the argument that somebody new to the package and the workflow would understand? data_current? data_cleaned? If you could pop a couple of lines in this thread about any expectations around it, I can add those to the function docs.
There was a problem hiding this comment.
df_clean is fine, thanks. df_clean is the imported dependency report (currently named touchstone_new) with the following cleaning steps applied:
- Malaria vaccine names amended
- Subregions appended
- Fixes for missing country names
- Removal of now-redundant columns
It may even be sufficient to just apply these steps todfitself, I think I just kept them separate when developing the report so I didn't have to keep pulling a clean dependency when testing things out.
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
| # for scales formatting | ||
| .x <- NULL | ||
|
|
||
| # TODO: should NA-producing values (< 1) be removed? |
There was a problem hiding this comment.
I think previously we have used a lowerbound <- 1e-8 to filter out very small values
| ) { | ||
| checkmate::assert_data_frame(df, min.cols = 1L, min.rows = 1L) | ||
|
|
||
| # TODO: can we find checks for prev_data size in reln to df? rows? cols? |
There was a problem hiding this comment.
I'm not sure I fully understand? But prev_dat and df would likely be matched on columns
| variable <- rlang::arg_match(variable) | ||
| checkmate::assert_character(group_cols, min.len = 1L, any.missing = FALSE) | ||
|
|
||
| # TODO: check what a sensible upper limit might be |
There was a problem hiding this comment.
I think in my pressure testing report, an arbitrary limt was difference exceeding ± 100 times the interquartile range (IQR) for the relevant country, vaccine and activity_type.
There was a problem hiding this comment.
To discuss on call whether to keep pressure testing functions and rapid-model-run-impact function separate or combine for ease
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
| ) | ||
| } | ||
|
|
||
| # data may have a `burden_outcome` column which should not be counted as |
There was a problem hiding this comment.
I think burden_outcome should be moved to the default key_cols COLNAMES_KEY_PRESSURE_TEST; and be picked up in the pressure_testing.Rmd L98-99.
There was a problem hiding this comment.
I looked into adding this to COLNAMES_KEY_PRESSURE_TEST, and this causes the problem that these names are used in functions where the burden is expected to be in wide format, so a separate column for each burden type. I think the parsimonious course of action is to add this to the checked columns here, but to leave it out from the constant vector for now. See example function filter_invalid_trajectories() below this one, where the outcomes are expected as "*_averted".
There was a problem hiding this comment.
I see. Thanks for the clarification.
| flag_duplicates(eg_impact) %>% | ||
| select( | ||
| modelling_group, country, disease, | ||
| vaccine, activity_type, year, n_key |
There was a problem hiding this comment.
as mentioned ealier, burden_outcome may also be picked up here.
There was a problem hiding this comment.
Added burden_outcome here.
| names_from = "burden_outcome", | ||
| values_from = "impact" | ||
| ) | ||
| prev_df$support_type <- "other" # unsure what values this can take |
There was a problem hiding this comment.
We are moving from support_type ("total", "gavi", "other") to percentage of gavi support (numeric between 0-1). And this will not be part of burden processing or impact calculation. It will only appear in our out-facing impact extrapolation reports for funders; and highly dependent on funders incremental updates (twice a year). I think we can ignore this classifier for the pressure testing.
|
This pull request:
Reach out on slack ( (Note that results may be inaccurate if you branched from an outdated version of the target branch.) |
This PR adds functions from the pressure testing report.
R/fn_impact_diagnostics.R.R/fn_plotting_prep_impact_diagnostics.RandR/fn_plotting_impact_diagnostics.R.Getting started
The easiest way to get started with reviewing is to check out the
developbranch and render the package website using {pkgdown}:pkgdown::build_site(). View the vignette on pressure testing under Articles which explains the added functions.Open task