library(coder)
Let’s consider some example data (ex_people and ex_icd10) from vignette("ex_data").
Let’s categorize those patients by their Charlson comorbidity:
categorize(ex_people, codedata = ex_icd10, cc = charlson, id = "name", code = "icd10")
#> Classification based on: icd10
#> # A tibble: 100 × 25
#> name surgery myoca…¹ conge…² perip…³ cereb…⁴ demen…⁵ chron…⁶ rheum…⁷
#> <chr> <date> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 Chen, Tre… 2023-02-28 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 2 Graves, A… 2022-11-20 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 3 Trujillo,… 2022-11-07 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 4 Simpson, … 2023-02-09 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 5 Chin, Nel… 2023-01-23 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 6 Le, Chris… 2022-08-27 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 7 Kang, Xuan 2022-11-29 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> 8 Shuemaker… 2022-08-28 FALSE FALSE FALSE FALSE FALSE TRUE FALSE
#> 9 Boucher, … 2023-02-03 FALSE FALSE TRUE FALSE FALSE FALSE FALSE
#> 10 Le, Sorai… 2023-01-08 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> # … with 90 more rows, 16 more variables: peptic.ulcer.disease <lgl>,
#> # mild.liver.disease <lgl>, diabetes.without.complication <lgl>,
#> # hemiplegia.or.paraplegia <lgl>, renal.disease <lgl>,
#> # diabetes.complication <lgl>, malignancy <lgl>,
#> # moderate.or.severe.liver.disease <lgl>, metastatic.solid.tumor <lgl>,
#> # AIDS.HIV <lgl>, charlson <dbl>, deyo_ramano <dbl>, dhoore <dbl>,
#> # ghali <dbl>, quan_original <dbl>, quan_updated <dbl>, and abbreviated …
Here, charlson (as supplied by the cc argument) is a “classcodes” object containing a classification scheme. This is the specification of how to match ex_icd10$icd10 to each condition recognized by the Charlson comorbidity classification. It is based on regular expressions (see ?regex).
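To get a feeling for what such a specification does, the sketch below matches a few ICD-10 codes against a single regular expression using base R only. The pattern is a hypothetical, simplified stand-in chosen for illustration; it is not the expression stored in charlson.

```r
# Hypothetical, simplified pattern (an assumption for illustration only);
# the expressions actually stored in `charlson` are more elaborate.
mi_pattern <- "^I2[12]"

codes <- c("I21", "I220", "I25", "E10")

# One logical value per code: does the code belong to the group?
matched <- grepl(mi_pattern, codes)
data.frame(code = codes, matched = matched)
```

A classcodes object can be thought of as a table of such patterns, one row per group, applied to every code in the code data.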
There are 7 default “classcodes” objects in the package (classcodes column below). Each of them might have several versions of regular expressions (column regex) and weighted indices (column indices):
all_classcodes()
#> # A tibble: 7 × 3
#> classcodes regex indices
#> <chr> <chr> <chr>
#> 1 charlson icd10, icd9cm_deyo, icd9cm_enhanced, icd10_rcs, icd8_br… "charl…
#> 2 cps icd10 "only_…
#> 3 elixhauser icd10, icd10_short, icd9cm, icd9cm_ahrqweb, icd9cm_enha… "sum_a…
#> 4 hip_ae icd10, kva, icd10_fracture ""
#> 5 hip_ae_hailer icd10, kva ""
#> 6 knee_ae icd10, kva ""
#> 7 rxriskv atc_pratt, atc_caughey, atc_garland "pratt…
Each of those classcodes objects is documented (see for example ?charlson). Those objects are basically tibbles (data frames) with some additional attributes:
charlson
#>
#> Classcodes object
#>
#> Regular expressions:
#> icd10, icd9cm_deyo, icd9cm_enhanced, icd10_rcs, icd8_brusselaers, icd9_brusselaers
#> Indices:
#> charlson, deyo_ramano, dhoore, ghali, quan_original, quan_updated
#>
#> # A tibble: 17 × 14
#> group descr…¹ icd10 icd9c…² icd9c…³ icd10…⁴ icd8_…⁵ icd9_…⁶ charl…⁷ deyo_…⁸
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
#> 1 myocar… Acute … I2([… 41[02] 41[02] "I2([1… 41[0-2] 41[02] 1 1
#> 2 conges… Heart … I(09… 428 39891|… "I(1[1… 4270|4… 4(02|2… 1 1
#> 3 periph… Periph… I7([… 44(39|… 0930|4… "(I7([… 44[0-5] 44[0-7… 1 1
#> 4 cerebr… Cerebr… G4[5… 43[0-8] 36234|… "G4[56… 43[0-8] 362C|4… 1 1
#> 5 dement… Senile… F0([… 290 29(0|4… "A810|… 290[01] 29[04] 1 1
#> 6 chroni… Chroni… (I27… 490|50… 4(16[8… "(I2[6… 49[0-3… 416|49… 1 1
#> 7 rheuma… System… M(0[… 7(1(0[… 4465|7… "M(0[5… 7(1[0-… 71[0-4… 1 1
#> 8 peptic… Gastri… K2[5… 53[1-4] 53[1-4] <NA> <NA> <NA> 1 1
#> 9 mild l… Alcoho… B18|… 571[24… 070([2… <NA> <NA> <NA> 1 1
#> 10 diabet… Diabet… E1[0… 250[0-… 250[0-… <NA> <NA> <NA> 1 1
#> 11 hemipl… Parapl… G(04… 34(41|… 3(341|… "G(114… 344 34[2-4] 2 1
#> 12 renal … Chroni… I1(2… 58([25… 40(3([… "I1[23… 40[34]… 40[34]… 2 1
#> 13 diabet… Diabet… E1[0… 250[4-… 250[4-… "E1[0-… 250 250 2 1
#> 14 malign… Malign… C([0… (1([4-… 1([4-6… "C([01… 1([4-6… 1([4-6… 2 1
#> 15 modera… Hepati… I(8(… 456[01… 456[0-… "B18|I… 070|45… 070|45… 3 1
#> 16 metast… Second… C(7[… 19([6-… 19[6-9] "C(7[7… 19[6-9] 19[6-9] 6 1
#> 17 AIDS/H… HIV in… B2[0… 04[2-4] 04[2-4] "B2[0-… <NA> 279K 6 1
#> # … with 4 more variables: dhoore <dbl>, ghali <dbl>, quan_original <dbl>,
#> # quan_updated <dbl>, and abbreviated variable names ¹description,
#> # ²icd9cm_deyo, ³icd9cm_enhanced, ⁴icd10_rcs, ⁵icd8_brusselaers,
#> # ⁶icd9_brusselaers, ⁷charlson, ⁸deyo_ramano
Columns have pre-specified names and/or content:
- group: short descriptive names of all groups to classify by (i.e. medical conditions/comorbidities in the Charlson case)
- description: (optional) details describing each group
- regular expressions identifying each group (see vignette("Interpret_regular_expressions") for details and ?charlson for concrete examples). Multiple versions might be used if combined with different code sets (e.g. ICD-9 versus ICD-10) or as suggested by different sources/authors. (Column names are arbitrary but identified by attr(., "regexprs") and specified by argument regex in as.classcodes().)
- numeric weights used to calculate weighted index sums from the individual groups. (Column names are arbitrary but identified by attr(., "indices") and specified by argument indices in as.classcodes().)
- condition: (optional) conditional classification (not used with charlson but see example below).
In the example above, we did not specify which version of the regular expressions to use. We see from the printed output above (or by attr(charlson, "regexprs")) that the first regular expression is “icd10”. This will be used by default. We have ICD-10 codes recorded in our code data set (ex_icd10$icd10). We might therefore use either “icd10” or the alternative “icd10_rcs”. Other versions might be relevant if the medical data are coded by other classifications (such as earlier versions of ICD). We will show below how to alter this setting in practice.
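As a short preview, the sketch below lists the available versions and repeats the first categorize() call with the alternative “icd10_rcs” version. It assumes that the cc_args mechanism used with hip_ae later in this document applies to charlson in the same way; the object names regex_versions and x_rcs are introduced here for illustration only.

```r
library(coder)

# Versions of regular expressions available for Charlson;
# the first one is used when nothing else is specified.
regex_versions <- attr(charlson, "regexprs")
regex_versions

# Same call as at the top of this document, but with the alternative
# "icd10_rcs" version requested through `cc_args`.
x_rcs <- categorize(
  ex_people, codedata = ex_icd10, cc = charlson,
  id = "name", code = "icd10",
  cc_args = list(regex = "icd10_rcs")
)
```

Since some groups have no expression in the “icd10_rcs” column (shown as NA in the printed object above), the result can be expected to contain fewer group columns than the default.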
Some classcodes objects have an additional class attribute
“hierarchy”, controlling hierarchical groups where only one of possibly
several groups should be used in weighted index sums. The classcodes
object for the Elixhauser comorbidity classification has this
property:
print(elixhauser, n = 0) # preview 0 rows but present the attributes
#>
#> Classcodes object
#>
#> Regular expressions:
#> icd10, icd10_short, icd9cm, icd9cm_ahrqweb, icd9cm_enhanced
#> Indices:
#> sum_all, sum_all_ahrq, walraven, sid29, sid30, ahrq_mort, ahrq_readm
#> Hierarchy:
#> c("metastatic cancer", "solid tumor"),
#> c("diabetes uncomplicated", "diabetes complicated")
This means that patients who have both metastatic cancer and solid tumors should be recognized as such if classified. If such patients are assigned an aggregated index score, however, only the largest score is used (in this case the score for metastatic cancer, which is superior to a solid tumor). The same is true for patients diagnosed with both uncomplicated and complicated diabetes.
Consider a patient Alice with some diagnoses:
pat <- tibble::tibble(id = "Alice")
diags <- c("C01", "C801", "E1010", "E1021")
decoder::decode(diags, decoder::icd10cm)
#> [1] "Malignant neoplasm of base of tongue"
#> [2] "Malignant (primary) neoplasm, unspecified"
#> [3] "Type 1 diabetes mellitus with ketoacidosis without coma"
#> [4] "Type 1 diabetes mellitus with diabetic nephropathy"
According to Elixhauser, poor Alice has both a solid tumor and a metastatic cancer, as well as diabetes both with and without complications. The (unweighted) index “sum_all”, however, will not equal 4 but 2, since metastatic cancer and diabetes with complications subsume solid tumors and diabetes without complications.
icd10 <- tibble::tibble(id = "Alice", icd10 = diags)
x <- categorize(pat, codedata = icd10, cc = elixhauser,
  id = "id", code = "icd10", index = "sum_all", check.names = FALSE)
#> Classification based on: icd10
t(x)
#> [,1]
#> id "Alice"
#> congestive heart failure "FALSE"
#> cardiac arrhythmias "FALSE"
#> valvular disease "FALSE"
#> pulmonary circulation disorder "FALSE"
#> peripheral vascular disorder "FALSE"
#> hypertension uncomplicated "FALSE"
#> hypertension complicated "FALSE"
#> paralysis "FALSE"
#> other neurological disorders "FALSE"
#> chronic pulmonary disease "FALSE"
#> diabetes uncomplicated "TRUE"
#> diabetes complicated "TRUE"
#> hypothyroidism "FALSE"
#> renal failure "FALSE"
#> liver disease "FALSE"
#> peptic ulcer disease "FALSE"
#> AIDS/HIV "FALSE"
#> lymphoma "FALSE"
#> metastatic cancer "TRUE"
#> solid tumor "TRUE"
#> rheumatoid arthritis "FALSE"
#> coagulopathy "FALSE"
#> obesity "FALSE"
#> weight loss "FALSE"
#> fluid electrolyte disorders "FALSE"
#> blood loss anemia "FALSE"
#> deficiency anemia "FALSE"
#> alcohol abuse "FALSE"
#> drug abuse "FALSE"
#> psychoses "FALSE"
#> depression "FALSE"
#> sum_all "2"
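The value of 2 for sum_all can be reproduced by hand. The following base R sketch illustrates the hierarchy rule described above; it is not the package's internal implementation, and the object names are introduced here for illustration only.

```r
# Alice's four matched groups (all other groups are FALSE and contribute 0)
flags <- c(
  "metastatic cancer"      = TRUE,
  "solid tumor"            = TRUE,
  "diabetes uncomplicated" = TRUE,
  "diabetes complicated"   = TRUE
)

# Hierarchies as printed for `elixhauser`
hierarchy <- list(
  c("metastatic cancer", "solid tumor"),
  c("diabetes uncomplicated", "diabetes complicated")
)

# Within each hierarchy at most one group is counted. With unit weights
# (as in an unweighted sum) the largest score among matched groups is 1.
per_hierarchy <- vapply(
  hierarchy,
  function(h) as.numeric(any(flags[h])),
  numeric(1)
)

# Groups outside any hierarchy are simply added (none in this small example)
outside <- flags[setdiff(names(flags), unlist(hierarchy))]

sum_all_by_hand <- sum(per_hierarchy) + sum(outside)
sum_all_by_hand
#> [1] 2
```

For a weighted index, the same idea would apply with the largest weight within each hierarchy replacing the unit score.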
Consider Alice once more. Suppose she had a total hip arthroplasty (THA) and had some surgical procedure codes recorded at hospital visits either before, during or after her index surgery. Those codes are recorded by the NOMESCO classification of surgical procedures (also known as KVA codes in Swedish). Here, “post_op” indicates whether the code was recorded after surgery or not. This information is not always accessible by pure date stamps (if so, the approach illustrated in vignette("coder") could be used instead).
nomesco <-
  tibble::tibble(
    id = "Alice",
    kva = c("AA01", "NFC01"),
    post_op = c(TRUE, FALSE)
  )
Thus, the “post_op” column is a Boolean/logical vector with a name recognized from the “condition” column in hip_ae, a classcodes object used to identify adverse events after THA (the use of set_classcodes() is further explained below and is used here since hip_ae includes codes for both ICD and NOMESCO/KVA).
set_classcodes(hip_ae, regex = "kva")
#>
#> Classcodes object
#>
#> Regular expressions:
#> kva
#> Indices:
#>
#>
#> # A tibble: 1 × 3
#> group kva condi…¹
#> <chr> <chr> <chr>
#> 1 KVA ^(NF([CF-HJ-MS-TW]|A(02|1[12]|2[0-2])|Q09|U[013489]9)|QD(A10|B(… post_op
#> # … with abbreviated variable name ¹condition
A code from nomesco$kva will only be recognized as an adverse event if 1) the code is matched by the relevant regular expression, and 2) the extra condition (from nomesco$post_op) is TRUE.
We need to specify that codes are based on regular expressions matching NOMESCO codes. We do this by the regex argument passed to set_classcodes() by the cc_args argument.
In the data set (nomesco), “AA01” was recorded after surgery but does not indicate a potential adverse event. “NFC01” is a potential adverse event but was already recorded before surgery. Therefore, no adverse event will be recognized in this case.
categorize(pat, codedata = nomesco, cc = hip_ae, id = "id", code = "kva",
cc_args = list(regex = "kva"))
#> index calculated as number of relevant categories
#> # A tibble: 1 × 3
#> id KVA index
#> <chr> <lgl> <dbl>
#> 1 Alice FALSE 0
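The FALSE above can be traced by hand with base R. The sketch below combines the two requirements (regular expression match and condition) for Alice's two codes. The pattern is a hypothetical, much shortened stand-in for the kva expression in hip_ae, and the object names are introduced here for illustration only.

```r
# Alice's procedure codes, as in `nomesco` above
alice_codes <- data.frame(
  kva     = c("AA01", "NFC01"),
  post_op = c(TRUE, FALSE)
)

# Hypothetical, shortened pattern (an assumption for illustration only);
# the real `kva` expression in `hip_ae` covers many more codes.
kva_pattern <- "^NFC"

# Requirement 1: the code is matched by the regular expression
matches_regex <- grepl(kva_pattern, alice_codes$kva)

# Requirement 2: the condition column is TRUE for the same row
adverse_event <- matches_regex & alice_codes$post_op

any(adverse_event)
#> [1] FALSE
```

“AA01” fails the first requirement and “NFC01” fails the second, so neither row qualifies.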
Most functions do not use the classcodes objects themselves, but modified versions passed through set_classcodes(). This function can be called directly but is more often invoked by arguments passed by the cc_args argument used in other functions (as in the example above).
set_classcodes()
We might use set_classcodes() to prepare a classification scheme according to the Charlson comorbidity index based on ICD-8 (Brusselaers and Lagergren 2017). Assume that such codes might be found in character strings with leading prefixes or embedded in a longer free-text description. This is controlled by setting the argument start = FALSE, meaning that the identified ICD-8 codes do not need to appear at the beginning of the character string. We might assume, however, that there is no more information after the code (as specified by stop = TRUE). We can also use some more specific and unique group names as specified by tech_names.
charlson_icd8 <-
  set_classcodes(
    "charlson",
    regex = "icd8_brusselaers", # Version based on ICD-8
    start = FALSE,              # Codes do not have to occur at the beginning of a string
    stop = TRUE,                # Strings must end with the specified codes
    tech_names = TRUE           # Use long but unique and descriptive variable names
  )
The resulting object has only one version of regular expressions (icd8_brusselaers as specified). Each regular expression is suffixed with $ (due to stop = TRUE). Group names might seem cumbersome but this will help to distinguish column names added by categorize() if this function is run repeatedly with different classcodes (e.g. if we calculate both Charlson and Elixhauser indices for the same patients). The original charlson object had 17 rows, but charlson_icd8 has only 13, since not all groups are used in this version.
charlson_icd8
#>
#> Classcodes object
#>
#> Regular expressions:
#> icd8_brusselaers
#> Indices:
#> charlson, deyo_ramano, dhoore, ghali, quan_original, quan_updated
#>
#> # A tibble: 13 × 9
#> group descr…¹ icd8_…² charl…³ deyo_…⁴ dhoore ghali quan_…⁵ quan_…⁶
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 charlson_icd8_b… Acute … (41[0-… 1 1 1 1 1 0
#> 2 charlson_icd8_b… Heart … (4270|… 1 1 1 4 1 2
#> 3 charlson_icd8_b… Periph… (44[0-… 1 1 1 2 1 0
#> 4 charlson_icd8_b… Cerebr… (43[0-… 1 1 1 1 1 0
#> 5 charlson_icd8_b… Senile… (290[0… 1 1 1 0 1 2
#> 6 charlson_icd8_b… Chroni… (49[0-… 1 1 1 0 1 1
#> 7 charlson_icd8_b… System… (7(1[0… 1 1 1 0 1 1
#> 8 charlson_icd8_b… Parapl… (344)$ 2 1 1 0 2 2
#> 9 charlson_icd8_b… Chroni… (40[34… 2 1 1 3 2 1
#> 10 charlson_icd8_b… Diabet… (250)$ 2 1 1 0 2 1
#> 11 charlson_icd8_b… Malign… (1([4-… 2 1 1 0 2 2
#> 12 charlson_icd8_b… Hepati… (070|4… 3 1 1 0 3 4
#> 13 charlson_icd8_b… Second… (19[6-… 6 1 1 0 6 6
#> # … with abbreviated variable names ¹description, ²icd8_brusselaers, ³charlson,
#> # ⁴deyo_ramano, ⁵quan_original, ⁶quan_updated
Note that all index columns remain in the tibble. It is thus possible to combine any categorization with any index, although some combinations might be preferred (such as the icd9cm_deyo regular expressions combined with the deyo_ramano index).
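The effect of stop = TRUE can be checked with base R on an expression that is printed in full above, (344)$ from row 8. The second call adds a leading ^, which is assumed here to be what start = TRUE would correspond to; the test strings are made up for illustration.

```r
# Expression as printed for row 8 of `charlson_icd8`
pattern <- "(344)$"

# Made-up strings: a bare code, a code with a prefix,
# a longer code, and a code followed by more text
strings <- c("344", "ICD-8: 344", "3441", "344 (paraplegia)")

# start = FALSE, stop = TRUE: the code may follow a prefix
# but must end the string
with_stop <- grepl(pattern, strings)
with_stop
#> [1]  TRUE  TRUE FALSE FALSE

# Assumed equivalent of start = TRUE, stop = TRUE: the whole string is the code
with_both <- grepl(paste0("^", pattern), strings)
with_both
#> [1]  TRUE FALSE FALSE FALSE
```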
We can now use charlson_icd8 for classification:
classify(410, charlson_icd8)
#> Classification based on: icd8_brusselaers
#>
#> The printed data is of class: classified, matrix.
#> It has 1 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#>
#> # A tibble: 1 × 13
#> charlson_icd…¹ charl…² charl…³ charl…⁴ charl…⁵ charl…⁶ charl…⁷ charl…⁸ charl…⁹
#> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> # … with 4 more variables:
#> # charlson_icd8_brusselaers_diabetes_complication <lgl>,
#> # charlson_icd8_brusselaers_malignancy <lgl>,
#> # charlson_icd8_brusselaers_moderate_or_severe_liver_disease <lgl>,
#> # charlson_icd8_brusselaers_metastatic_solid_tumor <lgl>, and abbreviated
#> # variable names ¹charlson_icd8_brusselaers_myocardial_infarction,
#> # ²charlson_icd8_brusselaers_congestive_heart_failure, …
The ICD-8 code 410 is recognized as (only) myocardial infarction.
set_classcodes()
Instead of pre-specifying charlson_icd8, a similar result is achieved by:
classify(
410,
"charlson",
cc_args = list(
regex = "icd8_brusselaers",
start = FALSE,
stop = TRUE,
tech_names = TRUE
)
)
#>
#> The printed data is of class: classified, matrix.
#> It has 1 row(s).
#> It is here previewed as a tibble
#> Use `print(x, n = NULL)` to print as is (or use `n` to specify the number of rows to preview)!
#>
#> # A tibble: 1 × 13
#> charlson_icd…¹ charl…² charl…³ charl…⁴ charl…⁵ charl…⁶ charl…⁷ charl…⁸ charl…⁹
#> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#> # … with 4 more variables:
#> # charlson_icd8_brusselaers_diabetes_complication <lgl>,
#> # charlson_icd8_brusselaers_malignancy <lgl>,
#> # charlson_icd8_brusselaers_moderate_or_severe_liver_disease <lgl>,
#> # charlson_icd8_brusselaers_metastatic_solid_tumor <lgl>, and abbreviated
#> # variable names ¹charlson_icd8_brusselaers_myocardial_infarction,
#> # ²charlson_icd8_brusselaers_congestive_heart_failure, …