Conditioned data frames, or cnd_df, are a tool in the {sdtm.oak} package
designed to facilitate conditional transformations on data frames. This
article explains how to create and use conditioned data frames,
particularly in the context of SDTM domain derivations.
A conditioned data frame is a regular data frame extended with a logical
vector cnd that marks rows for subsequent conditional transformations.
The condition_add() function is used to create these conditioned data
frames.
Consider a simple data frame df:
## # A tibble: 3 × 2
##       x y
##   <int> <chr>
## 1     1 a
## 2     2 b
## 3     3 c
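The call that produced this printout is not shown. A minimal sketch that yields an equivalent data frame, assuming the {tibble} package is installed and that the original was built in a similar way, could look like this:

```r
# Assumption: the article built `df` with tibble; the exact call is not shown.
df <- tibble::tibble(
  x = 1L:3L,          # integer column, printed as <int>
  y = letters[1L:3L]  # character column holding "a", "b", "c"
)
df
```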
We can create a conditioned data frame in which only the rows where
x > 1 are marked:
## # A tibble: 3 × 2
## # Cond. tbl: 2/1/0
##         x y
##     <int> <chr>
## 1 F     1 a
## 2 T     2 b
## 3 T     3 c
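Here, only the second and third rows are marked as TRUE. The header line
Cond. tbl: 2/1/0 summarizes the condition: two rows are TRUE, one is
FALSE, and none are NA.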
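The call behind this conditioned printout is not shown either. The sketch below assumes that condition_add() takes the data frame as its first argument (named dat here, which is an assumption about the signature) followed by a logical expression evaluated on the data frame's columns:

```r
library(sdtm.oak)

# Same toy data as in the printout above.
df <- tibble::tibble(x = 1L:3L, y = letters[1L:3L])

# Assumed call shape: data frame first, then the marking condition.
# Rows where x > 1 are marked TRUE, the rest FALSE.
cnd_df <- condition_add(dat = df, x > 1L)
cnd_df
```

The columns of the data frame are unchanged; the condition travels alongside them as extra information, which is why the printout still reports 3 × 2.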
Conditioned data frames are most useful in combination with functions
such as assign_no_ct, assign_ct, hardcode_no_ct, and hardcode_ct. These
functions perform derivations only for the records that are marked TRUE
in the conditioned data frame.
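To make the idea concrete without relying on the package, the following base R sketch mimics a conditional derivation with a plain logical mask. It illustrates the concept only and is not how {sdtm.oak} implements it; the names dat, mask, and z are made up for this example.

```r
# Conceptual illustration in base R; not the {sdtm.oak} implementation.
dat <- data.frame(x = 1:3, y = c("a", "b", "c"))

mask <- dat$x > 1                      # logical vector playing the role of cnd
dat$z <- NA_character_                 # new variable starts out missing everywhere
dat$z[mask] <- toupper(dat$y[mask])    # derive values only where the mask is TRUE
dat
```

Rows outside the mask keep NA in the derived variable, which mirrors what the conditioned derivation later in this article shows for unmarked records.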
Consider a simplified dataset of concomitant medications, where we want
to derive a new variable CMGRPID (Concomitant Medication Group ID) based
on the condition that the medication treatment (CMTRT) is "BENADRYL".
Here is a simplified raw Concomitant Medications data set (cm_raw):
cm_raw <- tibble::tibble(
  oak_id = seq_len(14L),
  raw_source = "ConMed",
  patient_number = c(
    375L, 375L, 376L, 377L, 377L, 377L, 377L,
    378L, 378L, 378L, 378L, 379L, 379L, 379L
  ),
  MDNUM = c(1L, 2L, 1L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 3L, 1L, 2L, 3L),
  MDRAW = c(
    "BABY ASPIRIN", "CORTISPORIN", "ASPIRIN",
    "DIPHENHYDRAMINE HCL", "PARCETEMOL", "VOMIKIND",
    "ZENFLOX OZ", "AMITRYPTYLINE", "BENADRYL",
    "DIPHENHYDRAMINE HYDROCHLORIDE", "TETRACYCLINE",
    "BENADRYL", "SOMINEX", "ZQUILL"
  )
)
cm_raw
## # A tibble: 14 × 5
##    oak_id raw_source patient_number MDNUM MDRAW
##     <int> <chr>               <int> <int> <chr>
##  1      1 ConMed                375     1 BABY ASPIRIN
##  2      2 ConMed                375     2 CORTISPORIN
##  3      3 ConMed                376     1 ASPIRIN
##  4      4 ConMed                377     1 DIPHENHYDRAMINE HCL
##  5      5 ConMed                377     2 PARCETEMOL
##  6      6 ConMed                377     3 VOMIKIND
##  7      7 ConMed                377     5 ZENFLOX OZ
##  8      8 ConMed                378     4 AMITRYPTYLINE
##  9      9 ConMed                378     1 BENADRYL
## 10     10 ConMed                378     2 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378     3 TETRACYCLINE
## 12     12 ConMed                379     1 BENADRYL
## 13     13 ConMed                379     2 SOMINEX
## 14     14 ConMed                379     3 ZQUILL
To derive the CMTRT variable, we use the assign_no_ct() function to map
the MDRAW variable to the CMTRT variable:
## # A tibble: 14 × 4
##    oak_id raw_source patient_number CMTRT
##     <int> <chr>               <int> <chr>
##  1      1 ConMed                375 BABY ASPIRIN
##  2      2 ConMed                375 CORTISPORIN
##  3      3 ConMed                376 ASPIRIN
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL
##  5      5 ConMed                377 PARCETEMOL
##  6      6 ConMed                377 VOMIKIND
##  7      7 ConMed                377 ZENFLOX OZ
##  8      8 ConMed                378 AMITRYPTYLINE
##  9      9 ConMed                378 BENADRYL
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11     11 ConMed                378 TETRACYCLINE
## 12     12 ConMed                379 BENADRYL
## 13     13 ConMed                379 SOMINEX
## 14     14 ConMed                379 ZQUILL
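The mapping call itself is not shown. The sketch below illustrates its likely shape on a three-row excerpt of the raw data (cm_excerpt is a hypothetical name, holding records 8 to 10 of cm_raw). It assumes that the tgt_dat argument of assign_no_ct() can be omitted when no target data set exists yet, so that the result contains only the key variables plus the new CMTRT column, as in the printout above.

```r
library(sdtm.oak)

# Hypothetical three-row excerpt of cm_raw (records 8 to 10).
cm_excerpt <- tibble::tibble(
  oak_id = 8L:10L,
  raw_source = "ConMed",
  patient_number = 378L,
  MDNUM = c(4L, 1L, 2L),
  MDRAW = c("AMITRYPTYLINE", "BENADRYL", "DIPHENHYDRAMINE HYDROCHLORIDE")
)

# Assumption: with tgt_dat left out, a new target data set is started
# from the key variables oak_id, raw_source and patient_number.
tgt_dat <- assign_no_ct(
  tgt_var = "CMTRT",
  raw_dat = cm_excerpt,
  raw_var = "MDRAW"
)
tgt_dat
```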
Then we create a conditioned data frame from the target data set
(tgt_dat) in which only the rows with CMTRT equal to "BENADRYL" are
marked:
## # A tibble: 14 × 4
## # Cond. tbl: 2/12/0
##      oak_id raw_source patient_number CMTRT
##       <int> <chr>               <int> <chr>
##  1 F      1 ConMed                375 BABY ASPIRIN
##  2 F      2 ConMed                375 CORTISPORIN
##  3 F      3 ConMed                376 ASPIRIN
##  4 F      4 ConMed                377 DIPHENHYDRAMINE HCL
##  5 F      5 ConMed                377 PARCETEMOL
##  6 F      6 ConMed                377 VOMIKIND
##  7 F      7 ConMed                377 ZENFLOX OZ
##  8 F      8 ConMed                378 AMITRYPTYLINE
##  9 T      9 ConMed                378 BENADRYL
## 10 F     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE
## 11 F     11 ConMed                378 TETRACYCLINE
## 12 T     12 ConMed                379 BENADRYL
## 13 F     13 ConMed                379 SOMINEX
## 14 F     14 ConMed                379 ZQUILL
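The conditioning call is not shown. A sketch of how it might look follows, using a small hand-built stand-in for tgt_dat (three records only, constructed directly with tibble for illustration) and assuming that condition_add() accepts the data frame followed by a logical expression on its columns. The result is stored as cnd_tgt_dat, the name used in the derivation step.

```r
library(sdtm.oak)

# Hypothetical stand-in for tgt_dat, limited to records 8 to 10.
tgt_dat <- tibble::tibble(
  oak_id = 8L:10L,
  raw_source = "ConMed",
  patient_number = 378L,
  CMTRT = c("AMITRYPTYLINE", "BENADRYL", "DIPHENHYDRAMINE HYDROCHLORIDE")
)

# Mark only the records whose treatment is exactly "BENADRYL".
cnd_tgt_dat <- condition_add(tgt_dat, CMTRT == "BENADRYL")
cnd_tgt_dat
```

Note that the comparison is an exact string match, so records such as "DIPHENHYDRAMINE HCL" remain unmarked.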
Finally, we derive the CMGRPID variable conditionally. Using
assign_no_ct(), we derive CMGRPID, which indicates the group ID for the
medication, from MDNUM, but only for the records marked in the
conditioned target data set:
derived_tgt_dat <- assign_no_ct(
  tgt_dat = cnd_tgt_dat,
  tgt_var = "CMGRPID",
  raw_dat = cm_raw,
  raw_var = "MDNUM"
)
derived_tgt_dat
## # A tibble: 14 × 5
##    oak_id raw_source patient_number CMTRT                         CMGRPID
##     <int> <chr>               <int> <chr>                           <int>
##  1      1 ConMed                375 BABY ASPIRIN                       NA
##  2      2 ConMed                375 CORTISPORIN                        NA
##  3      3 ConMed                376 ASPIRIN                            NA
##  4      4 ConMed                377 DIPHENHYDRAMINE HCL                NA
##  5      5 ConMed                377 PARCETEMOL                         NA
##  6      6 ConMed                377 VOMIKIND                           NA
##  7      7 ConMed                377 ZENFLOX OZ                         NA
##  8      8 ConMed                378 AMITRYPTYLINE                      NA
##  9      9 ConMed                378 BENADRYL                            1
## 10     10 ConMed                378 DIPHENHYDRAMINE HYDROCHLORIDE      NA
## 11     11 ConMed                378 TETRACYCLINE                       NA
## 12     12 ConMed                379 BENADRYL                            1
## 13     13 ConMed                379 SOMINEX                            NA
## 14     14 ConMed                379 ZQUILL                             NA
Conditioned data frames in the {sdtm.oak} package provide a flexible way
to perform conditional transformations on data sets. By marking specific
rows for transformation, users can derive SDTM variables while ensuring
that only the relevant records are processed.