Opal Projects

Yannick Marcon

2024-09-18

Opal stores data and meta-data (dictionaries) in projects that are accessible through web services. See the Variables and Data documentation page that explains the data model of Opal. See also the Resources documentation for an alternate way of accessing data in a project.

The Opal R package exposes projects related functions:

Note that these functions do not create a R session on the server side: it is only accessing the content of the Opal server (permission checks apply).

Setup

Setup the connection with Opal:

library(opalr)
o <- opal.login("administrator", "password", url = "https://opal-demo.obiba.org")

Project

List the projects:

opal.projects(o)

Create a project, linked to a database (default or first one):

if (opal.project_exists(o, "dummy"))
  opal.project_delete(o, "dummy")  
opal.project_create(o, "dummy", database = TRUE)
opal.project(o, "dummy")

Backup and Restore

Backup a project and download the backup archive (encrypted):

opal.project_backup(o, 'CNSIM', '/home/administrator/backup/CNSIM')
opal.file_download(o, '/home/administrator/backup/CNSIM', '/tmp/CNSIM.zip', key = "12345abcdef")

Restore a project from an uploaded (and encrypted) archive:

opal.file_upload(o, '/tmp/CNSIM.zip', '/home/administrator')
opal.project_restore(o, 'dummy', '/home/administrator/CNSIM.zip', key = "12345abcdef")
# verify tables
opal.tables(o, "CNSIM")

Tables

In Opal there are two kinds of tables:

List the tables in a project, with their count of variables and entities:

opal.tables(o, "CNSIM", counts = TRUE)

The table object can be retrieved as follow:

opal.table(o, "CNSIM", "CNSIM1", counts = TRUE)

The existence of a table can be checked:

opal.table_exists(o, "CNSIM", "CNSIM1")

And more specifically, verify whether a table is a view or not:

opal.table_exists(o, "CNSIM", "CNSIM1", view = TRUE)

A table can be created, either as a raw table or a view. To create a view, specify which tables are referred:

# drop table if it exists
opal.table_delete(o, "CNSIM", "CNSIM123")
# then create a view, no variables
opal.table_create(o, "CNSIM", "CNSIM123", tables = c("CNSIM.CNSIM1", "CNSIM.CNSIM2", "CNSIM.CNSIM3"))

Dictionaries

List the variables of a table and get the details of the variable annotations (one column per variable attribute with namespace). This is a summary dictionary, as it includes the concatenated category properties:

opal.variables(o, "CNSIM", "CNSIM1")

It is also possible to get the full data dictionary of a table, as separate data frames of variables and categories. This is the recommended format for working with a data dictionary:

dico <- opal.table_dictionary_get(o, "CNSIM", "CNSIM1")
dico$variables
dico$categories

Here we modify the data dictionary by appending a derivation script to each of the variables:

dico$variables$script <- paste0("$('", dico$variables$name, "')")
dico$variables

Then we apply this derived variables dictionary to the view we have previously created and verify the counts of columns (variables) and rows (entities) in this table:

opal.table_dictionary_update(o, "CNSIM", "CNSIM123", variables = dico$variables, categories = dico$categories)
opal.table(o, "CNSIM", "CNSIM123", counts = TRUE)

Assign this view to a symbol in the R server, and get the summary statics:

opal.assign(o, "D", "CNSIM.CNSIM123")
opal.execute(o, "summary(D)")

Values

Get the values in a table for a specific Participant entity:

opal.valueset(o, "CNSIM", "CNSIM123", identifier = "1454")

Get all the values of a table in our local R session as a data.frame (tibble) object:

cnsim1 <- opal.table_get(o, "CNSIM", "CNSIM1")
cnsim2 <- opal.table_get(o, "CNSIM", "CNSIM2")
cnsim3 <- opal.table_get(o, "CNSIM", "CNSIM3")

Then do some alterations on this data.frame and save it back as a raw table:

# make sure IDs are unique
cnsim1$id <- paste0(cnsim1$id, "-1")
cnsim2$id <- paste0(cnsim2$id, "-2")
cnsim3$id <- paste0(cnsim3$id, "-3")
# bind tables
cnsim123 <- rbind(cnsim1, cnsim2, cnsim3)
# remove some columns
cnsim123$DIS_AMI <- NULL
cnsim123$DIS_CVA <- NULL
cnsim123$DIS_DIAB <- NULL
# save as a raw table
opal.table_save(o, cnsim123, "CNSIM", "CNSIM", overwrite = TRUE, force = TRUE)
opal.table(o, "CNSIM", "CNSIM", counts = TRUE)

Verify that this raw table resulting from the merge of the other tables as same values for a given Participant:

opal.valueset(o, "CNSIM", "CNSIM", identifier = "1454-1")

It is possible to truncate a table, i.e. delete ALL the values of a table (which must not be a view), without modifying the dictionary:

opal.table_truncate(o, "CNSIM", "CNSIM")
opal.table(o, "CNSIM", "CNSIM", counts = TRUE)

Annotations

Variables can be described by taxonomy terms.

List the taxonomies:

opal.taxonomies(o)

List the vocabularies of a taxonomy:

opal.vocabularies(o, taxonomy = "Mlstr_area")

List the terms of a vocabulary:

opal.terms(o, taxonomy = "Mlstr_area", vocabulary = "Lifestyle_behaviours")

To apply a taxonomy term to a table dictionary, use the following for batch annotation:

annotations <- tibble::tribble(
  ~variable, ~taxonomy, ~vocabulary, ~term,
  "LAB_TSC", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_TRIG", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_HDL", "Mlstr_area", "Physical_measures", "Physical_characteristics",
  "LAB_GLUC_ADJUSTED", "Mlstr_area", "Physical_measures", "Physical_characteristics"
)
opal.annotate(o, "CNSIM", "CNSIM123", annotations = annotations)

To list the variable annotations:

opal.annotations(o, "CNSIM", "CNSIM123")

Resources

Resources are an alternative way of accessing data or computation systems. In a project are stored references to resources, i.e. how to build a resource object in R and the permissions to use this resource.

To list the resource references:

opal.resources(o, "RSRC")

To create a reference to a resource (a compressed CSV file, stored in a Opal file system, authorized by a personal access token):

if (opal.resource_exists(o, "RSRC", "CNSIM4"))
  opal.resource_delete(o, "RSRC", "CNSIM4")
opal.resource_create(o, "RSRC", "CNSIM4", 
   url = "opal+https://opal-demo.obiba.org/ws/files/projects/RSRC/CNSIM3.zip", 
   format = "csv", secret = "EeTtQGIob6haio5bx6FUfVvIGkeZJfGq")
# verify the resource reference object
opal.resource(o, "RSRC", "CNSIM4")

From a resource reference, it is possible to build and get the resource object in the local R session:

opal.resource_get(o, "RSRC", "CNSIM4")

Depending on the nature of the resource, it may be possible to coerce it to a data.frame in the client side:

library(resourcer)
as.data.frame(opal.resource_get(o, "RSRC", "CNSIM4"))

The same operation can be done on the R server side:

# assign the resource object
opal.assign.resource(o, "rsrc", "RSRC.CNSIM4")
# coerce it to a data.frame
opal.assign.script(o, "D", quote(as.data.frame(rsrc)))
# get some summary statistics
opal.execute(o, "summary(as.factor(D$GENDER))")

Permissions

Permissions can be managed (list, add, delete) at different levels:

Teardown

Good practice is to free server resources by sending a logout request:

opal.logout(o)