Overview
Our package ctrdata has been continually developed and offered for the R system since 2015. It facilitates investigating and understanding trends in the design and conduct of trials and their availability for participants, as well as using their protocols and results for research and meta-analyses. The package can be used with information and documents available in the following registers:
- EU Clinical Trials Register (“EUCTR”, https://www.clinicaltrialsregister.eu/)
- EU Clinical Trials Information System (“CTIS”, https://euclinicaltrials.eu/)
- ClinicalTrials.gov (“CTGOV” classic and “CTGOV2” since 2023)
- ISRCTN Registry (https://www.isrctn.com/)
Its features include:
- Protocol- and results-related trial information is easily downloaded, including any trial documents available in registers.
- Information is stored as JSON in a document-centric database (DuckDB, PostgreSQL, RSQLite or MongoDB), for fast offline access.
- Find active substance synonyms, identify unique (de-duplicated) records across registers, merge and recode fields, easily access deeply-nested fields.
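To give an idea of what "deeply-nested fields" means, here is a minimal base R sketch. It is an illustration only and does not show how ctrdata works internally: the record and its identifier are made up, and only the dot-separated naming of nested fields (as in "protocolSection.statusModule.overallStatus") is taken from the example workflow further down.

```r
# Hypothetical, heavily shortened record; real register records have many more fields
record <- list(
  `_id` = "NCT00000000", # made-up identifier, for illustration only
  protocolSection = list(
    statusModule = list(
      overallStatus = "RECRUITING",
      startDateStruct = list(date = "2021-03-01")
    )
  )
)

# unlist() flattens the nested list and joins the names of each level with "."
flat <- unlist(record)
names(flat)

# A single nested value can then be looked up by its dot-separated path
flat[["protocolSection.statusModule.overallStatus"]]
```

Field names of this dot-separated form are what is passed to functions such as dbGetFieldsIntoDf() in the example workflow.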
When using such data, the registers’ terms and conditions need to be respected; they are shown with ctrOpenSearchPagesInBrowser(copyright = TRUE). When using package ctrdata, please cite it in any publication as: “Ralf Herold (2024). ctrdata: Retrieve and Analyze Clinical Trials in Public Registers. R package version 1.17.2, https://cran.r-project.org/package=ctrdata”. Package ctrdata has been used for unpublished work and for:
- Lasch et al. (2022) The Impact of COVID‐19 on the Initiation of Clinical Trials in Europe and the United States. https://doi.org/10.1002/cpt.2534
- Blogging on Innovation coming to paediatric research
- Cancer Research UK (2017) The impact of collaboration: The value of UK medical research to EU science and health
Download
Package ctrdata is on CRAN: https://cran.r-project.org/package=ctrdata

Within R, package ctrdata can be installed with: install.packages("ctrdata")
Documentation
Start here to find the Reference documentation and several Articles with detailed trial analysis: https://rfhb.github.io/ctrdata/
Support
The preferred way to flag issues is via https://github.com/rfhb/ctrdata/issues. The author can also be contacted through the comments form at the bottom of this page.
Example workflow
Code example
This example shows how to obtain information on trials of interest from all supported registers and how to plot their start and completion over time. For more sophisticated examples, see the Articles under Documentation above.
    # Load our package ctrdata from the library
    library(ctrdata)

    # Connect to (or newly create) an SQLite database
    # - See help(nodbi) for how to store in the file
    #   system and for connecting with other databases
    db <- nodbi::src_sqlite(
      collection = "some_collection_name"
    )

    # Retrieve trials from public register (EUCTR):
    ctrLoadQueryIntoDb(
      queryterm = "https://www.clinicaltrialsregister.eu/ctr-search/search?query=neuroblastoma&phase=phase-three",
      euctrresults = TRUE,
      con = db
    )

    # Retrieve trials from another register (CTIS):
    ctrLoadQueryIntoDb(
      queryterm = "https://euclinicaltrials.eu/app/#/search?basicSearchInputAND=neuroblastoma",
      con = db
    )

    # Retrieve trials from another register (CTGOV2):
    ctrLoadQueryIntoDb(
      queryterm = "https://www.clinicaltrials.gov/search?cond=Neuroblastoma&aggFilters=ages:child,phase:3,studyType:int",
      con = db
    )

    # Retrieve trials from another register (CTGOV classic):
    ctrLoadQueryIntoDb(
      queryterm = "https://classic.clinicaltrials.gov/ct2/results?cond=neuroblastoma&intr=Drug&recrs=e&age=0&phase=2",
      con = db
    )

    # Retrieve trials from another register (ISRCTN):
    ctrLoadQueryIntoDb(
      queryterm = "https://www.isrctn.com/search?q=neuroblastoma",
      con = db
    )

    # Names of all fields / variables in the collection:
    length(dbFindFields(".*", con = db, sample = FALSE))
    # Finding fields in database collection (may take some time) . . . . .
    # Field names cached for this session.
    # [1] 3953

    dbFindFields("(start.*date)|(date.*decision)", con = db)
    # Using cache of fields.

    # - Get trial data
    result <- dbGetFieldsIntoDf(
      fields = c(
        "ctrname",
        "record_last_import",
        # CTGOV
        "start_date",
        "overall_status",
        # CTGOV2
        "protocolSection.statusModule.startDateStruct.date",
        "protocolSection.statusModule.overallStatus",
        # EUCTR
        "n_date_of_competent_authority_decision",
        "trialInformation.recruitmentStartDate", # needs above: 'euctrresults = TRUE'
        "p_end_of_trial_status",
        # ISRCTN
        "trialDesign.overallStartDate",
        "trialDesign.overallEndDate",
        # CTIS
        "authorizedPartI.trialDetails.trialInformation.trialDuration.estimatedRecruitmentStartDate",
        "ctStatus"
      ),
      con = db
    )

    # use helper packages for plotting
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # - Deduplicate trials and obtain unique identifiers
    #   for trials that have records in several registers
    # - Calculate trial start date
    # - Calculate simple status for ISRCTN
    #   (an end date before the last import means the trial is completed)
    # - Update end of trial status for EUCTR
    # Note: plain assignment is used because the %<>% operator
    # would additionally require library(magrittr)
    result <- result %>%
      filter(`_id` %in% dbFindIdsUniqueTrials(
        preferregister = c("CTGOV", "CTGOV2", "EUCTR"), con = db)) %>%
      rowwise() %>%
      mutate(start = max(c_across(matches("(date.*decision)|(start.*date)")), na.rm = TRUE)) %>%
      mutate(isrctnStatus = if_else(
        trialDesign.overallEndDate < record_last_import, "Completed", "Ongoing")) %>%
      mutate(p_end_of_trial_status = if_else(
        is.na(p_end_of_trial_status) & !is.na(n_date_of_competent_authority_decision),
        "Ongoing", p_end_of_trial_status)) %>%
      ungroup()

    # - Merge fields from different registers with re-leveling
    statusValues <- list(
      "ongoing" = c(
        # EUCTR
        "Recruiting", "Active", "Ongoing",
        "Temporarily Halted", "Restarted",
        # CTGOV
        "Active, not recruiting", "Enrolling by invitation",
        "Not yet recruiting", "ACTIVE_NOT_RECRUITING", "RECRUITING",
        # CTIS
        "Ongoing, recruiting", "Ongoing, recruitment ended",
        "Ongoing, not yet recruiting", "Authorised, not started"
      ),
      "completed" = c(
        "Completed", "COMPLETED", "Ended"
      ),
      "other" = c(
        "GB - no longer in EU/EEA", "Trial now transitioned",
        "Withdrawn", "Suspended", "No longer available",
        "Terminated", "TERMINATED", "UNKNOWN",
        "Prematurely Ended", "Under evaluation"
      )
    )
    result[["state"]] <- dfMergeVariablesRelevel(
      df = result,
      colnames = c(
        "overall_status", "p_end_of_trial_status",
        "protocolSection.statusModule.overallStatus",
        "ctStatus", "isrctnStatus"
      ),
      levelslist = statusValues
    )

    # - Plot example
    ggplot(result) +
      stat_ecdf(aes(x = start, colour = state)) +
      labs(
        title = "Evolution over time of neuroblastoma phase 3 trials",
        subtitle = "Data from EUCTR, CTIS, ISRCTN, CTGOV, CTGOV2",
        x = "Date of start (proposed or realised)",
        y = "Cumulative proportion of trials",
        colour = "Current status",
        caption = Sys.Date()
      )
Graphing the start and completion of trials since 2000
This example shows the plot resulting from the script above.
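To make clearer what the last two steps of the script do, here is a small sketch in base R with made-up data. It is not the implementation of dfMergeVariablesRelevel() or of stat_ecdf(), and the helper names (trials, levels_list, lookup) are invented for this illustration. It only mimics the idea: take the one status that is filled in per trial, map it to a merged level, and compute the cumulative proportion of trials by start date within one level.

```r
# Made-up data for five trials: one status column per register, plus a start date
trials <- data.frame(
  overall_status        = c("Completed", NA, NA, NA, "Completed"),
  p_end_of_trial_status = c(NA, "Ongoing", NA, "Prematurely Ended", NA),
  ctStatus              = c(NA, NA, "Ended", NA, NA),
  start = as.Date(c("2004-05-01", "2016-03-10", "2010-01-15",
                    "2012-07-01", "2018-09-30")),
  stringsAsFactors = FALSE
)

# Shortened version of the statusValues list from the script
levels_list <- list(
  ongoing   = c("Recruiting", "Ongoing"),
  completed = c("Completed", "Ended"),
  other     = c("Prematurely Ended", "Withdrawn")
)

# Step 1: per row, keep the first non-missing status across the register columns
status_cols <- c("overall_status", "p_end_of_trial_status", "ctStatus")
merged <- Reduce(function(a, b) ifelse(is.na(a), b, a), trials[status_cols])

# Step 2: map each register-specific value to its merged level
lookup <- setNames(
  rep(names(levels_list), lengths(levels_list)),
  unlist(levels_list, use.names = FALSE)
)
trials$state <- factor(unname(lookup[merged]), levels = names(levels_list))

# Step 3: empirical cumulative distribution of start dates within one level;
# ecdf() returns a step function giving the proportion of values <= x
completed_start <- trials$start[trials$state == "completed"]
F_completed <- ecdf(as.numeric(completed_start))

# Proportion of completed trials that had started by the end of 2010
F_completed(as.numeric(as.Date("2010-12-31")))
```

In the plot, each coloured curve corresponds to one such step function, computed separately for the trials of one status level.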
Major milestones of ctrdata
- By November 2023, package ctrdata was freed from all dependencies on command line tools that previously had to be installed in the operating system.
- The European Union's new Clinical Trials Information System (CTIS) has been supported by package ctrdata since March 2023.
- Since 2019, package ctrdata has provided support for several databases through package nodbi (now maintained by the same author).
- Various refactoring efforts, such as to factor out more functions, make downloading more robust, handle nested information, generate data frames.
- Results from the EU Clinical Trials Register have been supported and imported since 2017 (results from ClinicalTrials.gov have been included since the beginning).
- On 15 September 2015, a first version of package ctrdata was published on CRAN.
Data model used with ctrdata
Package ctrdata uses the data models that are implicit in data retrieved from the different registers. The approach is further explained here, together with the reasons for this choice. A possible future development is to provide a mapping to a canonical data model, which however does not exist at the moment and will require an international approach and alliance.
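As a purely hypothetical illustration of what such a mapping could look like for a single concept, the sketch below maps an invented canonical name, "trial_start", to the register-specific fields used for the start date in the example workflow. Neither this mapping nor the canonical name is provided by ctrdata, and the records are made up.

```r
# Hypothetical mapping of one canonical concept to register-specific fields;
# the field names are those from the example workflow, the canonical name
# "trial_start" is invented for this sketch
field_map <- list(
  trial_start = c(
    CTGOV  = "start_date",
    CTGOV2 = "protocolSection.statusModule.startDateStruct.date",
    EUCTR  = "n_date_of_competent_authority_decision",
    ISRCTN = "trialDesign.overallStartDate",
    CTIS   = "authorizedPartI.trialDetails.trialInformation.trialDuration.estimatedRecruitmentStartDate"
  )
)

# Made-up records: each trial only has the field of its own register
df <- data.frame(
  `_id` = c("trial-a", "trial-b"),
  start_date = as.Date(c("2015-06-01", NA)),
  trialDesign.overallStartDate = as.Date(c(NA, "2019-02-11")),
  check.names = FALSE
)

# Per row, take the first non-missing value among the mapped fields that exist
present <- intersect(field_map$trial_start, names(df))
df$trial_start <- Reduce(
  function(a, b) {
    a[is.na(a)] <- b[is.na(a)]
    a
  },
  df[present]
)
df[, c("_id", "trial_start")]
```

A real canonical model would also have to reconcile differences in meaning between such fields (for example, a date of authority decision versus an actual recruitment start), which is part of why it requires agreement beyond a single package.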
Here is the model of data from CTIS for a given trial, as an example of how ctrdata downloads, transforms and stores information from registers: