From clinical science supported by data to data science supported by clinicians
We live in an era of abundant information, and scientific research is particularly well placed to benefit from the analysis of large data sets. In the past, data were locked up in individual databases and were not openly shared. Over the last two decades, access has improved and more open sources for analysis have become available. With advances in technology, including cloud computing, big data is now within reach of all researchers. Information gained from big data still needs to be translated into knowledge. Acute and chronic diseases are complex processes that often present as a variety of phenotypes with different outcomes. Consequently, the data have to be rich enough to identify subgroups, define disease phenotypes, and guide precise treatment strategies.
I recently attended the “Early Career Day” at the AHA Scientific Sessions to learn more about the AHA Precision Medicine Platform (AHA-PMP), where I can access data, upload my own data, and use the provided workspace, which is especially useful for teams. In addition to the AHA-PMP, other data portals were presented and explored in small groups. These open portals cover cardiovascular, cerebrovascular, and diabetes research: the Cardiovascular Disease Knowledge Portal (CVDKP; broadcvdi.org), the Cerebrovascular Disease Knowledge Portal (CDKP; cerebrovascularportal.org), and the Type 2 Diabetes Knowledge Portal (T2DKP; type2diabetesgenetics.org).
The goal of these platforms is to accelerate analyses of the genetics of cardiovascular disease, cerebrovascular disease, and diabetes. For example, the CVDKP is an open-access resource that facilitates the translation of genomic data into actionable knowledge for better understanding and treatment of cardiovascular disease. Its data come from four large consortia: the Atrial Fibrillation Consortium (AFGen), the Global Lipids Genetics Consortium (GLGC), the Myocardial Infarction Genetics Consortium (MIGen), and the CARDIoGRAMPlusC4D Consortium. The CVDKP was built on the Knowledge Portal platform originally designed for the Type 2 Diabetes Knowledge Portal (type2diabetesgenetics.org), which was produced by the Accelerating Medicines Partnership in Type 2 Diabetes. It is part of the Knowledge Portal Network, which also includes the Cerebrovascular Disease Knowledge Portal (CDKP; cerebrovascularportal.org). Data in the CVDKP include GWAS data for CVD and other traits (anthropometric, glycemic, renal, and psychiatric), exome chip data, whole exome sequence data, disease-agnostic genomic resources, and epigenomic data.
As results from big data accumulate, a paradigm shift in science is becoming apparent. Over the last few decades, medicine has mostly been clinical science supported by data. Now medicine is about to become data science supported by clinicians, with artificial intelligence and machine learning (including deep learning) playing an important role. This new frontier of data science offers a great opportunity, especially for younger investigators, to develop and drive their own projects.
However, despite the wide endorsement of data sharing and the availability of open sources and platforms, the rate at which these data are accessed and used is still low. This is one reason the AHA wants to promote these valuable resources further, to advance our understanding of medicine and facilitate new therapies.
The perception that open-source data are underutilized is supported by recent studies. A recently published analysis showed, for example, that patient-level data from cardiometabolic clinical trials are accessed less often than previously assumed. In this study, Vaduganathan et al. extracted data from ClinicalStudyDataRequest.com, a large, multi-sponsor data-sharing platform hosting individual patient-level data from completed studies sponsored by 13 pharmaceutical companies. Over the last 4 years, the platform held data from 3374 clinical trials, of which 537 evaluated cardiometabolic therapeutics, covering 74 therapies and 398 925 patients. Diabetes mellitus and hypertension were the most common study topics, with a median follow-up time of 79 months. As of May 2017, despite the availability of data from more than 500 cardiometabolic trials, only 15% of these trials, and 29% of the phase 3 or 4 trials, had been accessed by investigators, and almost all of the researchers were from academic centers in North America and Europe. Of note, only half of the proposals were funded, and most proposals addressed secondary, hypothesis-generating questions. To date, after a median of 19 months (9-32 months), only 3 peer-reviewed articles have been published.
Further, when the investigators analyzed whether male and female researchers requested data access equally, they found that only 15% of the researchers who accessed data were women, while the large majority, 85%, were men.
In conclusion, during “Early Career Day” I learnt that the open sources available for big data analysis are underutilized and that the researchers who access scientific data are predominantly men. Data platforms provide a huge opportunity for researchers, and especially for women, to generate hypotheses which may then lead to (further) funding.
Tanja Dudenbostel is an internist and hypertension specialist within Cardiology at the University of Alabama at Birmingham, where she divides her time as an Assistant Professor between clinical research and seeing patients in cardiology.