Dr. Prakriteswar Santikary, chief data officer at ERT, spoke last month at the Data Architecture Summit 2018 conference about the data lake architecture his team developed at their clinical research organization. He discussed the data platform deployed in the cloud to streamline data collection, aggregation, and clinical reporting and analytics, using concepts like serverless computing and data services.
Santikary talked about the market dynamics and challenges in the clinical trial industry. The stakes are higher than ever in clinical research, and the clinical development marketplace has become more competitive, with stricter regulatory standards and greater emphasis on trial oversight and patient safety. At the same time, the costs and time needed to commercialize a new drug are escalating – often exceeding eight years and $2 billion. A big factor in these staggering figures is the increasing complexity of clinical trials, driven in large part by study designs needing more endpoints to demonstrate product value.
Clinical trial sponsors are also incorporating data from multiple, disparate sources including genomic sequencing, medical imaging, labs, wearables, and other mobile health (mHealth) devices. They also integrate operational, financial and real-world evidence data with endpoint data to get the most out of their R&D investments, and bring life-saving drugs and therapies to the market faster.
Other current trends impacting clinical research include the following:
- Precision medicine
- Virtual trials
- Patient centricity
InfoQ spoke with Santikary to learn more about the master data management (MDM), the data lake solutions they developed, and the emerging trends in data architecture and technologies in the clinical and healthcare industry.
InfoQ: Can you discuss some of the data architecture challenges in the clinical trial industry?
Dr. Prakriteswar Santikary: Given the market dynamics and data integration challenges, the data architecture challenges in clinical research are significant. At ERT, our modern data platform is architected to meet the following challenges:
- Data security, privacy and protection at scale
- Data integration at scale
- Real-time reporting and analytics at scale
- Data governance and master data management at scale
InfoQ: What are the challenges in implementing a data lake architecture in healthcare systems?
Santikary: The primary challenge in implementing a data lake architecture in healthcare is making sure the data platform is architected with data security, privacy and protection in mind while enabling real-time data transmission, collection, ingestion and integration at scale. The challenges of dealing with unstructured and binary data in the data lake should also not be underestimated. From the data lake architecture perspective, supporting both batch and near-real-time data integration and business intelligence is a real practical challenge. Making integrated data available to all constituents in a self-service manner is another big challenge.
InfoQ: How did your team architect the master data management in your new data platform?
Santikary: Master data management (MDM) is a key architectural component of our overall modern data architecture foundation. Our enterprise data lake is a consumer of our MDM platform, which collects all master entities from all of our operational and transactional systems, and masters them in real time using sophisticated matching and merging algorithms, metadata management and semantic matching. We also have data stewards who oversee manual merges and data quality, and who own the data from a business ownership and accountability perspective. We have created a data governance council that is cross-functional in nature – this council draws data expertise from across the organization, not just from R&D. MDM is a strategic initiative within our company, as is our enterprise data lake. This data lake platform serves up all business intelligence reporting, analytics and AI across the entire company and all of its business lines, enabling us to create smart data products and opening up new revenue channels for the company.
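To make the "matching and merging" step more concrete, here is a minimal Python sketch of how records from different source systems might be matched and merged into a single master ("golden") record. It is not ERT's implementation: the record fields (`name`, `city`, `phone`, `source`), the 0.85 similarity threshold, and the greedy first-match strategy are all assumptions chosen for illustration, and real MDM platforms use far richer matching rules and steward review.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(str(value or "").lower().split())


def similarity(a, b, fields=("name", "city")):
    """Average string similarity across the compared fields (0.0 to 1.0)."""
    scores = [
        SequenceMatcher(None, normalize(a.get(f)), normalize(b.get(f))).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def merge(golden, rec):
    """Keep the golden record's values, fill empty fields from rec, track sources."""
    merged = dict(golden)
    for key, value in rec.items():
        if key == "source":
            continue
        if not merged.get(key):
            merged[key] = value
    merged["sources"] = sorted(set(golden["sources"]) | {rec["source"]})
    return merged


def master(records, threshold=0.85):
    """Greedy match-and-merge: a record joins the first golden record it resembles."""
    golden_records = []
    for rec in records:
        for i, golden in enumerate(golden_records):
            if similarity(golden, rec) >= threshold:
                golden_records[i] = merge(golden, rec)
                break
        else:
            # No match found: this record starts a new golden record.
            first = {k: v for k, v in rec.items() if k != "source"}
            first["sources"] = [rec["source"]]
            golden_records.append(first)
    return golden_records


# Hypothetical site records arriving from three operational systems.
records = [
    {"source": "crm", "name": "Boston General Hospital", "city": "Boston", "phone": ""},
    {"source": "trial_ops", "name": "boston general  hospital", "city": "Boston", "phone": "555-0100"},
    {"source": "finance", "name": "Lakeside Clinic", "city": "Chicago", "phone": ""},
]
golden_records = master(records)
```

In this sketch the first two records differ only in case and spacing, so they collapse into one golden record that picks up the phone number from the second source, while the third record stays separate. Borderline matches that fall just under the threshold are the kind of case a data steward would review manually.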
InfoQ: What are the emerging trends in data architecture and data technologies in the clinical and healthcare industry?
Santikary: The rate of change in clinical research and healthcare technology is unprecedented, as innovations and discoveries are driving advances at an alarmingly rapid pace. The following technologies are making a huge difference in clinical research:
Artificial Intelligence, machine learning and deep learning: We see the use of AI continuing to expand, especially in the following categories:
- Making Clinical Trials Intelligent
- Optimizing Patient Recruitment / Retention
- Gaining Greater Insight for Smarter Decision Making
Blockchain: Blockchain technology enables practitioners to share their data without fear of compromising data security, as the blockchain data is immutable and any changes made to the data can be tracked.
Cloud Computing and Big Data: Advances in data analytics and visualization are enabling clinical researchers to explore and interact with large-scale, often aggregated bodies of data. For patients in a clinical trial, the ability to capture nearly unlimited data during the study, such as recording their mood or logging daily food intake by snapping a quick picture of each meal, changes the landscape of data analysis for clinical trials.
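The change-tracking property mentioned under Blockchain above can be illustrated with a small hash chain. The Python sketch below is a simplified assumption, not a description of any product Santikary mentioned: it omits the distributed consensus that real blockchains rely on, and the block layout and the function names `append_block` and `first_invalid_block` are hypothetical. It shows only why an edit to earlier data becomes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 over the block's contents and the hash of the block before it."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block that is cryptographically linked to the current tail."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    block = {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }
    chain.append(block)
    return block


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return block["index"]
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return block["index"]
        prev_hash = block["hash"]
    return None


# Hypothetical trial measurements recorded as blocks.
chain = []
append_block(chain, {"patient": "P-001", "systolic_bp": 120})
append_block(chain, {"patient": "P-001", "systolic_bp": 135})
append_block(chain, {"patient": "P-001", "systolic_bp": 128})

valid_before = first_invalid_block(chain)   # nothing has been altered yet
chain[1]["data"]["systolic_bp"] = 118       # someone quietly edits a stored reading
tampered_at = first_invalid_block(chain)    # verification now flags block 1
```

Because each block's hash covers both its own data and the previous block's hash, changing a stored value without recomputing every later hash breaks verification at the edited block.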