Exploring the Impact of Natural Language Processing on Clinical Trials

Introduction

The pharmaceutical R&D sector is undergoing a significant transformation driven by the integration of advanced technologies, notably Natural Language Processing (NLP), which is set to enhance the efficiency and precision of clinical trials. NLP, a facet of Artificial Intelligence (AI), is becoming an essential tool in the conduct and analysis of clinical trials, promising to redefine their future landscape.

Understanding Natural Language Processing (NLP)

NLP is a domain within AI that facilitates the interaction between computers and human languages. It enables machines to understand, interpret, and generate human language in ways that are meaningful and contextually appropriate. This technology holds tremendous potential for optimizing clinical trial processes, including protocol development and data analysis. For an in-depth understanding of NLP, refer to this comprehensive overview: [NLP Foundations].

Patient Recruitment

Patient recruitment is often cited as one of the most challenging aspects of clinical trials, influencing timelines, costs, and the overall success of research projects. NLP can significantly enhance the efficiency and effectiveness of patient recruitment by automating the identification and screening of potential participants. NLP technologies can analyze extensive databases of electronic health records (EHRs), clinical notes, and other sources of unstructured medical data to identify patients who meet specific trial criteria. This approach not only speeds up the recruitment process but also improves its accuracy by reducing human error in eligibility screening.
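To make this concrete, here is a minimal, illustrative sketch of rule-based eligibility screening in Python. The criteria, patterns, and note text are hypothetical; a production clinical NLP pipeline would add negation detection, section parsing, and coded terminologies (e.g., SNOMED CT) on top of this idea:

```python
import re

# Hypothetical eligibility criteria expressed as simple text patterns
INCLUSION_PATTERNS = {
    "type_2_diabetes": re.compile(r"\btype\s*(2|ii)\s+diabetes\b", re.I),
    "age_40_plus": re.compile(r"\b[4-9][0-9][- ]year[- ]old\b", re.I),
}
EXCLUSION_PATTERNS = {
    "pregnancy": re.compile(r"\bpregnan(t|cy)\b", re.I),
}

def screen_note(note: str) -> dict:
    """Flag which inclusion/exclusion criteria a free-text note appears to meet."""
    return {
        "inclusion": [c for c, p in INCLUSION_PATTERNS.items() if p.search(note)],
        "exclusion": [c for c, p in EXCLUSION_PATTERNS.items() if p.search(note)],
    }

note = "58-year-old male with type 2 diabetes, well controlled on metformin."
print(screen_note(note))
# -> {'inclusion': ['type_2_diabetes', 'age_40_plus'], 'exclusion': []}
```

Run over large volumes of notes, even rules this simple can pre-screen candidates far faster than manual chart review; the real NLP value comes from handling the messier language that rules like these miss.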

Furthermore, NLP can aid in patient engagement by crafting personalized communication strategies based on the analysis of individual patient data, such as communication preferences and historical health information. This tailored approach can increase the likelihood of enrollment and retention by addressing specific patient concerns and expectations. Additionally, NLP can analyze social media and online community data to identify potential recruitment channels and patient populations that are often underrepresented in clinical trials.

By streamlining these processes, NLP not only accelerates patient recruitment but also enhances the diversity and representativeness of clinical trial populations, which is crucial for the generalizability of study outcomes. For a more detailed discussion of the impact of NLP on patient recruitment, consider reviewing: [Finding the right patients for the right treatment with AI]. Another useful article on this topic is [Using AI to More Effectively Find, Recruit, and Enroll Eligible Patients in Clinical Trials].

Protocol Development

The development of clinical trial protocols is a critical phase where precision and adherence to scientific and regulatory standards are paramount. NLP can play a transformative role in this stage by automating and optimizing the creation and review of trial documents. NLP systems can analyze existing protocols and regulatory guidelines to ensure that new protocols meet required standards and are aligned with the latest research findings. This not only streamlines the development process but also enhances the quality and compliance of clinical trial protocols. Additionally, NLP can help identify potential risks or inconsistencies in trial designs by comparing protocols against a vast database of clinical trial outcomes and regulatory feedback, leading to more robust trial designs.
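As a toy example of the document-checking idea, the sketch below scans a draft protocol for required section headings. The section list and draft text are hypothetical placeholders; real systems compare protocols against full regulatory guidelines and databases of prior trial outcomes:

```python
# Hypothetical list of sections a protocol is expected to contain
REQUIRED_SECTIONS = [
    "objectives",
    "eligibility criteria",
    "statistical methods",
    "adverse event reporting",
]

def missing_sections(protocol_text: str) -> list[str]:
    """Return required sections that do not appear in a draft protocol."""
    lowered = protocol_text.lower()
    return [s for s in REQUIRED_SECTIONS if s not in lowered]

draft = "1. Objectives ... 2. Eligibility Criteria ... 3. Statistical Methods ..."
print(missing_sections(draft))  # -> ['adverse event reporting']
```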

Data Collection

In the context of clinical trials, data collection is a foundational activity that determines the quality of the research outcomes. NLP can significantly enhance the data collection and overall clinical trial data management processes by enabling the extraction of relevant information from a variety of unstructured sources, such as patient interviews, open-ended survey responses, and clinical notes. This capability not only expands the breadth and depth of data collected but also improves the speed and accuracy of this collection. NLP tools can be configured to recognize and categorize specific types of clinical data, such as symptoms, diagnoses, and treatment responses, from diverse data streams, which can then be seamlessly integrated into the clinical trial’s data repository.
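As an illustration of that configuration step, here is a minimal sketch using spaCy's rule-based EntityRuler. The labels and patterns are hypothetical; a production pipeline would typically use a trained clinical model plus negation and context handling:

```python
import spacy

nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
# Hypothetical patterns mapping phrases to clinical categories
ruler.add_patterns([
    {"label": "DIAGNOSIS", "pattern": "atrial fibrillation"},
    {"label": "SYMPTOM", "pattern": "shortness of breath"},
    {"label": "TREATMENT_RESPONSE", "pattern": "symptoms improved"},
])

note = ("Patient with atrial fibrillation reported shortness of breath; "
        "symptoms improved on apixaban.")
for ent in nlp(note).ents:
    # Categorized spans, ready to load into the trial's data repository
    print(ent.label_, "->", ent.text)
```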

Data Validation

Ensuring the accuracy and reliability of clinical trial data is essential, and this is where NLP can provide substantial improvements. NLP techniques can be used to validate data by cross-verifying collected information across multiple sources, identifying discrepancies, and suggesting corrections. This process helps maintain the integrity of the data and reduces the likelihood of errors that could compromise study results. Furthermore, NLP can assist in real-time data monitoring, detecting anomalies or outliers that may indicate data entry errors or potential adverse events. This proactive validation approach helps maintain the rigor of clinical trials and supports the credibility of the findings.
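A minimal sketch of cross-source verification and simple outlier flagging with pandas follows; the subjects, fields, and tolerance are hypothetical:

```python
import pandas as pd

# Hypothetical: the same vital sign captured in the EDC system and
# extracted from clinical notes by an NLP pipeline
edc = pd.DataFrame({"subject": ["001", "002", "003"],
                    "systolic_bp": [128, 141, 135]})
from_notes = pd.DataFrame({"subject": ["001", "002", "003"],
                           "systolic_bp": [128, 114, 135]})

merged = edc.merge(from_notes, on="subject", suffixes=("_edc", "_nlp"))
# Flag cross-source discrepancies above an (arbitrary) tolerance for review
merged["flag"] = (merged["systolic_bp_edc"] - merged["systolic_bp_nlp"]).abs() > 5
print(merged[merged["flag"]])

# Simple anomaly check: values more than 3 standard deviations from the mean
z = (edc["systolic_bp"] - edc["systolic_bp"].mean()) / edc["systolic_bp"].std()
print(edc[z.abs() > 3])
```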

Data Analysis

The analysis phase of clinical trials involves interpreting vast amounts of complex data to draw meaningful conclusions about the efficacy and safety of treatments. NLP is particularly valuable in this phase as it allows for the extraction of deeper insights from both structured and unstructured clinical trial data. Advanced NLP techniques, such as sentiment analysis and thematic clustering, can uncover nuanced patterns and trends that may not be apparent through traditional statistical methods. These insights can include patient sentiment towards treatments, the contextual significance of adverse effects, and more. By integrating these findings into the data analysis process, researchers can gain a more comprehensive understanding of the trial outcomes, leading to better-informed decisions regarding future research directions and therapy developments.
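To give a flavor of thematic clustering, here is a minimal sketch that groups hypothetical free-text responses by vocabulary similarity using TF-IDF and k-means (scikit-learn); real analyses would use richer language models and far more data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical open-ended responses from trial participants
responses = [
    "Mild headache after the first dose, resolved quickly.",
    "Headache and dizziness for two days after dosing.",
    "No side effects, and energy levels are noticeably better.",
    "Feeling much more energetic since starting treatment.",
]

# Vectorize the text, then cluster responses into candidate themes
X = TfidfVectorizer(stop_words="english").fit_transform(responses)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for theme, text in sorted(zip(labels, responses)):
    print(f"theme {theme}: {text}")
```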

Facilitating Regulatory Compliance and Drug Safety Monitoring

Ensuring compliance with stringent regulatory requirements and monitoring drug safety are critical aspects of clinical trials. NLP can streamline these processes by analyzing relevant texts from trial protocols, regulatory documents, and adverse event reports to promptly identify potential issues. This proactive approach not only ensures compliance but also enhances patient safety. More information on NLP’s role in regulatory compliance and patient safety can be found at: [NLP to Improve Clinical Trials and Patient Safety].

Overcoming Challenges and Ethical Considerations

Despite its benefits, the application of NLP in clinical trials is not without challenges. Issues such as data privacy, algorithmic bias, and the need for transparency must be addressed to fully leverage NLP technologies ethically and effectively. Enhancing model interpretability and ensuring equitable AI use are paramount to gaining trust among all stakeholders.

Embracing a Collaborative Future

The successful integration of NLP in clinical trials requires collaboration across various stakeholders, including researchers, healthcare providers, technology experts, regulators, and patients. Collective efforts are essential to develop solutions that uphold safety, efficacy, and ethical integrity. This collaborative approach will drive medical innovation forward, leading to better health outcomes.

Conclusion

The potential of NLP to transform clinical trials is immense, offering significant improvements in trial efficiency, participant diversity, and data quality. By adopting NLP-powered solutions, the future of clinical trials looks promising, paving the way for more effective treatments and improved patient care. As this technology continues to evolve, its impact on clinical research will undoubtedly grow, marking a new era in healthcare innovation.

William Qubeck, VP, Consulting & Enterprise Hosting, Clinical Trial Analytics, Instem

Citations

NLP Foundations – https://www.ibm.com/cloud/learn/natural-language-processing

Using AI to More Effectively Find, Recruit, and Enroll Eligible Patients in Clinical Trials – https://www.bekhealth.com/blog/using-ai-to-more-effectively-find-recruit-and-enroll-eligible-patients-in-clinical-trials/

Finding the right patients for the right treatment with AI – https://www.linkedin.com/pulse/nlp-improve-clinical-trials-patient-safety-/

NLP to Improve Clinical Trials and Patient Safety – https://www.linkedin.com/pulse/nlp-improve-clinical-trials-patient-safety-/

The Value of Internal Data Reuse: Why Data is Better Than Oil

The value of clinical trial data reuse.

As everyone knows, data is one of the biggest and fastest-growing industries in the world, and it will only keep growing. Data is everywhere: the websites you visit, the food you eat, the people you talk to, the questions you google, the medications you take, and so on. This data was also one of the topics we discussed during the PHUSE working groups. Obviously, much of this clinical data feels extremely personal to people, and data privacy is immeasurably important. But what happens when data is anonymized? Suddenly its value skyrockets, especially in a clinical and healthcare setting. And this is what data reuse groups have been shouting at the top of their lungs about for the past few years.

Under data privacy laws, data collected for a clinical trial generally can’t be reused for any other purpose. This means that data is sitting on the shelf, not being used, even though there’s immense value in it. However, if we could liberate that data, take it down from the shelf, and give it a purpose, it could potentially be used to save countless lives, money, and time.

How can we reuse clinical trial data?

Data reuse can make clinical trials safer and less expensive. People outside the pharma industry may not realize it, but people have literally given their lives to be on a clinical trial. They may have died. They may have become unwell. The drug they were trialing may have failed. And we’re not going to reuse that data? To me, that’s a tragedy. We should be taking that data, combining it with other data, and looking for value, for patterns, and for conclusions. There could be discoveries of new drugs in there. There could be interesting signals about trial safety or efficacy across different compounds. In theory, you can take that data, combine it with all the other clinical trials, and pull unprecedented value from it.

Mass anonymization of clinical data not only protects patient confidentiality, it can also help uncover key scientific findings. Look specifically at clinical trials for rare diseases: the very nature of rare diseases is that not many people suffer from them! So applications like reusing placebo groups from previous clinical trials enable us to get as many people with a rare disease on a clinical trial as possible. We can also pull and combine data from other ends of the pharma development spectrum. Data from preclinical studies and the FDA Adverse Event Reporting System (FAERS) can be combined with clinical data to get a more complete picture of a drug’s effectiveness. You can reuse patient data in multiple ways, and we’re discussing even more of them in our clinical data reuse groups.
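To sketch the placebo-reuse idea in code (all trial names, columns, and values here are hypothetical), pooling anonymized placebo arms from completed studies can build an external control dataset for a new trial:

```python
import pandas as pd

# Hypothetical anonymized placebo arms from two completed trials
trial_a = pd.DataFrame({"subject": ["A-01", "A-02", "A-03"],
                        "arm": "placebo", "outcome": [0, 1, 0]})
trial_b = pd.DataFrame({"subject": ["B-01", "B-02"],
                        "arm": "placebo", "outcome": [1, 0]})

# Pooled external control group for a new rare-disease study, so fewer
# newly enrolled patients need to be randomized to placebo
external_controls = pd.concat([trial_a, trial_b], ignore_index=True)
print(external_controls)
```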

Why is data not the new oil?

A recurring theme in our data reuse group is that “data is the new oil”. I don’t think this is true because you can only use oil once, but with data, you can use it over and over and over again. Data is fantastic! If you look at the biggest companies in the world today, they are data companies.

Companies doing amazing things right now are looking at the data and making decisions using that data. There is immense value in this approach. For example, car manufacturing companies can use real-world driver and crash data to improve the design of their cars. They become a data company that makes cars. Imagine a future where pharma companies behave more like a data company that makes life-enhancing medicines. There is so much information and value in pharmaceutical datasets. We’ve got the cleanest data in the world: it’s coded, it’s curated, it’s well maintained, and it’s been collected by professionals. It’s about time we start using it in more ways to save more lives.

There’s an incredible opportunity sitting at our doorstep right now; data transparency and anonymization are the key to opening that door. We are already holding that key; we know how to anonymize data. This means that we can use that data for countless other society-benefiting activities. If you need the key, call me and I’ll show you mine… and we can get you one cut.

Cathal Gallagher, Transparency Solution Owner, Instem

Unraveling the Differences Between Data Lakes and Clinical Trial Data Repositories

In the realm of healthcare and clinical research, effective data management is paramount for ensuring the success and reliability of clinical trials. Two pivotal components in this domain, Data Lakes and Clinical Trial Data Repositories, serve as key players in the quest for streamlined data processes and improved research outcomes. Understanding the distinctions between these two is crucial for organizations aiming to harness the full potential of their clinical trial data.

Data Lakes: The Versatile Reservoir of Healthcare Information

A Data Lake in the healthcare context serves as a centralized repository designed to store vast amounts of both structured and unstructured data. This includes a wide spectrum of information, such as electronic health records (EHRs), medical imaging, patient-generated data, and more. Data Lakes offer unparalleled flexibility and scalability, making them ideal for organizations dealing with diverse data types in the healthcare landscape.

Key Features of Healthcare Data Lakes:

  1. Versatility: Data Lakes can accommodate diverse healthcare data types, facilitating the integration of information from various sources within a healthcare organization.
  2. Scalability: As clinical trial data volumes grow, Data Lakes can scale horizontally to handle the increasing influx of patient data, research findings, and administrative records.
  3. Advanced Analytics: The flexibility of Data Lakes allows for the application of advanced analytics, artificial intelligence, and machine learning algorithms to derive actionable insights.

However, the expansive nature of Data Lakes demands careful consideration of data governance, security, and the potential for information silos.
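As a toy illustration of this “store everything, structure later” pattern (the paths and records below are hypothetical, with local disk standing in for real object storage such as S3 or ADLS):

```python
import json
from pathlib import Path

lake = Path("lake")
(lake / "raw/ehr").mkdir(parents=True, exist_ok=True)
(lake / "raw/notes").mkdir(parents=True, exist_ok=True)

# Structured data lands as-is...
(lake / "raw/ehr/labs_2024-01.csv").write_text(
    "subject,test,value\n001,HbA1c,7.2\n")

# ...alongside unstructured data, each object carrying minimal metadata
(lake / "raw/notes/note_001.json").write_text(json.dumps(
    {"subject": "001", "source": "cardiology",
     "text": "Patient reports intermittent fatigue."}))
```

Governance, cataloging, and access controls then sit on top of this raw layer, which is exactly the discipline needed to keep a lake from becoming a swamp.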

Clinical Trial Data Repositories (CDRs): A Focused Approach to Trial Data Management

In contrast, a Clinical Trial Data Repository is specifically tailored to meet the unique demands of managing data generated during clinical trials. CDRs prioritize the integration and storage of clinical trial-specific information, including patient demographics, study protocols, case report forms (CRFs), adverse events, and outcomes.

Key Features of Clinical Trial Data Repositories:

  1. Study-Centric Focus: CDRs are centered around specific clinical trials, ensuring that all relevant data points, from patient recruitment to trial outcomes, are meticulously captured and stored.
  2. Compliance and Standardization: CDRs adhere to industry standards and regulatory requirements, ensuring that clinical trial data is managed in a manner compliant with the highest ethical and quality standards.
  3. Data Traceability: CDRs often provide robust data traceability, allowing researchers and regulatory authorities to track and audit every step of the data lifecycle within a clinical trial.

While CDRs excel at providing a consolidated view of data specific to clinical trials, they may lack the versatility needed to handle the diverse data sources and analytical capabilities offered by Data Lakes.

Choosing the Right Path: Integrating Data Lakes and CDRs

The decision between a Data Lake and a Clinical Trial Data Repository ultimately depends on the specific needs and objectives of a research organization. Both technologies can complement each other within a comprehensive clinical data management strategy.

  • Data Lakes for Versatile Analytics: If the goal is to harness the power of diverse data for advanced analytics, research, and broader healthcare insights, a Data Lake may be the preferred choice.
  • CDRs for Trial-Centric Precision: When the primary focus is on managing and ensuring the integrity of clinical trial data, a Clinical Trial Data Repository becomes a vital tool for precision and compliance.
  • Strategic Integration: Research organizations can strategically integrate Data Lakes and CDRs to create a seamless data ecosystem that addresses both the analytical and trial-specific aspects of clinical research.

In conclusion, successful management of clinical trial data requires a thoughtful approach that aligns with the unique demands of healthcare research. By carefully considering the strengths and limitations of both Data Lakes and Clinical Trial Data Repositories, organizations can optimize their data strategies, enhance research efficiency, and contribute to the advancement of medical knowledge.

Andrew Ratcliffe, Aspire Solution Owner, Clinical Trial Analytics, Instem

Is your clinical trial data safe or is it at risk from a ransomware attack? 

You may be surprised to learn that ransomware attacks on pharmaceutical companies have occurred; most notably, Merck & Co. (2017), Lupin (2020), and Dr. Reddy’s Laboratories (2020) have all reported incidents. Your organization may be vulnerable even if it has a backup/restore and disaster recovery (DR) solution in place. Here are four ways you can protect your clinical trial data.

  • Defense in depth strategy: Having multiple layers of protection is crucial to mitigate the risk of an attack. While backups and DR solutions are important components, they should be complemented by additional security measures.
  • Rapid detection and response: Ransomware attacks can spread quickly and encrypt data. Backup solutions are typically designed to create periodic snapshots or copies of data, which may not capture the most recent changes. In contrast, a dedicated ransomware protection solution can actively monitor for malicious activities, detect ransomware early, and enable prompt response to prevent or minimize data loss.
  • Data integrity and non-repudiation: While backup solutions can help restore data, they may not guarantee the integrity or authenticity of the restored data. A separate ransomware protection solution can provide features like cryptographic hashing to ensure the integrity of data backups and help verify that the restored data has not been tampered with during the recovery process (a minimal sketch of this hashing idea follows this list).
  • Prevention of lateral movement: Ransomware attacks often spread from one system to another within the network. A dedicated ransomware protection solution can include measures like network segmentation, endpoint isolation, or user behavior analytics to prevent the lateral spread of the attack and contain the impact.
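Here is that hashing idea as a minimal Python sketch. The paths are hypothetical, and a real solution would also sign the manifest and keep it on immutable storage an attacker cannot reach:

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute a SHA-256 digest of a file, read in streaming fashion."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Record digests at backup time (store the manifest away from the backups)...
manifest = {p.name: sha256_of(p) for p in Path("backups").glob("*.bak")}

# ...and verify them before trusting a restore
def restore_is_intact(path: Path) -> bool:
    return sha256_of(path) == manifest.get(path.name)
```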

While backup solutions and disaster recovery plans are essential components of a comprehensive computer system strategy, a separate ransomware protection solution offers capabilities designed to prevent, detect, and respond to ransomware attacks more effectively. Implementing multiple layers of security significantly enhances an organization’s resilience against evolving cyber threats and helps safeguard critical data and systems. Reach out today if you want to learn more about how you can protect your invaluable data.

William Qubeck, VP, Consulting & Enterprise Hosting, Instem

References

Merck & Co. (2017): https://www.merck.com/news/merck-confirms-ransomware-attack/

Lupin (2020): https://www.bleepingcomputer.com/news/security/indian-pharma-giant-lupin-hit-by-netwalker-ransomware/

Dr. Reddy’s Laboratories (2020): https://www.reuters.com/article/us-drreddys-cyber/dr-reddys-laboratories-says-i-t-security-infra-hit-by-cyberattack-idUSKBN26S0F9

Navigating Public Cloud Adoption: Top 3 Mistakes When Transitioning from Private Cloud

In the previous blog, we gave a preview of the cloud as a foundational transformation for our future in clinical research. Before we think about what it can provide, let’s consider common mistakes you might encounter when working through this transition.

Cloud adoption has become a cornerstone for many industries, with the world of life sciences trailing behind. The benefits? Enhanced collaboration, improved data accessibility, scalability, and streamlined operations. While the benefits are real, the move isn’t plug-and-play; it comes with its own set of challenges and pitfalls. I’ve discussed these challenges with many of the Instem cloud experts to identify the top 3 mistakes made when migrating from private cloud to public cloud within our highly regulated industry.

Mistake 1: Forgetting About Rules and Safety

In life sciences, we have important rules about how we handle data (like patient information) to keep it safe. Sometimes, when moving to the cloud, people forget these rules. It’s crucial to confirm that the cloud (and managed services!) company you are working with understands and follows the rules our regulated industry requires, rather than taking it for granted that they are meeting your needs.

It’s like forgetting to lock a door: without strong security (like passwords and locks), someone could sneak in and take important information. Always making sure data is protected is a must.

Mistake 2: Plans are Worthless, but Planning is Essential

Moving your infrastructure and information (your crown jewels) to the public cloud without a plan is like waking up one morning and deciding you are going to pack up and move to a new house the same day. If there’s no plan, things can get messy, and you can put yourself, your data, and key information at risk. It’s essential to consider things like:

  • How much transformation am I ready to take on to leverage cloud capabilities vs lift and shift?
  • Who are my stakeholders and how can I leverage the cloud for their needs?
  • What is the sequence of migration that makes sense?  How much will processes change?
  • What is my backup and disaster recovery strategy?

These are just a small subset of the myriad of questions you need to ask and answer as you develop a robust plan for adopting and migrating to a new world.

Mistake 3: Forgetting About How Much Space We Need

Sometimes, we don’t think about how much space we’ll need for all our information and what that space might look like. To use the house analogy again, it’s like moving from a large house to one with half the space but having the same amount of stuff. Making the right choice as to how much storage you need, what storage needs to be high performing vs archive, and how easy it is to move between different options is critical to implementing your infrastructure within the cloud.  In addition, don’t overlook scalability, especially in handling increasing data demands! Storage and scalability must be balanced with cost to find a fit that meets users’ needs, while not breaking the bank for your organization.

Successful cloud adoption requires a strategic approach, careful planning, and a deep understanding of the industry’s ever-changing demands. Prioritizing regulatory compliance, security measures, and scalability planning are a must when evaluating public cloud migration in the life sciences industry. The journey to the public cloud is a significant leap forward, driving innovation, collaboration, and success in our industry.

Until next time, Michelle Chen

Clinical Computing…Back to the Future or Back to the Past with our Technology Journey

As I sat down to think about the technology journey I have taken within our industry and contrast it with the technology advancements outside our industry, the movie Back to the Future popped into my head, and I thought to myself, “Wow, we really haven’t left the 80s, when this movie was made, as it relates to the use of technology.”

As a graduate student in statistics, my first job in the industry in the early 90s was an internship in the biometrics group at Burroughs Wellcome (bonus points if you remember that company!), where I was immediately thrown into programming tables and listings for a clinical trial. My first memory is of learning what “data _null_” was in SAS code and being confused as to why people cared more about the presentation of the data than the data itself. That started me on the path of pushing and challenging the way we do things and the technology we use to do it. Almost thirty years later, I feel we haven’t really made much progress, and my Responsibility strength makes me feel like I have let our industry down.

In the mid-80s, new technology included things such as the first Apple Macintosh, broad use of the VCR, and the Sony Walkman (used by Marty McFly in Back to the Future!). Today, we have the MacBook Pro, tap our credit cards for payment, and stream all our video and audio through Netflix or Spotify. Yet, in our industry, we still use technology that is decades old. You know a technology we still use today? The SAS Transport format, which was a ‘new’ technology in 1985… Seriously?

When we throw around the word ‘technology’, what are we referring to, and why is this technology so important to think about if we are going to really transform what we do? Below is a summary of what we believe are the key areas we need to reimagine; we will talk about each in more detail in future blogs.

Data Paradigm: The paradigm we use for data today is fundamentally broken. Years ago, I heard a quote by Florence Barkats, an RWE expert:

“In the real world, we want to capture the complexity and diversity of the standard of care without forcing it into an artificial structure. The current framework forces researchers to constrain data to a structure that is not intuitive and limits the value of the data.”

We have forced data into a structure that loses the data’s true context and diminishes its value. By shoving it into two-dimensional structures without metadata that defines the relationships between the data, we lose its meaning. Today’s technology gives us the capability to link our information, but we have to be willing to shift this paradigm… more to come.
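A tiny, hypothetical illustration of the difference (the flat variable names loosely echo SDTM conventions): in a two-dimensional row the relationship between an adverse event and its treatment is implicit, while a linked representation carries that relationship as explicit metadata:

```python
# Flat, two-dimensional record: the link between the event and the
# medication exists only in the analyst's head
flat_row = {"USUBJID": "001", "AETERM": "HEADACHE", "CMTRT": "IBUPROFEN"}

# Linked representation: the relationship travels with the data
linked = {
    "subject": "001",
    "events": [{
        "type": "adverse_event",
        "term": "HEADACHE",
        "treated_with": {"type": "medication", "name": "IBUPROFEN"},
    }],
}
```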

Cloud Adoption: The word “cloud” gets thrown into conversations with our clients frequently these days, but in most cases I don’t believe the person using it really understands the value. The use of public cloud has been standard practice for over a decade in other industries, yet our industry treats it as a novel concept and doesn’t fully grasp the value it can provide. We’ll chat about the value of the cloud in a follow-on blog and what it can provide within our clinical data lifecycle.

Automation: I love reviewing client proposal requests and seeing the requirement “We must automate” with no context as to what they want to automate or how they would like to tackle the problem. We have also seen the AI and ML buzzwords floating around, and seen how ChatGPT is changing the way we think about gathering information. I have used ChatGPT to help me write code I couldn’t remember, write code I never knew how to write, and even generate a job description for a role I was trying to fill. We’ll have several follow-on discussions about these topics and our opinion on what is fantasy vs. reality in the use of these capabilities in our clinical computing world.

At the end of the day, we must start expanding our minds and asking, “What if our analyses were automatically generated based on the data collected? What if we could ask ChatGPT to tell us what to create from our protocol? What if the context of our data was part of the data, giving it exponentially more value?” Once we start thinking this way, we can start acting like Doc in Back to the Future: “Roads? Where we’re going, we don’t need roads.”

Chris Decker (aka ‘the disrupter’)