hidden

Evaluating ML/AI Models in Clinical Research

The number of machine learning (ML) and artificial intelligence (AI) models published in clinical research is increasing every year. Whether clinicians choose to dive deep into the mathematical and computer science underpinnings of these algorithms or simply want to be conscientious consumers of research relevant to their own practice, it is important to become comfortable reading the literature in this field.

To that end, Quer et al. recently wrote a State-of-the-Art Review in the Journal of the American College of Cardiology detailing the research landscape for ML and AI within cardiology, including concrete tips on how a non-ML expert can interpret these studies. At its core, ML is about prediction: models are created to make accurate predictions on new or unseen data. Inspired by their work and incorporating many of their recommendations, below is a list of considerations for critically evaluating an ML/AI model in clinical research:

  1. What question is being addressed, and what problem is being tackled? How important is it? Regardless of a model’s performance or accuracy, its usefulness is determined by its clinical application. Everything must go back to the patient.
  2. How does the ML/AI model compare to traditional models for the given task? Many studies have shown little additional benefit from ML/AI models over standard statistical approaches such as logistic regression for clinical questions that have been extensively researched and for which the key predictors of the outcome of interest are already known. The real promise of ML/AI lies in incorporating novel data sources and data structures, including time-series information and continuous input from wearable sensors, raw images and signals such as those from common studies like echocardiograms and ECGs, and in harmonizing these unique data types together.
  3. Into which broad category does the model fall? Most machine learning models fall into the buckets of supervised learning, unsupervised learning, or reinforcement learning. Each approach is slightly different, with a unique end product. Supervised learning algorithms learn patterns in the data that allow them to predict whether a specific observation falls within a specific class or category, for example determining whether a photo shows a cat or a dog. This requires labeled data for the algorithm to learn from, i.e., someone or something has provided data correctly tagged as dog or cat. Unsupervised learning does not require labeled observations but instead combs through the observations to group those that are similar to each other. Reinforcement learning is a separate task in which an agent is trained to optimize the choices it makes to attain a stated goal. All of these have been used clinically in recent literature (a brief sketch contrasting the first two approaches appears after this list).
  4. How were the data and labels generated? Garbage in = garbage out. Your model is only as good as the data it was trained on and the accuracy of the labels. It’s important to know where this information came from.
  5. Model training, validation/performance, and generalizability. A common approach is to split the data into a training set used to fit the model and a held-out test set used to validate it; it is critical that training and testing use different data with no overlap. Model performance is tracked with metrics similar to those used to evaluate clinical models, including sensitivity, specificity, positive predictive value, negative predictive value, and AUC, although the ML names for these measures may differ, and additional measures such as an F-score may also be reported (a brief sketch follows this list). Arguably more important, however, is generalizability: how well the model performs in an entirely separate cohort, often from another center, although many currently published studies do not include this step.
  6. How clinically useful are these findings, and is the model interpretable? Basically, is the juice worth the squeeze? And can a human understand why the model reached its conclusion? A common knock against deep learning neural networks, for example, is that although they are incredibly skilled at learning from data and making accurate predictions on new data, how they do so is a “black box,” although newer ML/AI methods have started to address this.
  7. How reproducible are the results? Did the authors share their code or dataset? If they used an EHR phenotype to generate their cohort, can you do the same thing at your institution?
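
To make the distinction in item 3 concrete, here is a minimal sketch in Python (assuming a scikit-learn-style workflow and invented toy data): a supervised classifier learns from labeled observations, while an unsupervised clustering algorithm groups unlabeled observations by similarity.

```python
# Minimal sketch contrasting supervised and unsupervised learning (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy data: 200 "patients" with two numeric features (e.g., age, biomarker level).
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical binary outcome labels

# Supervised learning: the model is given both the features and the labels.
clf = LogisticRegression().fit(X, y)
print("Predicted class for a new observation:", clf.predict([[0.5, -0.2]]))

# Unsupervised learning: the model sees only the features and groups similar observations.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster assignments for the first five observations:", clusters[:5])
```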
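
As a companion to item 5, below is a minimal sketch of a train/test split and the performance measures named above, again on invented toy data; note that sensitivity is called "recall" and positive predictive value "precision" in ML parlance.

```python
# Minimal sketch of a train/test split and common performance metrics (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))  # toy features
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Train and test on non-overlapping observations.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
sensitivity = tp / (tp + fn)   # "recall" in ML parlance
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)           # "precision" in ML parlance
npv = tn / (tn + fn)
auc = roc_auc_score(y_test, y_prob)
f1 = f1_score(y_test, y_pred)

print(f"Sensitivity {sensitivity:.2f}, specificity {specificity:.2f}, "
      f"PPV {ppv:.2f}, NPV {npv:.2f}, AUC {auc:.2f}, F1 {f1:.2f}")
```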

These points are meant to summarize and add to some of the important aspects of the recently published article, but it is an excellent read, and I encourage everyone to review it in its entirety.

Reference:

Quer, G., et al. (2021). “Machine Learning and the Future of Cardiovascular Care.” Journal of the American College of Cardiology 77(3): 300-313.

 

“The views, opinions and positions expressed within this blog are those of the author(s) alone and do not represent those of the American Heart Association. The accuracy, completeness and validity of any statements made within this article are not guaranteed. We accept no liability for any errors, omissions or representations. The copyright of this content belongs to the author and any liability with regards to infringement of intellectual property rights remains with them. The Early Career Voice blog is not intended to provide medical advice or treatment. Only your healthcare provider can provide that. The American Heart Association recommends that you consult your healthcare provider regarding your personal health matters. If you think you are having a heart attack, stroke or another emergency, please call 911 immediately.”

 

hidden

Setting Expectations for AI Models in Medicine

Artificial intelligence is a hot topic in every field, and these algorithms are being widely used in scientific research. Particularly in my field of genetics and genomics, machine learning methods are invaluable for gleaning insights from large amounts of high-dimensional data. But there are many things to consider before applying AI and ML in a clinical setting, where real people are on the other end of the predictive model. It is important to set expectations for what AI can and cannot accomplish and what is needed for its broad application in medicine in the future. In the session “Hype or Hope? Artificial Intelligence and Machine Learning in Imaging,” presenters gave a great overview of the applications of AI, its limitations, and the advancements needed for its wide application in medicine.

Dr. Geoffrey Rubin described many different scenarios in which AI can be deployed. Specifically, he talked about how AI can be used in predictive analytics to make test selection and imaging more efficient, in image reconstruction to reduce noise, in image segmentation to identify regions of interest and provide quantitative analysis, and in interpretation to derive unique characteristics that cannot be measured directly, identify abnormalities, and create reports. In addition, Dr. Tessa Cook explained in greater depth how AI can be used as clinical decision support to incorporate diverse data types and aid in proper test selection. Dr. Damini Dey also discussed how AI can improve diagnosis and prediction, characterize disease, and personalize therapy. Overall, it is important to determine where AI can provide the greatest value while introducing the least amount of risk.

However, there are many limitations to AI and ML models. First, as Dr. David Ouyang noted, because these models are trained by humans, they can only perform tasks that a human could theoretically do; AI just performs these tasks faster, more consistently, and at a larger scale. He noted that these models are not effective unless trained on broad underlying datasets, and that unless explicitly programmed to do so, they do not accurately weight rare but significant events. AI models can easily become uninterpretable black boxes, preventing experts from recognizing where they are failing. Dr. David Playford emphasized that, due to these and other limitations, AI models are not yet clinically accurate in all areas.
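
One way rare but significant events can be "explicitly programmed" for is to re-weight the training loss so that the minority class is not ignored. The snippet below is a minimal, hypothetical illustration of that general idea using scikit-learn's class_weight option on invented data; it is not any presenter's method.

```python
# Minimal sketch: up-weighting a rare positive class so it is not ignored (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = (rng.random(2000) < 0.02).astype(int)  # hypothetical rare event, ~2% prevalence

# Default fit: with a 2% event rate, a model can look accurate by rarely predicting the event.
unweighted = LogisticRegression().fit(X, y)

# class_weight="balanced" re-weights each class inversely to its frequency,
# so errors on the rare class count roughly 50x more in the loss here.
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

print("Positive predictions, unweighted:", int(unweighted.predict(X).sum()))
print("Positive predictions, weighted:  ", int(weighted.predict(X).sum()))
```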

There are many steps that must be taken before AI models can achieve wide use in clinical settings. Dr. Ouyang suggests standardized baselines and open access so that advancements among tools can be measured. Dr. Cook uses a “trust and value” checklist to assess how each tool was trained and tested, as well as what it can and cannot do, before using it for clinical decision support. Dr. Playford advocates for randomized trials to establish proof of concept and compare outcomes to the current standard of care. Most importantly, steps must be taken to reduce bias in AI models, which can negatively impact the care of underrepresented populations. Multidisciplinary collaborative teams can ensure that the data align with the clinical question being tackled, that diverse yet consistent training datasets are used, and that methods such as transfer learning are implemented to produce more accurate predictions on previously unseen datasets. While AI can be an important tool in clinical decision making, it is ultimately the responsibility of each physician to ensure that AI tools are serving their patients as effectively as possible.
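
Transfer learning, mentioned above, typically means starting from a model pre-trained on a large generic dataset and fine-tuning only a small part of it on the new, smaller dataset. The sketch below is a generic, hypothetical PyTorch example (reusing an ImageNet-pretrained ResNet for a two-class imaging task) and is not tied to any of the presenters' tools.

```python
# Minimal transfer-learning sketch: reuse a pretrained backbone, retrain only the final layer.
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pre-trained on ImageNet (a large, generic image dataset).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor so its weights are not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer for the new task (e.g., normal vs. abnormal).
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are optimized during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random "batch" standing in for real images and labels.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print("Fine-tuning loss on the toy batch:", float(loss))
```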


hidden

Buzzword Alert! Artificial Intelligence – Just the Hype Man or a Genuine Showstopper?

Conversations about the utility and promise of machine learning (ML) and artificial intelligence (AI) permeate all fields of medicine, and cardiology is no exception. A quick search shows that 69 posters containing the keywords “machine learning” made it into AHA’s Scientific Sessions 2020. But is it for real? Will we really see a future in which ML/AI factors into all aspects of clinical care and, in fact, re-writes the script on how we care for patients?

Below are some of the discussion points and imperatives that stood out to me today from the “Hope or Hype? Artificial Intelligence and Machine Learning in Imaging” session at #AHA20, featuring thought leaders Drs. Marielle Scherrer-Crosbie, Alex Bratt, David Ouyang, Tessa Cook, Damini Dey, David Playford, and Geoffrey Rubin.

  1. While awe-inspiring in their ability to make inferences and predictions that human beings often cannot make themselves, ML/AI algorithms can also recreate and reinforce the biases that pre-exist in our society. We must fight this by knowing it is a possibility, screening for it (one simple approach is sketched after this list), and training algorithms on datasets that are truly representative. With so much of the political landscape and national conversation right now centered on structural racism and bias in America, it is prudent to understand how the models we create can perpetuate this.
  2. Separate the low-hanging fruit from the unrealistic (at the moment), and consider the unrealistic tasks to be in the realm of discovery science. A quick rule of thumb provided by Dr. Ouyang, summarizing the words of Dr. Andrew Ng: first determine whether a human could do the task relatively quickly. If so, we can probably automate it with AI now or in the near future.
  3. Scrutinize our data. How much do we trust it? High-quality data for ML/AI means broad, accurate, and plentiful, with robust training labels that are as free from subjectivity as possible.
  4. How open is our data for inspection? Fields within computer science are far further along than medicine in deploying and improving ML/AI models because of open datasets and shared code, allowing groups to verify, tinker, and re-create to move the needle forward. Medical AI has not been so forthcoming.
  5. As new technology rapidly evolves and makes it into the clinical space, we need to be responsible for mistakes. This means assessing not only model performance before deployment but also the consequences of using the model in real life. This may require RCTs and treating ML/AI algorithms the way we treat new therapeutics.
  6. What we really want is AI running in the background saying, “Hey, this task was automated and is now solved for you. Proceed as you see fit.” Humans and machines share this space, and the more we can integrate ML/AI to help us with tasks we are already doing, the better our results will be.
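
One practical way to screen for the bias described in the first point is to report performance separately for each demographic subgroup rather than only in aggregate. Below is a minimal, hypothetical sketch of such a stratified check; the subgroup labels, outcomes, and model scores are all invented for illustration.

```python
# Minimal sketch: check whether model performance differs across subgroups (illustrative only).
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
group = rng.choice(["A", "B"], size=n)        # hypothetical demographic subgroup label
y_true = rng.integers(0, 2, size=n)           # toy outcomes
y_score = np.clip(y_true * 0.6 + rng.random(n) * 0.7, 0, 1)  # toy model scores

for g in ["A", "B"]:
    mask = group == g
    auc = roc_auc_score(y_true[mask], y_score[mask])
    print(f"Subgroup {g}: n={mask.sum()}, AUC={auc:.2f}")
# Large gaps between subgroups are a signal to revisit the training data and labels.
```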

So where does this leave us? Most in our field believe ML/AI will play an important role in our future. Ideally, we will embrace it in a way that keeps human intelligence paired with artificial intelligence to create something neither could be alone, and that ensures our algorithms are free of bias and openly shared to allow for continuous improvement.

 


hidden

Can artificial intelligence save our lives?

The role of artificial intelligence (AI) in our lives is advancing rapidly and is making strides in the early detection of disease. The consumer market is full of wearable health devices that enable continuous ambulatory monitoring of vital signs during daily life (at rest or during physical activity) or in a clinical environment, with the advantage of minimizing interference with normal human activities [1]. These devices can record a wide spectrum of vital signs, including heart rate and rhythm, blood pressure, respiratory rate, blood oxygen saturation, blood glucose, skin perspiration, and body temperature, in addition to motion. However, there is considerable controversy over whether these health devices are reliable and secure tools for early detection of arrhythmia in the general population [2].

Atrial fibrillation (afib) is the most common arrhythmia, currently affecting over 5 million individuals in the US and expected to reach almost 15 million people by 2050. Afib is associated with an increased risk of stroke, heart failure, and mortality, and represents a growing economic burden [3]. Afib poses a diagnostic challenge: it is often asymptomatic and frequently diagnosed only after a stroke occurs. It also poses a long-term management challenge, often involving hospitalization for cardioversion, cardiac ablation, trans-esophageal echo, anti-arrhythmic treatment, or permanent pacemaker placement. However, if afib is detected, the risk of stroke can be reduced by 75% with proper medical management and treatment [3].

Physicians need fast and accurate technologies to detect cardiac events and assess the efficacy of treatment, so a reliable, convenient, and cost-effective tool for non-invasive afib detection is desirable. Several studies have assessed the efficacy and feasibility of wearable technologies in detecting arrhythmias. The Cleveland Clinic conducted a clinical study enrolling 50 healthy volunteers and tested five different wearable heart rate monitors, including the Apple Watch, Garmin Forerunner, TomTom Spark Cardio, and a chest strap monitor, across different types and intensities of exercise (treadmill, stationary bike, and elliptical). The study found that the chest strap monitor was the most accurate at tracking heart rate across the different exercise types and intensities [4].

Apple and Stanford’s Apple Heart Study enrolled 419,297 Apple Watch and iPhone owners. Among these users, 2,161 (roughly 0.5%) received a notification of an irregular pulse. Of those who received notifications, only about 450 participants scheduled a telemedicine consultation and returned a BioTelemetry ECG monitoring patch. When Apple Watch notifications were compared with simultaneous ECG patch recordings, researchers found a 71% positive predictive value, and about 84% of cases were experiencing afib at the time of the alert. Additionally, 34% of participants whose initial notification prompted an ECG patch delivery were later diagnosed with afib. In other words, the Apple Watch detected afib in about one-third of those cases, which is “good” for a screening tool considering the “intermittent nature of afib and that it may not occur for a whole week,” says Dr. Christopher Granger, a professor of medicine at Duke University who served on the steering committee for the Apple Heart Study [5].

These are observational studies; they are not outcome-driven, randomized, or placebo-controlled. There is potential for false negatives, where the Apple Watch fails to detect afib, and for false positives, where it detects an arrhythmia that does not exist. Unfortunately, patients with a false-negative result may not consult their physician about symptoms such as palpitations and shortness of breath because the device provides false reassurance, while patients with a false-positive result may be sent to the clinic unnecessarily, leading to further unnecessary tests and anxiety.
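
Much of this trade-off comes down to prevalence: even a quite specific test will generate many false positives when the condition is rare in the screened population. The short calculation below illustrates this with Bayes' rule; the sensitivity, specificity, and prevalence values are invented for illustration and are not figures from the Apple Heart Study.

```python
# Hypothetical illustration: how prevalence drives the positive predictive value of a screen.
def ppv(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Invented performance figures for a wearable arrhythmia screen.
sens, spec = 0.95, 0.98

# Lower prevalence (e.g., younger screened populations) means far more false alarms per true case.
for prev in (0.005, 0.02, 0.10):
    print(f"Prevalence {prev:.1%}: PPV = {ppv(sens, spec, prev):.1%}")
```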

Whether the Apple Watch is ready to be used as a default screening tool to monitor heart rate and rhythm in the general population, or by physicians in patients with or at high risk for afib, remains unclear and warrants further study. In conclusion, physicians should be cautious when using data from consumer devices to diagnose and treat patients.


References:

  1. Cheung CC, Krahn AD, Andrade JG. The Emerging Role of Wearable Technologies in Detection of Arrhythmia. Canadian Journal of Cardiology. 2018;34(8):1083-1087. doi:10.1016/j.cjca.2018.05.003
  2. Dias D, Paulo Silva Cunha J. Wearable Health Devices-Vital Sign Monitoring, Systems and Technologies. Sensors (Basel). 2018;18(8):2414. Published 2018 Jul 25. doi:10.3390/s18082414
  3. Chugh SS, Havmoeller R, Narayanan K, et al. Worldwide Epidemiology of Atrial Fibrillation: A Global Burden of Disease 2010 Study. Circulation. 2014;129(8):837-847. doi:10.1161/CIRCULATIONAHA.113.005119
  4. Wrist-Worn Heart Rate Monitors Less Accurate Than Standard Chest Strap. Medical Design Technology. http://search.proquest.com/docview/1875621494/. Published March 9, 2017.
  5. Turakhia MP, Desai M, Hedlin H, et al. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: The Apple Heart Study. American Heart Journal. 2019;207:66-75. doi:10.1016/j.ahj.2018.09.002

 

 

hidden

Artificial Intelligence in Cardiology: Opportunities for Cardio-Oncology

History was made recently with the inaugural continuing medical education conference on artificial intelligence (#AI) in Cardiology. While most of the presentations were on artificial intelligence, cardiology, or both, several sessions also made reference to other fields in which AI has been or is being used, such as Oncology. There was even one study presented on Cardio-Oncology. As study after study was presented, it became clear to me that several of these techniques and methodologies could potentially be useful to our patients in Cardio-Oncology.

Every single piece of technology started with one single prototype. Every single new piece of software started with one single algorithm. Every single patent started with one single idea. Every single idea started with the impact that disruptive technology could have for at least one single patient – one single case.

As I view various case reports in Cardio-Oncology, I think about how #AI could influence care delivery to potentially improve outcomes and the experience for each patient and their health professionals.

One example that was reiterated in multiple presentations was the ECG. In the studies presented, applying #AI to the ECG was shown to determine the age, sex, and heart condition of the individual. Details were shown for a case of hypertrophic cardiomyopathy (yes, HCM, not just left ventricular hypertrophy) diagnosed via #AI analysis of an ECG that appeared relatively unremarkable to physicians’ eyes. After the septal surgery/procedure, although the ECG then looked remarkably abnormal to physicians’ eyes, the #AI algorithm could identify resolution of the hypertrophic cardiomyopathy.
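
For readers curious what "applying #AI to the ECG" can look like in code, below is a deliberately tiny, hypothetical sketch of a one-dimensional convolutional network that maps a 12-lead ECG signal to a class prediction. It is illustrative of the model family only; real ECG-AI models are far larger and are trained on very large sets of labeled tracings.

```python
# Tiny, illustrative 1-D CNN for ECG classification (not the model from any presented study).
import torch
import torch.nn as nn

class TinyECGNet(nn.Module):
    def __init__(self, n_leads=12, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 16, kernel_size=7, padding=3),  # learn local waveform patterns
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                           # summarize the whole tracing
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):  # x: (batch, leads, samples)
        return self.classifier(self.features(x).squeeze(-1))

# A random tensor standing in for one 10-second, 500 Hz, 12-lead ECG.
ecg = torch.randn(1, 12, 5000)
logits = TinyECGNet()(ecg)
print("Class logits:", logits.detach().numpy())
```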

Another example reiterated throughout the conference was identifying undiagnosed left ventricular systolic dysfunction, in a general community population and also in patients referred to a cardio-oncology practice at a large referral center.

Recently, #AI in Cardiology has been used most frequently for monitoring and detection of arrhythmias, such as atrial fibrillation. Anyone can purchase a wearable capable of this kind of detection. Physicians are also now prescribing these wearables, given their ease of use, their pervasive presence, and their coupling with smartphones owned by much of the population (or provided temporarily by the physician group). Such wearables are expanding from standalone electrodes to watches, skin patches, and clothing (e.g., shirts and shorts).

Many direct-to-consumer #AI applications in daily life, such as Alexa and Siri, are not wearable at all. One study described the ability of #AI to help diagnose mood disorders, cardiac conditions, and risk factors simply by “listening to” and analyzing voice patterns. The timing of a young man’s “voice breaking” can potentially predict his risk for heart disease!

A popular use for #AI in medicine overall is to assist with the interpretation of various imaging studies, such as chest X-rays, MRIs, or CT scans, and this applies in Cardiology as well. Further, in Cardiology, #AI is being used to help guide the acquisition of echocardiograms. The algorithms provide visual instructions (such as curved arrows) indicating the directions in which the ultrasound probe should be moved to obtain a standard view, against which the algorithm compares the image being acquired moment by moment. The idea is for #AI to help less experienced sonographers or echocardiographers learn and perform echocardiography more expediently.

The theme of the conference was current advances and future applications of #AI in Cardiology. Accordingly, a historical perspective was given, describing some of the earliest attempts at #AI in various fields. A video of a possible precursor to current automated vacuum cleaners was shown, from archives dating back to the 1960s. In addition to ways in which #AI is now being studied or applied, future opportunities for using #AI were also postulated, for example in coronary artery disease, since stress tests are not 100% sensitive and the gold standard, coronary angiography, is invasive. #AI could help stratify which patients with recurrent, convincing symptoms but without a positive stress test do or do not need the invasive procedure. Of course, coronary CT angiography could help fill this gap, but #AI might assist with decision-making sooner.

There have been studies on #AI in Cardiology, studies on #AI in Oncology, and at least one study on #AI in Cardio-Oncology – a study I predicted, one that is quite intuitive and is mentioned above. I propose that we continue to apply #AI in Cardio-Oncology so that the field can catch up with the rest of Cardiology and Oncology and help us continue to develop this emergent and burgeoning multidisciplinary subspecialty.

This is an exciting time for me to be alive. I am an early adopter of artificial intelligence, and I look forward to the growing availability of #AI to enhance our use of electrocardiography, echocardiography, wearables, biosensors, voice analysis, and more in Cardiology, and particularly in Cardio-Oncology, with an emphasis on primary and primordial prevention even before secondary and tertiary prevention in the area of Preventive Cardio-Oncology, and especially in women.