[UCSF] Information Extraction from Unstructured Clinical Notes Using Natural Language Processing (NLP)

Discovering Inaccurate Extracted Information from Unstructured Clinical Notes Using Natural Language Processing

Team: Lengning Wei (BIOE), Marcel Schaack (BIOE), Chloe Kim (BIOE)

Advisors: Gundolf Schenk (UCSF), Gabriel Gomes (ME)

Despite advances in medical records systems, there is still no sufficiently precise technology for the large-scale information extraction that public health research and precision medicine require. We designed a reliable NLP-enabled system that extracts medical history from health records and lets users give targeted feedback on the correctness of the NLP output, creating an active-learning feedback loop that allows the system to improve continuously. We trained a machine learning model on this user input and incorporated NLP metrics to provide confidence scores, enabling users to evaluate the accuracy of information extracted by cTAKES, the most commonly used system for large-scale clinical data analysis.

Apache cTAKES

cTAKES (Clinical Text Analysis & Knowledge Extraction System) analyzes unstructured electronic health records and extracts medical concepts and health information.

NLP Analysis

NLP analytics are used to evaluate how precisely concepts found in the clinical notes match the corresponding concepts in the cTAKES medical database.
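
As an illustration of what such a matching metric might look like, the sketch below scores how closely a text span from a note agrees with a database term, using a character-level similarity ratio and a token-level Jaccard overlap. Both metrics and all function names are assumptions chosen for illustration; the project description does not state which NLP metrics were actually computed.

```python
from difflib import SequenceMatcher


def char_similarity(span: str, term: str) -> float:
    """Character-level similarity in [0, 1] (difflib ratio), case-insensitive."""
    return SequenceMatcher(None, span.lower(), term.lower()).ratio()


def token_jaccard(span: str, term: str) -> float:
    """Fraction of distinct lowercase tokens shared by the two strings."""
    a, b = set(span.lower().split()), set(term.lower().split())
    if not a and not b:
        return 1.0  # two empty strings are treated as identical
    return len(a & b) / len(a | b)


def match_features(span: str, term: str) -> dict:
    """Bundle the hypothetical metrics into one feature record."""
    return {
        "char_similarity": char_similarity(span, term),
        "token_jaccard": token_jaccard(span, term),
    }


if __name__ == "__main__":
    # Illustrative strings only, not real patient data.
    print(match_features("type 2 diabetes", "diabetes"))
```

Metrics of this kind yield one numeric vector per extracted concept, which is the form a downstream classifier needs.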

ML Prediction

A trained random forest model predicts whether each extracted concept is correct and computes reliable confidence scores, which are used to improve cTAKES concept identification.
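
A minimal sketch of this step, assuming scikit-learn's `RandomForestClassifier` and a confidence score defined as the predicted probability of the "correct" class. The two feature columns, the toy training rows, and the hyperparameters are hypothetical; the project's actual features and settings are not given here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature rows: [char_similarity, token_jaccard]
X_train = np.array([
    [1.00, 1.00],
    [0.95, 1.00],
    [0.90, 0.50],
    [0.40, 0.00],
    [0.35, 0.20],
    [0.20, 0.00],
])
# Labels from user feedback: 1 = extraction judged correct, 0 = incorrect
y_train = np.array([1, 1, 1, 0, 0, 0])

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)


def confidence_scores(fitted_model, X):
    """Return the predicted probability that each extraction is correct."""
    correct_col = list(fitted_model.classes_).index(1)
    return fitted_model.predict_proba(X)[:, correct_col]


# Two unseen extractions: a close match and a poor match (made-up values).
X_new = np.array([[0.97, 1.00], [0.25, 0.00]])
scores = confidence_scores(model, X_new)

if __name__ == "__main__":
    print(scores)
```

In a random forest, `predict_proba` averages the class probabilities over the individual trees, so the score reflects how strongly the ensemble agrees.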

User Feedback

Users provide feedback (‘yes’ or ‘no’) on the extracted clinical concepts and their respective modifier words through the UI. This feedback serves as the label for training an ML model, with the NLP metrics as input features.
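
The conversion from UI feedback to training labels could be sketched as follows. The `Feedback` record, its field names, and the helper functions are hypothetical and only illustrate the idea of pairing each yes/no answer with the metric vector computed for the same extraction.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    concept: str   # concept text extracted from the note
    modifier: str  # modifier word attached to the concept (may be empty)
    answer: str    # reviewer's response from the UI: "yes" or "no"


def to_label(answer: str) -> int:
    """Map a yes/no answer to a binary label (1 = extraction correct)."""
    normalized = answer.strip().lower()
    if normalized == "yes":
        return 1
    if normalized == "no":
        return 0
    raise ValueError(f"expected 'yes' or 'no', got {answer!r}")


def build_training_rows(feedback, metrics):
    """Pair each feedback item with the metric vector at the same index."""
    if len(feedback) != len(metrics):
        raise ValueError("feedback and metrics must have the same length")
    X = [list(m) for m in metrics]
    y = [to_label(f.answer) for f in feedback]
    return X, y


if __name__ == "__main__":
    # Made-up examples, not real clinical data.
    fb = [Feedback("diabetes", "no history of", "yes"),
          Feedback("cold", "", "no")]
    X, y = build_training_rows(fb, [[0.97, 1.0], [0.30, 0.0]])
    print(X, y)
```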

Results

We collected over 1500 feedback datapoints from users and used them, together with the computed NLP metrics, to build a model that identifies inaccurately extracted information with 96% accuracy (97% F1). Our human-in-the-loop system can use user-provided feedback to improve itself through active learning. It thus offers a superior method for collecting and using feedback data, and it can be used effectively to increase the reliability of medical NLP pipelines.
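
For reference, accuracy and F1 can be computed from held-out predictions as in the sketch below. The function name and the toy labels are illustrative assumptions; they are not the project's data, and the text above does not state which class was treated as positive.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Compute accuracy and F1 for binary labels.

    F1 is the harmonic mean of precision and recall for the `positive` class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


if __name__ == "__main__":
    # Toy labels for illustration only.
    print(accuracy_and_f1([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```

Because F1 ignores true negatives, it can differ from accuracy when the two classes are imbalanced, which is one reason to report both.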