Fung Institute for Engineering Leadership

[UCSF] Information Extraction from Unstructured Clinical Notes Using Natural Language Processing (NLP)

November 13, 2020

Discovering Inaccurate Extracted Information from Unstructured Clinical Notes Using Natural Language Processing

Team: Lengning Wei (BIOE), Marcel Schaack (BIOE), Chloe Kim (BIOE)

Advisors: Gundolf Schenk (UCSF), Gabriel Gomes (ME)

Despite the development of medical record systems, there is still no precise technology for large-scale information extraction in support of public health research and precision medicine. We designed a reliable NLP-enabled system that extracts medical history from health records and lets users give targeted feedback on the correctness of the NLP output, creating an active-learning feedback loop that allows the system to improve continuously. We trained a machine learning model on this user input together with NLP metrics to provide confidence scores, enabling users to evaluate the accuracy of information extracted by cTAKES, the most commonly used system for large-scale clinical data analysis.

Apache cTAKES

cTAKES (Clinical Text Analysis & Knowledge Extraction System) analyzes unstructured electronic health records and extracts medical concepts and health information.

NLP Analysis

NLP metrics are used to evaluate how precisely the concepts found in the clinical notes match the corresponding concepts in the cTAKES medical database.
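The summary does not say which metrics the team computed. As a minimal sketch of what such a matching score could look like, the example below compares a text span from a note with a dictionary term using two common string measures: a character-level similarity ratio and a token-level Jaccard overlap. The function names and the choice of measures are assumptions made for illustration, not the project's actual metrics.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so formatting does not affect scores."""
    return " ".join(text.lower().split())


def char_similarity(span, term):
    """Character-level similarity in [0, 1] between a note span and a term."""
    return SequenceMatcher(None, normalize(span), normalize(term)).ratio()


def token_jaccard(span, term):
    """Share of distinct tokens that the span and the term have in common."""
    a = set(normalize(span).split())
    b = set(normalize(term).split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical example: a span from a note and the term it was mapped to.
span = "type 2 diabetes"
term = "diabetes mellitus"
metrics = (char_similarity(span, term), token_jaccard(span, term))
```

Scores like these could then serve as input features for a downstream model, alongside the user feedback.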

ML Prediction

A trained random forest model predicts whether each extracted concept is correct and computes a confidence score for it. These scores are used to improve cTAKES identification.
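To illustrate how an ensemble of trees can yield a confidence score, the sketch below trains a deliberately simplified stand-in for a random forest: bagged one-split decision stumps, each fit on a bootstrap sample with a random feature subset. The confidence is the fraction of stumps voting "correct". The toy data, feature meanings, and function names are hypothetical, and this is not the team's model; a real pipeline would more likely rely on a library implementation such as scikit-learn's `RandomForestClassifier`.

```python
import random


def fit_stump(rows, labels, feature_ids):
    """Find the (feature, threshold, polarity) split with the fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            for polarity in (1, 0):  # class predicted when row[f] >= t
                errors = sum(
                    (polarity if row[f] >= t else 1 - polarity) != y
                    for row, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, polarity)
    return best[1:]


def predict_stump(stump, row):
    f, t, polarity = stump
    return polarity if row[f] >= t else 1 - polarity


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Bag stumps over bootstrap samples and random feature subsets."""
    if len(set(labels)) < 2:
        raise ValueError("need both correct and incorrect examples to train")
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    k = max(1, int(n_features ** 0.5))  # features considered per stump
    forest = []
    while len(forest) < n_trees:
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        if len({labels[i] for i in idx}) < 2:
            continue  # redraw samples that contain only one class
        feats = rng.sample(range(n_features), k)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats)
        )
    return forest


def confidence(forest, row):
    """Fraction of stumps that predict the extraction is correct (label 1)."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Hypothetical toy data: (character similarity, token overlap) per concept,
# labeled 1 if a user confirmed the extraction and 0 if they rejected it.
rows = [(0.95, 1.0), (0.9, 0.8), (0.85, 1.0), (0.8, 0.75),
        (0.3, 0.2), (0.25, 0.0), (0.2, 0.25), (0.1, 0.0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = fit_forest(rows, labels)
```

Calling `confidence(forest, (1.0, 1.0))` on this toy data returns 1.0 and `confidence(forest, (0.0, 0.0))` returns 0.0, while inputs between the two clusters can receive intermediate scores depending on where each stump placed its threshold.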

User Feedback

Users provide feedback (‘yes’ or ‘no’) on the extracted clinical concepts and their respective modifier words through the UI. This feedback is used as the label for training an ML model, along with the NLP metrics.
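One way to picture this step is as a conversion from a feedback record into a labeled training example. The sketch below assumes a hypothetical `Feedback` record and field names; the project's actual data model is not described in this summary.

```python
from dataclasses import dataclass


@dataclass
class Feedback:
    """Hypothetical record of one user response collected through the UI."""
    concept: str      # concept reported by the extraction system
    note_span: str    # text in the clinical note that triggered it
    answer: str       # 'yes' or 'no' as entered by the user
    metrics: tuple    # NLP metrics computed for this concept


def to_training_example(fb):
    """Turn a feedback record into (features, label), with 'yes' mapped to 1."""
    answer = fb.answer.strip().lower()
    if answer not in ("yes", "no"):
        raise ValueError(f"unexpected feedback value: {fb.answer!r}")
    return list(fb.metrics), 1 if answer == "yes" else 0


# Hypothetical example: the user rejects a concept extracted from "DM".
example = to_training_example(
    Feedback("diabetes mellitus", "DM", "no", (0.21, 0.0))
)
```

Rejecting anything other than 'yes' or 'no' keeps malformed responses from silently becoming training labels.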

Results

We collected over 1,500 feedback data points from users and used them, together with the computed NLP metrics, to build a model that identifies inaccurately extracted information with 96% accuracy (97% F1). Our human-in-the-loop system can use the feedback that users provide to improve itself through active learning. It thus represents a superior method for collecting and using feedback data and can be used to increase the reliability of medical NLP pipelines.
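For readers unfamiliar with the two reported metrics, the sketch below shows how accuracy and F1 are computed from true and predicted labels. The labels are made-up toy values, not the project's data, and treating label 1 as the positive class is an assumption, since the summary does not state which class the F1 score refers to.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Accuracy counts all correct predictions, while F1 balances precision and recall on one class, so the two numbers can differ when the classes are imbalanced.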

