NLP pipelines

With Melax Tech NLP pipelines, users can amass information on any number of conditions, co-morbidities, or cohorts within an unstructured data set.

Clinical documents

Medical problems, medications, treatments, and lab tests

Comprehensive Clinical Information

This comprehensive pipeline recognizes four primary clinical entities in clinical notes: “medical problems,” “medications,” “treatments,” and “lab tests.” It also extracts their modifiers: (1) “negation,” “severity,” “uncertainty,” “condition,” “subject,” and “body location” for “medical problems”; (2) “form,” “dosage,” “strength,” “route,” “duration,” and “frequency” for “medications”; (3) “negation” for “treatments”; and (4) “negation” and “value” for “lab tests.” All “temporal information” associated with the primary entities is extracted as well. Each extracted primary entity is then mapped to a standard code in the corresponding medical terminology: SNOMED-CT for “medical problems,” RxNorm for “medications,” ICD-10-PCS for “treatments,” and LOINC for “lab tests.”
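As a rough illustration of the structure described above, the sketch below models one extracted entity with its modifiers and looks up which terminology it would be coded in. The class name, field names, and sample modifier values are hypothetical and are not the pipeline's actual output schema; only the entity types and terminologies come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Terminology named in the description for each primary entity type.
TERMINOLOGY = {
    "medical problem": "SNOMED-CT",
    "medication": "RxNorm",
    "treatment": "ICD-10-PCS",
    "lab test": "LOINC",
}


@dataclass
class Entity:
    """Hypothetical container for one extracted primary entity."""
    text: str                      # the mention as it appears in the note
    entity_type: str               # one of the keys in TERMINOLOGY
    modifiers: Dict[str, str] = field(default_factory=dict)
    code: Optional[str] = None     # left empty here; no real codes are invented

    @property
    def terminology(self) -> str:
        """Name of the terminology this entity type is mapped to."""
        return TERMINOLOGY[self.entity_type]


# Illustrative mention: "denies severe chest pain"
problem = Entity(
    text="chest pain",
    entity_type="medical problem",
    modifiers={"negation": "denies", "severity": "severe", "body location": "chest"},
)
print(problem.terminology)
```

Keeping modifiers in a dictionary keyed by modifier name lets the same container serve all four entity types, each of which has a different modifier set.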

Bleeding Events

This pipeline automatically identifies clopidogrel-induced bleeding events from clinical notes. (Reference)

COVID-19 Signs and Symptoms

This pipeline extracts COVID-19-related signs and symptoms defined by the WHO, as well as eight associated attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course) from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership Common Data Model. (Reference)

(mapped to ICD-10)

This pipeline extracts disease mentions, together with associated modifiers including “negation,” “severity,” “uncertainty,” “condition,” “subject,” and “body location,” from clinical reports. The recognized diseases are mapped to ICD-10-CM codes.

Diseases and Symptoms
(mapped to SNOMED-CT)

This pipeline extracts patients’ medical problems, such as diseases and symptoms, together with associated modifiers, including “negation,” “severity,” “uncertainty,” “condition,” “subject,” and “body location,” from clinical reports. The recognized diseases and symptoms are mapped to SNOMED-CT concept IDs.

Lab Tests
(mapped to LOINC)

This pipeline recognizes lab test-related information in clinical reports. Examples of lab tests include panels and tests run on body fluids; procedures performed on a patient, such as X-rays and biopsies; and vital signs. It also extracts the numeric values associated with lab tests. Extracted lab test entities are mapped to LOINC codes where applicable.

Medication and Signature Information
(mapped to RxNorm)

This pipeline identifies mentions of medications in clinical reports, along with their signature information, including “form,” “dosage,” “strength,” “route,” “duration,” and “frequency.” It then maps the recognized medication and signature information to RxNorm codes.


This pipeline extracts opioid-related medications and their dosage information, then converts them to morphine milligram equivalents (MME) for opioid overdose recognition.
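The MME conversion is simple arithmetic: the daily dose in milligrams is multiplied by a drug-specific conversion factor. The sketch below is a minimal illustration, not the pipeline's implementation. The function names are hypothetical, and the factor table is an assumed, partial set of commonly cited oral conversion factors that should be checked against current clinical guidance before any real use.

```python
from typing import Iterable, Tuple

# Assumed, illustrative oral conversion factors (not taken from the pipeline).
MME_FACTORS = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
    "codeine": 0.15,
}


def daily_mme(drug: str, strength_mg: float, doses_per_day: float) -> float:
    """Return the daily MME for one opioid: strength x frequency x factor."""
    factor = MME_FACTORS[drug.lower()]  # raises KeyError for unlisted drugs
    return strength_mg * doses_per_day * factor


def total_daily_mme(prescriptions: Iterable[Tuple[str, float, float]]) -> float:
    """Sum daily MME over (drug, strength_mg, doses_per_day) tuples."""
    return sum(daily_mme(drug, mg, n) for drug, mg, n in prescriptions)


# Oxycodone 10 mg three times a day plus hydrocodone 5 mg twice a day.
print(total_daily_mme([("oxycodone", 10, 3), ("hydrocodone", 5, 2)]))  # 55.0
```

Here the oxycodone contributes 10 x 3 x 1.5 = 45 MME and the hydrocodone 5 x 2 x 1.0 = 10 MME, which is why the extracted strength and frequency must both be available before a total can be computed.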

Procedures and Other Treatments

This pipeline extracts procedures, as well as other non-medication treatments for patients from clinical reports. Recognized entities will be mapped to ICD-10 procedure codes.

Clinical oncology

Cancer Information in Pathology Notes

This pipeline extracts a broad range of cancer-related information from pathology reports, such as tumor size, tumor stage, and biomarkers. (Reference)

Colorectal Cancer Cases

This pipeline identifies colorectal cancer cases from multiple types of clinical notes. (Reference)

Lung Cancer Metastases Status

This pipeline extracts metastasis-related information from pathology reports of lung cancer patients, including histological type, grade, specimen site, metastatic status indicators, and procedure.

Demographics and SDOH

Alcohol Status

This pipeline recognizes alcohol consumption of patients from clinical notes.


This pipeline recognizes Protected Health Information (PHI), including patient names, doctor names, addresses, and dates. It also provides two types of post-processing: one replaces the recognized PHI with placeholders, and the other replaces it with synthetic (fake) data. It can also shift dates at the patient level.
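The placeholder and date-shifting steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the function names, the (start, end, type) span format, and the bracketed placeholder style are all hypothetical, and the PHI spans are assumed to have been recognized already and not to overlap.

```python
from datetime import date, timedelta
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, PHI type), hypothetical format


def replace_with_placeholders(text: str, spans: List[Span]) -> str:
    """Replace each non-overlapping PHI span with a placeholder such as [DATE]."""
    pieces = []
    cursor = 0
    for start, end, phi_type in sorted(spans):
        pieces.append(text[cursor:start])   # keep the text before the span
        pieces.append(f"[{phi_type}]")      # substitute the placeholder
        cursor = end
    pieces.append(text[cursor:])            # keep the text after the last span
    return "".join(pieces)


def shift_date(original: date, offset_days: int) -> date:
    """Shift a date by a fixed number of days."""
    return original + timedelta(days=offset_days)


note = "John Smith was seen on 2020-01-05."
date_start = note.index("2020-01-05")
spans = [(0, 10, "PATIENT"), (date_start, date_start + 10, "DATE")]
print(replace_with_placeholders(note, spans))

# Using one offset for every date of the same patient preserves intervals.
offset = 10
admitted = shift_date(date(2020, 1, 5), offset)
discharged = shift_date(date(2020, 1, 9), offset)
print((discharged - admitted).days)
```

Applying a single offset per patient, rather than a fresh one per date, is what keeps the time between that patient's events unchanged after shifting.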


This pipeline extracts demographic information of patients, including gender, age, and ethnicity from clinical notes.

Intimate partner violence (IPV)

This pipeline identifies family violence-related behaviors such as kicking, hitting, and insulting from clinical notes.

Language Barriers

This pipeline recognizes a patient’s primary language and the patient’s fluency level in each language.

Smoking Status

This pipeline extracts mentions of the smoking status of patients in clinical notes and classifies them into three categories: current smoker, past smoker, and non-smoker.

Stressor Information Extraction

This pipeline extracts diverse types of stressors, such as lost job, family violence, and financial difficulty, from psychiatry notes. (Reference)

Biomedical documents

Biomedical Entity Extraction

This pipeline extracts genes, chemicals, and diseases from MEDLINE titles and abstracts.

Human Phenotype Ontology (HPO) Concepts

This pipeline recognizes HPO terms in clinical text and maps them to HPO codes.

SNOMED and SNOMED CT® are registered trademarks of SNOMED International.