University of Georgia Tests AI Framework for Intensive Care Medication Decision Support

Medication errors in intensive care units occur at alarming rates: unintended adverse drug events affect 5% of hospitalized patients annually and are associated with a doubled mortality risk. A multi-institutional research team led by the University of Georgia has developed PharmacyGPT, an artificial intelligence framework that uses iterative prompt optimization to support medication decision-making in the ICU. The study, published in BMC Medical Informatics and Decision Making, explores how large language models can be rapidly adapted to specialized healthcare tasks without large datasets or traditional model fine-tuning.

Key Points

  • The research tested whether iterative prompt engineering could adapt GPT-4 to specialized medication management tasks in intensive care settings, including patient clustering, medication regimen generation, and mortality prediction.
  • PharmacyGPT achieved mortality prediction accuracy of 0.75 with recall of 0.70, successfully generated medication regimens including appropriate drug, dose, and frequency combinations, and created 11 interpretable disease state clusters from patient data.
  • This is early-stage exploratory research with significant limitations: a dataset that is small by AI standards (1,000 patients), imbalanced outcomes (a 9:1 ratio of survivors to deceased), and low ROUGE evaluation scores (ROUGE-1 of 0.07 for medication regimen generation). The approach requires substantial refinement before clinical implementation.
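To see why a 9:1 outcome imbalance matters when reading accuracy and recall figures, the sketch below computes standard classification metrics from confusion-matrix counts. The counts are hypothetical and are not taken from the study: the first case is a trivial "always predict survival" baseline, and the second is chosen only so that accuracy and recall come out at 0.75 and 0.70, assuming "deceased" is the positive class and all 1,000 patients are scored.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # deceased patients predicted as deceased
    fp: int  # survivors predicted as deceased
    fn: int  # deceased patients predicted as survivors
    tn: int  # survivors predicted as survivors


def metrics(c: Confusion) -> dict:
    """Accuracy, precision, recall and F1 for the positive (deceased) class."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a cohort of 900 survivors and 100 deceased patients.
always_survive = Confusion(tp=0, fp=0, fn=100, tn=900)   # trivial baseline
hypothetical = Confusion(tp=70, fp=220, fn=30, tn=680)   # illustrative only

for name, c in [("always_survive", always_survive),
                ("hypothetical", hypothetical)]:
    print(name, {k: round(v, 3) for k, v in metrics(c).items()})
```

The baseline reaches high accuracy while identifying no deceased patients, and in the second case false positives drawn from the much larger survivor group pull precision and F1 well below recall. This is the general mechanism by which an imbalanced cohort depresses precision-based scores.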

The findings suggest that with carefully designed high-quality datasets and specialized training approaches, AI systems could eventually support medication management decisions, though extensive validation and development work remains necessary before clinical deployment.

The Data

  • The study used retrospective data from 1,000 adult patients admitted to an ICU for ≥24 hours, applying few-shot learning (supplying a small number of worked examples in the prompt) and dynamic prompting techniques within the GPT-4 framework.
  • The cohort included adult ICU patients from the University of North Carolina Health System between October 2015 and October 2020, with data encompassing demographics, medication information, severity of illness scores (SOFA, APACHE II), and patient outcomes.
  • Mortality prediction performance varied by prompting strategy, with PharmacyGPT reaching an accuracy of 0.75 and a recall of 0.70.
  • Medication regimen generation produced ROUGE scores of 0.07 (ROUGE-1), 0.01 (ROUGE-2), and 0.05 (ROUGE-L) for PharmacyGPT using few-shot learning, compared to 0.04 for both zero-shot ChatGPT and GPT-4 approaches. Clinical expert review noted that, despite the low scores, the generated regimens had appropriate syntax, structure, and clinically reasonable medication combinations.
  • The hierarchical clustering algorithm successfully identified 11 distinct patient clusters including categories such as “Diverse symptoms with neurological impact,” “Respiratory & Pulmonary,” “Heart Attack,” and “Trauma,” demonstrating meaningful disease state groupings based on embeddings.
  • The authors acknowledged critical limitations: the 9:1 imbalance between surviving and deceased patients, which significantly reduced precision and F1 scores; the 1,000-patient dataset, which is small by AI standards; the lack of “ground truth” validation from prospective comprehensive medication management; and the need for novel evaluation metrics, since traditional NLP measures like ROUGE may not appropriately assess medication-specific tasks.
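The concern about ROUGE can be made concrete with a minimal sketch of how the metric is computed. The code below is a simplified implementation (lowercased whitespace tokens, F1 variant, no stemming) and is not the evaluation code used in the study, which may differ in tokenization and in whether precision, recall, or F-measure is reported. The two regimen strings are hypothetical examples.

```python
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase and split on whitespace only.
    return text.lower().split()


def ngrams(tokens, n):
    # Multiset of n-grams; empty when the text has fewer than n tokens.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N as the F1 of clipped n-gram overlap."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    overlap = sum((cand & ref).values())  # Counter & keeps the minimum counts
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    # Longest common subsequence length by row-wise dynamic programming.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L as the F1 of the longest common subsequence."""
    c, r = tokenize(candidate), tokenize(reference)
    return f1(lcs_length(c, r), len(c), len(r))


# Hypothetical strings: both describe the same drug at the same rate.
reference = "norepinephrine 0.05 mcg/kg/min continuous infusion"
candidate = "noradrenaline infusion at 0.05 mcg/kg/min"

print("ROUGE-1:", rouge_n(candidate, reference, 1))
print("ROUGE-2:", rouge_n(candidate, reference, 2))
print("ROUGE-L:", rouge_l(candidate, reference))
```

Because the metric counts only surface token overlap, a synonym such as noradrenaline for norepinephrine earns no credit, and reordering the same elements lowers ROUGE-2 and ROUGE-L further. All three scores fall well short of 1.0 here even though the two strings are clinically equivalent, which illustrates why the authors call for medication-specific evaluation metrics.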

Industry Context

Iterative prompt optimization shows promise as a rapid means to improve LLM functionality to specific tasks, even in highly domain specific areas like medication management in the ICU.

Zhengliang Liu, Shaochen Xu, Zihao Wu, and colleagues, BMC Medical Informatics and Decision Making

This work addresses a critical healthcare need, as medication complexity in intensive care units creates substantial patient safety risks. ICU patients typically receive 13-20 medications simultaneously that require minute-to-minute titration, many designated as high-risk with narrow therapeutic indices and potential for dangerous drug interactions. The annual cost of treating adverse drug events across the United States exceeds $1.5 billion, with estimates indicating most such events are preventable. Critical care pharmacists can reduce adverse drug events by up to 70%, with cost-avoidance-to-pharmacist-salary ratios of $3.3-$9.6:1, yet many facilities face personnel or financial constraints that limit access to this expertise.

The PharmacyGPT framework builds on emerging work in domain-specific prompt engineering across medicine, including similar approaches in radiation oncology (RadOncGPT) and radiology (RadiologyGPT, and ImpressionGPT for report generation). These projects demonstrate that carefully designed prompts and in-context learning can elicit specialized capabilities from general-purpose language models even with minimal training data. The current study extends this concept to the particularly challenging domain of ICU medication management, where the alphanumeric complexity of medication data (drug names, doses, frequencies, routes) has historically limited traditional AI algorithm development.
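One common way to combine in-context learning with dynamic prompting is to retrieve the prior cases most similar to the current patient and place them in the prompt as worked examples. The sketch below illustrates that general idea only; it is not the PharmacyGPT pipeline. The vectors, case summaries, labels, and instruction text are all hypothetical placeholders, and a real system would obtain the vectors from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_examples(query_vec, pool, k=2):
    """Return the k pool entries whose vectors are closest to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_vec, ex["vec"]),
                    reverse=True)
    return ranked[:k]


def build_prompt(query_text, examples):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    # Hypothetical instruction text, not the prompt used in the study.
    parts = ["Given an ICU patient summary, state the recorded outcome."]
    for ex in examples:
        parts.append(f"Patient: {ex['text']}\nOutcome: {ex['label']}")
    parts.append(f"Patient: {query_text}\nOutcome:")
    return "\n\n".join(parts)


# Hypothetical pool of previously seen cases with toy 2-D vectors.
pool = [
    {"text": "case A summary", "label": "survived", "vec": [1.0, 0.0]},
    {"text": "case B summary", "label": "deceased", "vec": [0.0, 1.0]},
    {"text": "case C summary", "label": "survived", "vec": [0.9, 0.1]},
]

chosen = select_examples([1.0, 0.0], pool, k=2)
prompt = build_prompt("new case summary", chosen)
print(prompt)
```

Because the examples are selected per query, each patient sees a different prompt without any change to model weights, which is what lets this style of adaptation work with small datasets.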

The research team notes that while current clinical decision support systems often reduce medication decisions to “one-size-fits-all” rules based on rudimentary inputs like package insert warnings, LLMs demonstrate unique potential to process complex medication-related data and potentially provide more individualized, expert-level guidance. However, they caution that substantial work remains before such systems could be clinically implemented, including rigorous validation studies, user-centered design processes, and robust implementation science methodology to ensure safety and effectiveness in real-world clinical settings.

The study, “PharmacyGPT: exploration of artificial intelligence for medication management in the intensive care unit,” was published in BMC Medical Informatics and Decision Making, October 2025 (DOI: 10.1186/s12911-025-03230-1).