Amalgam Rx has published two peer-reviewed studies reporting engagement metrics for its Medical-Grade AI™ framework, which includes clinical guardrails and personality-driven design for large language models in healthcare settings.
Key Points
- Amalgam Rx published two peer-reviewed papers validating its Medical-Grade AI™ framework: “Taming Large Language Models for Healthcare – A Multi-layered System” in Artificial Intelligence and Applications (selected from 376 submissions at ICAI 2024), and “Humanizing AI: Enhancing User Engagement in Health Applications with Personality-Driven AI Design” in Human-Centric Computing.
- The multi-layered guardrail system operates outside the large language model itself to monitor and govern outputs, with the goal of preventing the release of potentially harmful information in regulated healthcare environments. The design responds to failure modes documented in Microsoft Research’s 2024 study of AI accuracy in medical contexts.
- The studies tested different AI personality designs with thousands of users over several weeks, with an empathetic persona producing the strongest engagement results.
- Amalgam Rx’s platform currently supports nearly 10 million patients across four continents and has processed more than 70 million clinical decisions through partnerships with global life sciences companies, health plans, and providers.
The reported engagement metrics suggest that combining clinical oversight systems with personality-based conversational design may increase patient interaction time in digital health applications, potentially supporting medication adherence and patient journey management.
The Data
- User sessions averaged 35 minutes across all personality variants tested, with 10% of interactions exceeding one hour in duration.
- Weekly retention reached 60%, which the company describes as roughly double typical industry baselines for engagement.
- The empathetic AI persona variant produced 41-minute average sessions with 67% retention rates, outperforming other personality designs tested.
- A 2024 Microsoft Research study titled “The Illusion of Readiness” tested six leading AI models on medical tasks and found accuracy dropped from 83% to 52% under minor test variations, with models producing inaccurate or fabricated medical information nearly half the time. Amalgam cites this as evidence for why external guardrail systems are necessary.
- Both studies involved thousands of users engaging over several weeks; the company has not disclosed exact cohort sizes.
Industry Context
“If not kept in check, AI can bring chaos in healthcare—it’s like expecting obedience from an untamed beast,” said Bharath Sudharsan, chief data scientist and head of AI at Amalgam.
The company positions its approach as addressing fundamental limitations in applying large language models to healthcare applications. While LLMs have been adopted across multiple industries, Amalgam argues that most implementations rely on the models to self-regulate their outputs, which may be insufficient for medical contexts where inaccurate information carries significant consequences.
The Medical-Grade AI™ framework operates as an external layer that sits outside the language model architecture. This design aims to provide monitoring and governance capabilities while allowing the underlying AI to generate conversational responses. The company works with pharmaceutical partners to develop what it describes as companion applications for patient support.
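The company has not published implementation details, but the general pattern of an external guardrail layer can be sketched as a wrapper that inspects model output before it reaches the user. The following is a minimal, hypothetical illustration; the rule patterns, function names, and fallback text are assumptions for demonstration and do not reflect Amalgam Rx’s actual system.

```python
import re
from typing import Callable

# Hypothetical post-generation rules: flag specific dosage claims and
# directive medical advice, which a governed healthcare app might route
# to a clinician-approved fallback instead of releasing verbatim.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d+\s?mg\b", re.IGNORECASE),    # specific dosages
    re.compile(r"\bstop taking\b", re.IGNORECASE), # directive advice
]

FALLBACK = "Please consult your care team for guidance on this question."

def guarded_reply(generate: Callable[[str], str], prompt: str) -> str:
    """Run the model, then apply external checks before releasing output.

    The guardrail sits outside the model: it never alters generation,
    only governs what is allowed through to the user.
    """
    draft = generate(prompt)
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(draft):
            return FALLBACK  # governed response replaces raw output
    return draft

# Stub standing in for any LLM backend.
def risky_model(prompt: str) -> str:
    return "You should take 500 mg immediately."

print(guarded_reply(risky_model, "What should I do about my headache?"))
# The dosage pattern matches, so the fallback message is returned.
```

In practice such a layer would use clinical rule sets, classifiers, or retrieval checks rather than regexes, but the structural point is the same: governance happens outside the model, so no single model failure can reach the patient unchecked.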
Amalgam Rx’s broader platform includes a modular software-as-a-medical-device (SaMD) system and EHR integration solutions. The company recently secured a $20 million credit investment from Catalio Capital Management and announced a strategic investment from CVS Health Ventures, though financial terms were not disclosed in this announcement. Whether the engagement metrics from these studies will translate to measurable health outcomes remains to be demonstrated through additional clinical research.



