Abstract:
Sepsis remains a critical challenge in intensive care, demanding rapid and accurate decision-making to optimize patient outcomes. This study evaluates a multi-agent system designed to support sepsis management by integrating three specialized agents (sepsis management, antibiotic recommendation, and guidelines compliance) that use retrieval-augmented generation (RAG) to draw on current literature and guidelines. Initially tested on a single pneumonia-related sepsis case from the MIMIC-IV database, the system has now been assessed across 10 diverse cases: eight from MIMIC-IV (e.g., necrotizing fasciitis, genitourinary sepsis) and two from specialized literature. The evaluation, conducted with Palmyra-Med 70B and compared against GPT-3.5 Turbo and GPT-4o Mini, focuses on recommendation accuracy, guideline adherence, and the prevalence of hallucinations, defined here as unsupported or excessive outputs that undermine reliability. Initial results for the pneumonia-related sepsis case indicate that expert reviewers judged the recommendations acceptable (Cohen’s Kappa = 0.622, p = 0.003), with strengths in early antibiotic suggestions and monitoring strategies. Further testing, however, revealed hallucinations, such as erroneous clinical assertions, across cases, and groundedness scores varied. Programmatic evaluations (e.g., TruLens) and human expert assessments both highlight the need for improved context relevance and response grounding. This system exemplifies the potential of multi-agent architectures in clinical decision support for biomedical engineering, yet it underscores the challenge of ensuring reliability in real-time applications. Addressing hallucinations through refined RAG databases and agent definitions is critical for clinical adoption. This work invites further validation across broader datasets and integration into ICU workflows, offering a pathway to enhance sepsis care through advanced informatics.
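
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's kappa can be computed for two raters. The function name `cohens_kappa` and the rating lists are hypothetical and purely illustrative; they are not the study's data or code, and the sketch does not reproduce the reported value of 0.622.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")

    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten recommendations by two expert reviewers.
reviewer_1 = ["ok", "ok", "ok", "bad", "bad", "ok", "bad", "ok", "ok", "bad"]
reviewer_2 = ["ok", "ok", "bad", "bad", "bad", "ok", "ok", "ok", "ok", "bad"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the reviewers agree on 8 of 10 items (p_o = 0.80) while chance agreement is p_e = 0.52, which gives a kappa of about 0.583. Values in the 0.61 to 0.80 range are conventionally read as substantial agreement.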