Tsinghua University researchers build virtual Agent Hospital for AI doctor training without human intervention

Tsinghua University AI researchers develop Agent Hospital to train virtual AI doctors. (Source: Tsinghua University)

Tsinghua University researchers have built a virtual Agent Hospital for AI doctor training without human intervention. By creating a ‘digital twin’ of a real hospital, patients, and hospital staff, then having virtual doctors treat thousands of virtual patients, high accuracy in diagnosis and treatment was achieved after the doctors honed their skills.

David Chien, Published 06/16/2024

AI Software

Researchers at Tsinghua University's Institute for AI Industry Research (AIR) and Department of Computer Science and Technology have built a virtual Agent Hospital that trains AI doctors without human intervention. They first created a simulation of an entire hospital, complete with staff and patients. AI doctors were then made responsible for diagnosing and treating thousands of virtual patients on their own. The doctors quickly learned from their mistakes, and their accuracy in examining, diagnosing, and treating patients rose significantly.

Virtual simulations, or simulacrums, replicate a real-world environment so that AI can be trained safely and rapidly. The computer does not need to wait for a sick patient to appear. Instead, hundreds, thousands, or even millions of sick patients can be programmed to appear as desired. Such simulations also cost far less than training in the real world.

Tsinghua researchers were able to quickly train virtual AI doctors on 10,000 virtual patients in the Agent Hospital simulation using a process they call the MedAgent-Zero method. The patients were created by feeding large language models information on eight types of diseases to generate electronic health records for 10,000 virtual patients, each with a different severity and presentation. The eight diseases were acute nasopharyngitis, acute rhinitis, bronchial asthma, chronic bronchitis, COVID-19, influenza A, influenza B, and mycoplasma infection. A separate set of 500 patient records was created for testing.

During simulations, the virtual doctor powered by gpt-3.5-turbo-1106 quickly developed their skills. After seeing 10,000 virtual patients, the doctor had success rates in examining, diagnosing, and treating patients as high as 88%, 95.6%, and 77.6% depending on the disease.

GPT is rapidly improving, so Tsinghua researchers also tested their MedAgent-Zero training method using the more powerful gpt-4-1106-preview. They compared the performance of GPT-3.5 and GPT-4 AI doctors using 1,273 questions from the MedQA database, a large set of multiple-choice questions similar to those found on medical licensing tests such as the USMLE. On respiratory disease questions, the virtual doctors scored 93.06% with GPT-4 versus 84.72% with GPT-3.5.

The ground-breaking performance of these AI doctors was achieved using just days of virtual training, and the Agent Hospital simulacrum opens the road to developing training methods for future AI doctors, as well as real doctors, that are significantly faster and more effective.


Agent Hospital recreates the hospital environment so virtual AI doctors can practice medicine. (Source: Tsinghua University)

Virtual patients show up at the hospital to be helped by staff and treated by doctors. (Source: Tsinghua University)

All stages of a sick patient are simulated, from diagnosis to treatment to recovery. (Source: Tsinghua University)

The simulation runs repeatedly on its own to train and improve AI doctors until they've mastered their skills. (Source: Tsinghua University)

Without human intervention or training assistance, virtual AI doctors learn on their own to correctly diagnose and treat ailments. (Source: Tsinghua University)

Source(s)

Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents, Tsinghua University - Institute for AI Industry Research news release

Press Release

Machine translated by Edge browser:

AIR creates a virtual hospital to realize the self-evolution of AI doctors

Release time: 2024-05-24

Tsinghua University's Institute for AI Industry Research (AIR) and the Department of Computer Science and Technology of Tsinghua University have cooperated to build a virtual hospital, Agent Hospital. They also propose MedAgent-Zero, a self-evolution method for medical agents. It enables medical agents to continuously improve their medical capabilities in the virtual hospital by generating a large amount of data without manual annotation, and it is verified on real-world datasets. All patients, nurses, and doctors in Agent Hospital are played by autonomous agents driven by large models, which simulate the closed-loop "pre-hospital, in-hospital, post-hospital" process of onset, triage, registration, consultation, examination, diagnosis, medication, rehabilitation, and follow-up. Based on a knowledge base and a foundation model, Agent Hospital simulates how diseases arise and develop in virtual patients. Virtual doctors learn (i.e., read medical literature) and practice (i.e., interact with virtual patients and make diagnosis and treatment decisions) in Agent Hospital. They constantly summarize experience from successful cases, reflect on lessons from failed cases, and continuously improve their accuracy on multiple diagnosis and treatment tasks. After treating nearly 10,000 virtual patients (which would take a human doctor about two years), the virtual doctors surpassed the current best methods on the respiratory disease subset of the MedQA dataset, achieving an accuracy of 93.06%. The study, co-authored by Assistant Prof. Ma Weizhi of AIR and Prof. Yang Liu, Executive Dean of AIR and Associate Dean of the Department of Computer Science, has received extensive attention and discussion from the artificial intelligence and medical communities at home and abroad since it was published on arXiv.

· Title of the paper: Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

· Link to the paper: arxiv.org/pdf/2405.02957v1

In recent years, large language models have developed rapidly, and agent technology based on them has attracted much attention. Previous studies have used agent technology to simulate the real world, including interaction and game scenarios such as "Stanford Town" and the "Werewolf" game. Agent technology is also used for scheduling, planning, and collaboration across various tasks, but this mostly relies on high-quality manually annotated data. The research question is therefore whether real-world simulation can help improve the task-handling ability of agents.

Smart healthcare has attracted wide attention because of its importance and application value, and the research team has paid close attention to applying large language models and agent technology in medical scenarios. In response to the research question above, the team believes that a simulated real-world environment can help agents improve and evolve their task abilities, so they carried out the Agent Hospital research, which combines real-world simulation with the improvement of medical ability. In this work, the team builds a hospital simulation environment and explores the autonomous evolution of medical agents within it. The goal is to let agents accumulate medical knowledge independently through diagnosis, treatment, and study, just as human doctors do, and so continuously evolve their medical capabilities.

The research team first focused on using large-model agents to simulate the key medical processes of the real world. In Agent Hospital, the team designed eight typical scenarios that span disease onset through recovery: onset, triage, registration, consultation, examination, diagnosis, prescription, and recovery. Patients also actively take part in follow-up feedback. Every process is driven by large models, and the roles interact autonomously.
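The eight scenarios can be pictured as a simple closed loop of stages. The following is a minimal sketch of that idea only, not the team's implementation: the `Stage` enum, `next_stage`, and `run_visit` are hypothetical names, and the assumption that recovery leads back to onset reflects the press release's description of the patient returning the next time he falls ill.

```python
from enum import Enum


class Stage(Enum):
    """The eight scenarios named in the press release (names are illustrative)."""
    ONSET = "onset"
    TRIAGE = "triage"
    REGISTRATION = "registration"
    CONSULTATION = "consultation"
    EXAMINATION = "examination"
    DIAGNOSIS = "diagnosis"
    PRESCRIPTION = "prescription"
    RECOVERY = "recovery"


STAGES = list(Stage)


def next_stage(current: Stage) -> Stage:
    """Return the stage after `current`; recovery loops back to onset (assumed)."""
    i = STAGES.index(current)
    return STAGES[(i + 1) % len(STAGES)]


def run_visit(start: Stage = Stage.ONSET) -> list[str]:
    """Walk one full loop from `start` and return the visited stage names."""
    visited, stage = [], start
    for _ in STAGES:
        visited.append(stage.value)
        stage = next_stage(stage)
    return visited
```

In the real system each stage would involve a dialogue between agents; here a stage is just a label, which is enough to show the ordering and the loop.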

Examples of major diagnosis and treatment sessions

The diagram above illustrates the closed loop: when the patient agent Kenneth Morgan becomes ill, he goes to the hospital for help. Triage nurse Katherine Li assesses Morgan's symptoms and directs him to a specific department. After Morgan completes registration, consultation, and the medical examinations ordered by the doctor, the doctor, Robert, gives him the final diagnosis and treatment plan. Morgan then goes home to rest as instructed and reports back to the hospital on his recovery, until the next time he gets sick and returns to the hospital.

As the example above shows, the research team designed two main types of roles for the hospital: medical staff and patients. All character information is generated by a large model (GPT-3.5), so new characters can easily be added at scale. The figure below shows the details of some of the characters. The 35-year-old patient Kenneth Morgan currently has acute rhinitis, a history of hypertension, and a series of symptoms such as persistent vomiting. Zhao Lei is an experienced radiologist, and internist Elise Martin has excellent communication skills and specializes in the diagnosis and treatment of acute and chronic medical diseases. These complete character backgrounds enhance the realism of the hospital simulation.

An introduction to the virtual character's information

Disease generation is the key to the medical simulation process described above. Specifically, a large language model combined with medical knowledge generates a complete medical record for each patient, including the type of disease, symptoms, duration, and various examination results (see the appendix of the paper for details). To keep the simulation as realistic as possible, the patient agent perceives only the symptoms of the disease, not the disease itself, while the doctor agent can obtain information only by talking to the patient agent and ordering tests. The examinations the patient agent needs, the type of disease, and the severity of the disease serve as the three key tasks for evaluating the medical agent's ability to diagnose and treat virtual patients.
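The information boundary described here can be sketched as a record whose fields are exposed selectively. This is an illustrative sketch under stated assumptions: `MedicalRecord`, `patient_view`, and `order_exam` are hypothetical names, the field layout is inferred from the list of record contents above, and any examination values passed in are placeholders rather than data from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class MedicalRecord:
    """Hypothetical layout of a generated medical record."""
    disease: str                      # ground truth, never shown to either agent directly
    symptoms: list[str]
    duration_days: int
    exam_results: dict[str, str] = field(default_factory=dict)


def patient_view(record: MedicalRecord) -> dict:
    """What the patient agent perceives: symptoms and duration, not the disease name."""
    return {"symptoms": record.symptoms, "duration_days": record.duration_days}


def order_exam(record: MedicalRecord, exam: str) -> str:
    """The doctor agent sees a result only after ordering that specific exam."""
    return record.exam_results.get(exam, "not available")
```

The doctor agent would then have to reach a diagnosis from `patient_view` answers and `order_exam` results alone, which is what makes the three evaluation tasks meaningful.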

Most traditional methods for training medical models rely on pre-training, fine-tuning, and similar techniques, so they need a large amount of medical data and some high-quality manually annotated data. The research team, however, believes that human doctors do not depend on such massive data to improve. Doctors often accumulate experience from clinical practice during diagnosis and treatment, and they also improve by reading medical literature to acquire key knowledge. Medical agents in virtual hospitals should be able to evolve their capabilities in a similar way.

Therefore, the team designed an agent self-evolution algorithm named "MedAgent-Zero". Like AlphaGo-Zero, it does not rely on manually annotated data. Instead, it improves through learning (i.e., reading medical literature) and practice (i.e., interacting with virtual patients and making diagnosis and treatment decisions) in the virtual hospital. On the one hand, medical agents independently accumulate experience on the three diagnosis and treatment tasks through practice. On the other hand, medical agents also learn autonomously, simulating the study of medical documents based on medical questions generated by the LLM.

MedAgent-Zero policy flow diagram

As shown in the figure above, the evolution of MedAgent-Zero includes two approaches: 1) Summarizing experience from successful cases: for diagnosis and treatment questions it answers correctly, the agent adds the case to a case database, much as a human doctor accumulates experience. 2) Reflecting on lessons from failures: when it answers incorrectly, the agent actively reflects on the mistake. If the lesson drawn from that reflection helps the agent answer the question correctly, the lesson is kept and stored in the experience pool.
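The two approaches amount to a small update rule applied after every answered question. The sketch below is one possible reading, not the authors' code: `update_experience` is a hypothetical name, `reflect` and `retry` stand in for LLM calls, and the correct answer is assumed to come from the simulated medical record rather than from manual annotation.

```python
from typing import Callable


def update_experience(
    question: str,
    answer: str,
    correct_answer: str,
    case_base: list[dict],
    experience_pool: list[str],
    reflect: Callable[[str, str, str], str],   # stand-in for an LLM reflection call
    retry: Callable[[str, str], str],          # stand-in for re-answering with the lesson
) -> None:
    """Store a successful case, or a lesson that is validated by a retry."""
    if answer == correct_answer:
        # Approach 1: successful cases go into the case database.
        case_base.append({"question": question, "answer": answer})
        return
    # Approach 2: reflect on the failure, then keep the lesson only if it helps.
    lesson = reflect(question, answer, correct_answer)
    if retry(question, lesson) == correct_answer:
        experience_pool.append(lesson)
```

The validation step matters: a lesson that does not fix the original mistake is discarded, so the experience pool holds only reflections that demonstrably helped.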

The research team carries out both kinds of accumulation during training on virtual data. At each inference step, the agent retrieves the most similar content from the two databases and adds it to the prompt for in-context learning. Depending on whether its answer is correct, it then stores the case or summarizes a lesson, continuously improving its ability.
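The retrieval step can be illustrated with a toy similarity search that feeds a prompt. This is a sketch under assumptions: the press release does not say how similarity is measured, so token-level Jaccard overlap is swapped in purely for illustration, and `jaccard`, `retrieve`, and `build_prompt` are hypothetical names.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (an assumed stand-in for the real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(query: str, entries: list[str], k: int = 2) -> list[str]:
    """Return the k entries most similar to the query."""
    return sorted(entries, key=lambda e: jaccard(query, e), reverse=True)[:k]


def build_prompt(question: str, cases: list[str], lessons: list[str], k: int = 2) -> str:
    """Assemble an in-context prompt from both stores plus the current question."""
    parts = ["Similar past cases:"]
    parts += [f"- {c}" for c in retrieve(question, cases, k)]
    parts.append("Lessons from past mistakes:")
    parts += [f"- {lesson}" for lesson in retrieve(question, lessons, k)]
    parts.append(f"Current patient: {question}")
    return "\n".join(parts)
```

A production system would more plausibly use embedding-based retrieval, but the flow is the same: look up both stores, prepend the hits, then ask the model.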

In the virtual hospital, the research team constructed medical records for tens of thousands of virtual patients for the autonomous evolution experiments. The records cover 8 respiratory diseases, such as influenza A, influenza B, and COVID-19, and involve more than 10 different medical examinations. Assuming a human doctor treats about 100 patients a week, it could take two years for a human doctor to diagnose 10,000 patients, whereas the AI doctors need only a few days.
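The two-year figure follows from simple arithmetic, checked below. The 100 patients per week is the press release's own assumption; the 52-week year is an added assumption that ignores holidays and leave.

```python
PATIENTS = 10_000
PATIENTS_PER_WEEK = 100   # rate quoted in the press release
WEEKS_PER_YEAR = 52       # assumption: no time off

weeks_needed = PATIENTS / PATIENTS_PER_WEEK
years_needed = weeks_needed / WEEKS_PER_YEAR

print(f"{weeks_needed:.0f} weeks, about {years_needed:.1f} years")
# prints "100 weeks, about 1.9 years"
```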

The team evaluated the ability of medical agents in the virtual hospital from two main angles. The first is medical competence in the virtual environment. As shown in the figure below, during training (left) the accuracy of the medical agent on the three key tasks keeps rising and gradually stabilizes as the number of patients it has diagnosed and treated grows. On the 500 test medical records (right), the agent's accuracy fluctuated slightly as the number of patients increased but showed an overall upward trend.

The task accuracy of the medical agent on the training set (left) and the test set (right).

Subsequently, the research team compared the diagnostic accuracy of medical agents on various diseases before and after their evolution, and found that they were all greatly improved, verifying the effectiveness of their autonomous evolution.

Diagnostic performance on different diseases before and after the evolution of agents

The second is real-world medical ability, which the team evaluated using a subset of respiratory disease questions from the external dataset MedQA. Surprisingly, even though no manually annotated data was used during agent evolution, after treating nearly 10,000 patients the medical agent surpassed the current best method on the dataset and achieved the highest accuracy of 93.06%, which verifies the effectiveness of the autonomous evolution of medical agents in the simulated environment.

Accuracy of different methods on a subset of MedQA

In addition, the research team carried out ablation experiments. The results showed that both the examples accumulated from successes and the lessons learned from failures help improve the medical capabilities of the model.

Ablation study performance of MedAgent-Zero

In summary, this research constructs the first virtual hospital scenario, Agent Hospital, and proposes MedAgent-Zero, a medical agent evolution algorithm that does not rely on manual data annotation. The experimental results on virtual and real data offer preliminary evidence that the simulation environment improves the capabilities of medical agents, and they suggest new solutions for applying artificial intelligence, especially large language models and agent technology, in smart medical scenarios. The work still has some limitations. In the future, the team will continue to expand the disease types covered, refine the level of detail of the simulation environment, and improve the selection and optimization of the base model.

About the corresponding authors

Ma Weizhi, an assistant researcher at the Institute for AI Industry Research (AIR) of Tsinghua University, was selected for the "Young Talent Lifting Project" of the China Association for Science and Technology. His research interests include intelligent information acquisition and intelligent medical care. Personal homepage: mawz12.github.io.

Liu Yang is Professor of GDS, Executive Dean of the Institute for AI Industry Research (AIR), Deputy Dean of the Department of Computer Science, Tsinghua University, and a recipient of the National Science Fund for Distinguished Young Scholars. His research interests include artificial intelligence, natural language processing, and smart medicine. Personal homepage: nlp.csai.tsinghua.edu.cn/~ly.


David Chien - Tech Writer - 869 articles published on Notebookcheck since 2023

Having worked at Activision, UCLA, Anime Expo and more, I've seen technology being used to save lives, create games, and create fantastic 3D VR/AR worlds. There's always something fun in emerging technology that I want to get my hands on and all my friends turn to me to find the best for their needs, so I'm glad to bring my experience to Notebookcheck.
