Where AI makes a real difference
The big dream? A chatbot that provides medical assistance to people in Africa in their own language. Whether for a young mother, an assistant in a hospital unit or a teacher: artificial intelligence (AI) would give advice based on the best available medical knowledge. This is how Mary-Anne Hartley, Professor and Director of the Laboratory for Intelligent Global Health and Humanitarian Response Technologies (LiGHT) at EPFL, introduces it. The doctor with South African roots has already taken a major step towards this dream, together with AI specialists at EPFL. At the end of 2023, the team presented the MEDITRON-70B language model, which performs reasonably well on medical school exam questions. The real test, however, is how it performs in practice.
AI for healthcare, open to all
The AI is based on the Llama series of models, developed by Facebook parent group Meta. This large language model (LLM) is similar to OpenAI’s GPT or Google’s Gemini but has two key advantages: it is completely open source, and it is small enough to be hosted privately within hospitals and in low-resource settings. The Meditron team, co-led by Professor Antoine Bosselut, Head of the Natural Language Processing Group at EPFL, has been working on LLMs for medical applications for some time. Here, too, much has changed since the breakthrough of generative AI in 2022.
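What “hosting privately” can look like in practice is sketched below, assuming the open weights are distributed via the Hugging Face hub. The model identifier and prompt are illustrative, and the smaller 7B sibling stands in for the full 70B model:

```python
# A minimal sketch of private, local hosting, assuming the open weights are
# published on the Hugging Face hub. The model identifier and prompt are
# illustrative; the smaller 7B variant stands in for the full 70B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "epfl-llm/meditron-7b"  # assumed hub identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What are the first-line treatments for uncomplicated malaria?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generation runs entirely on local hardware: no query or patient data
# ever leaves the machine, which matters for hospitals and clinics.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This is precisely the advantage over closed models accessed through an API: the weights sit on hardware the hospital controls.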
The challenge of medical AI
AI and medicine? The combination may be surprising – in this part of the world, the medical context is considered the pièce de résistance of AI applications. What if the system is way off the mark? We have all heard the discussions: language models “hallucinate”, as it is called in technical jargon. If they do not know something, they confabulate, yet make it sound very plausible – not really what we want when it comes to medical expertise.
Hartley and Bosselut both emphasise that this happens to people too: we also cover up uncertainty, and, of course, human experts can be wrong. On the other hand, the potential benefit is great if such an AI can provide vital information in places far from well-developed medical care. Building trust in AI works the same way as with any other medical intervention: any promising tool must prove its effectiveness in studies, and good results in the laboratory do not necessarily translate into success in everyday medical practice.
The “contaminations” are more crucial than the “hallucinations” anyway, says Hartley. By this she means distortions in the data the system works with. “Not even three percent of the studies in PubMed, the largest medical database, represent Africa.” A recipe for a biased and inequitable distribution of accuracy. “If we cannot represent the non-Western medical context, we will not be able to build a useful system for Africa.” And because there is no time to wait for “perfect data”, the team makes do with iterations, gradually getting the systems to do what they are supposed to do.
Shaping AI with human input
Not too long ago, chatting with GPT was a rather anarchic affair, and its language outputs could go in perplexing directions. The fact that GPT now holds very civilised conversations is partly due to an additional loop in the training process known as “reinforcement learning from human feedback” (RLHF): people teach the system, to a certain degree, by rating its answers. The EPFL researchers do something similar in their language model; Hartley calls it “nudging”.
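A deliberately simplified sketch of such a feedback loop is shown below. Full RLHF trains a separate reward model on human preferences and then optimises the LLM with reinforcement learning; this sketch only illustrates the core idea of human ratings steering the model, and all function names are illustrative placeholders:

```python
# A simplified sketch of a human-feedback loop, not full RLHF: real RLHF
# fits a reward model on preference data and optimises the LLM with
# reinforcement learning (e.g. PPO). Here, expert ratings simply decide
# which answers the model is nudged towards in the next fine-tuning round.
from dataclasses import dataclass

@dataclass
class RatedAnswer:
    prompt: str
    answer: str
    rating: int  # e.g. 1 (misleading) to 5 (clinically sound), given by a doctor

def build_finetuning_pool(prompts, generate_answers, collect_rating,
                          threshold=4, n_candidates=4):
    """Sample several candidate answers per prompt, have experts rate them,
    and keep only the highly rated ones for the next fine-tuning round.
    `generate_answers` and `collect_rating` are illustrative placeholders."""
    pool = []
    for prompt in prompts:
        for answer in generate_answers(prompt, n=n_candidates):
            rating = collect_rating(prompt, answer)  # human-in-the-loop step
            if rating >= threshold:
                pool.append(RatedAnswer(prompt, answer, rating))
    return pool

# Each pass of this loop "nudges" the model: fine-tuning on the pool makes
# the highly rated answer styles more likely in the next iteration.
```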
Doctors influence AI’s medical skills
Numerous doctors across the world, from Lausanne to Bangkok and across the African continent, are testing the AI and its answers. “Doctors love it. It’s like a game between colleagues in a mentorship process: if you can mislead the novice, can you expose their gaps in knowledge and then teach them?” As a result, the machine is getting better and better, especially on medical conditions that differ greatly from the typical exam questions at Western universities.
Hartley and Bosselut emphasise that a model like this can only be developed in academic settings – “maybe only at EPFL”, says Hartley – thanks to the corresponding technical resources and expertise, and to collaboration with superb, innovative university hospitals nearby such as CHUV. With the first language models specialised in medicine, Bosselut was thinking mainly of hospitals in this part of the world and of pharmaceutical companies, i.e. “of people who can pay a lot of money for this kind of thing”. It was only with Hartley that the focus shifted to the “low-resource context”, to “users that are much closer to my heart”. And Hartley adds: “We didn’t just want to build a model and publish great results. We wanted to go into practice too. That’s the hardest evidence you can get.” The EPFL team will now launch a large-scale clinical trial across multiple countries in Africa to nudge these models towards reliable real-world impact.