AI's worldwide failure to catch the coronavirus

When the coronavirus (COVID-19) pandemic hit Europe in March 2020, hospitals were plunged into a health crisis that was still poorly understood. "The doctors had no idea how to handle these patients," says Laure Wynants, an epidemiologist at Maastricht University (Netherlands) who studies predictive tools.

But there was data from China, which had a four-month head start in the race against the pandemic. If machine learning algorithms could be trained on that data to help doctors understand what they were facing and make decisions, lives might be saved. "I thought, 'If there's any time when artificial intelligence can prove its usefulness, it's now,'" Wynants recalls. "I was very hopeful."

But that did not happen, though not for lack of effort. Research teams around the world stepped up to help. The AI community, in particular, rushed out software that many believed would let hospitals diagnose or triage patients faster, in theory providing the support that was so badly needed on the front line.

In the end, hundreds of predictive tools were developed, but none made a real difference, and some were potentially harmful.

That is the damning conclusion of several studies published in recent months. In June, the Turing Institute (the United Kingdom's national center for data science and artificial intelligence) published a report summarizing the discussions of a series of meetings held at the end of 2020. The clear consensus was that AI tools had had little or no impact on the fight against COVID-19.

Not fit for clinical use

That echoes the findings of two major studies that assessed hundreds of predictive tools developed last year. Wynants is lead author of one of them, published in the British Medical Journal, which is still being updated as new tools are released and existing ones are tested. Wynants and her colleagues have analyzed 232 algorithms for diagnosing patients or predicting how severe their illness would be. They found that none of those algorithms was fit for clinical use. Only two have been singled out as promising enough for future testing.

"It's shocking," Wynants admits. "I went into it with some doubts, but this exceeded my fears." Her study is backed up by another large analysis by Derek Driggs, a machine learning researcher at the University of Cambridge (United Kingdom), and his colleagues, published in Nature Machine Intelligence. This team examined deep learning models for diagnosing COVID-19 and predicting patient risk from medical images such as chest X-rays and chest computed tomography (CT) scans. They looked at 415 published tools and, like Wynants and her colleagues, concluded that none was fit for clinical use.

Driggs, who is himself working on a machine learning tool to help doctors during the pandemic, says: "This pandemic was a big test for AI and medicine. It would have been a great help to have society on our side. But I don't think we passed that test."

Both teams found that researchers repeated the same basic errors in the way they trained or tested their tools. Incorrect assumptions about the data often meant that the trained models did not perform as claimed.

Wynants and Driggs still believe AI has the potential to help. But they worry that it could be harmful if built the wrong way, because it could miss diagnoses or underestimate the risk for vulnerable patients. "There is a lot of hype about machine learning models and what they can do today," says Driggs.

Unrealistic expectations encourage the use of these tools before they are ready. Wynants and Driggs say that some of the algorithms they analyzed had already been used in hospitals, and some were even being marketed by private developers. "I'm afraid they may have harmed patients," Wynants warns.

So what went wrong? And how do we close that gap? On the positive side, the pandemic has made it clear to many researchers that the way AI tools are built needs to change. "The pandemic has put a spotlight on problems we have been dragging along for some time," the researcher adds.

What went wrong?

Many of the problems uncovered have to do with the poor quality of the data researchers used to develop their tools. Information about COVID-19 patients, including medical scans, was collected and shared in the middle of a global pandemic, often by the very doctors struggling to treat those patients. The researchers wanted to help, and these were the only public data sets available. But it meant that many tools were built using mislabeled data or data from unknown sources.

Driggs highlights the problem of what he calls Frankenstein data sets, which are stitched together from multiple sources and can contain duplicates. This means that some tools end up being tested on the same data they were trained on, which makes them look more accurate than they really are.
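The effect is easiest to see with a toy example. The sketch below is purely illustrative (the data, feature counts, and model are invented; it is not code from either study): duplicates that leak across a naive train/test split can make a model that has learned nothing look nearly perfect, while a patient-level split exposes it.

```python
# Minimal sketch (synthetic, hypothetical data) of duplicate leakage:
# the same scan appears several times, so a naive split puts copies on
# both sides and the model can "recognize" test scans it already saw.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split, GroupShuffleSplit

rng = np.random.default_rng(0)
n_patients = 300
patient = np.repeat(np.arange(n_patients), 3)    # each patient's scan appears 3 times
X = rng.normal(size=(n_patients, 32))[patient]   # duplicates share identical features
y = rng.integers(0, 2, size=n_patients)[patient] # labels carry no real signal

model = KNeighborsClassifier(n_neighbors=1)

# Naive split: copies of the same scan land in both train and test.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
leaky_acc = model.fit(X_tr, y_tr).score(X_te, y_te)       # close to 1.0

# Grouped split: all copies of one patient's scan stay on the same side.
tr, te = next(GroupShuffleSplit(test_size=0.3, random_state=0)
              .split(X, y, groups=patient))
honest_acc = model.fit(X[tr], y[tr]).score(X[te], y[te])  # close to 0.5 (chance)

print(f"with leakage:    {leaky_acc:.2f}")
print(f"without leakage: {honest_acc:.2f}")
```

The point of the sketch is simply that evaluation must split by patient (or by source), not by row, whenever a data set has been assembled from overlapping collections.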

The origin of some data sets is also murky. This can mean that researchers miss important features that skew the training of their models. Many unwittingly used a data set of chest scans of children who did not have COVID-19 as their examples of what non-COVID cases looked like. But as a result, the AI learned to identify children, not COVID-19.

Driggs's group trained its own model using a data set that mixed scans taken while patients were lying down with scans taken while they were standing. Because patients scanned lying down were more likely to be seriously ill, the AI wrongly learned to predict serious COVID-19 risk from the position the person was in.
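A quick audit can flag this kind of shortcut before any images are used: check whether a piece of acquisition metadata on its own already predicts the outcome. The snippet below is a hypothetical illustration (the column names and numbers are invented, not Driggs's data).

```python
# Hypothetical confounder check: does scan position alone predict severity?
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "position": ["supine", "supine", "erect", "erect", "supine", "erect"] * 50,
    "severe":   [1,        1,        0,       0,       1,        0      ] * 50,
})

table = pd.crosstab(df["position"], df["severe"])  # counts of severe cases by position
chi2, p_value, _, _ = chi2_contingency(table)

print(table)
print(f"chi-square p-value: {p_value:.3g}")
# A tiny p-value and a lopsided table warn that an image model could score
# well simply by detecting how the patient was positioned, not the disease.
```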

In other cases, some AIs turned out to be picking up on the text font that certain hospitals used to label their scans. As a result, the fonts of hospitals with more serious caseloads became predictors of COVID-19 risk.

Errors like these seem obvious in hindsight. They can also be fixed by adjusting the models, if researchers catch them. It is possible to acknowledge the shortcomings and publish a less accurate but less misleading model. But many tools were developed either by AI researchers who lacked the medical expertise to spot flaws in the data or by medical researchers who lacked the mathematical skills to correct for those flaws.

A more subtle problem Driggs highlights is incorporation bias, or bias introduced at the point a data set is labeled. For example, many medical images were labeled according to whether the radiologists who made them said they showed COVID-19. But that embeds any biases of that particular doctor into the ground truth of the data set. It would be much better to label a medical image with the result of a PCR test rather than a doctor's opinion, says Driggs. But there isn't always time for statistical niceties in hospitals under that much pressure.

That hasn't stopped some of these tools from being rushed into clinical practice. Wynants says it isn't clear which ones are being used, or how. Hospitals will sometimes say they use a tool only for research purposes, which makes it hard to assess how much doctors are relying on them. "There is a lot of secrecy," Wynants notes.

She asked one company that was marketing deep learning algorithms to share information about its approach, but got no response. She later found several published models from researchers tied to this company, all of them with a high risk of bias. "We don't actually know what the company implemented," she says.

According to her, some hospitals have even signed nondisclosure agreements with their medical AI vendors. When she asked doctors what algorithms or software they were using, they sometimes told her they weren't allowed to say.

How to fix it

What's the solution? Better data would help, but in the middle of a crisis that is a lot to ask. It is more important to make the most of the data sets we already have. The simplest move would be for AI teams to collaborate more with clinicians, says Driggs. Researchers also need to share their models and disclose how they were trained so that others can test them and build on them. "We could do those two things right now," he adds. "And they would solve 50% of the problems we have identified."

It would also be easier to obtain data if formats were standardized, notes Bilal Mateen, a doctor who leads the clinical technology team at the Wellcome Trust, a global health research charity based in London (United Kingdom).

Another problem Wynants, Driggs, and Mateen identify is that most researchers rushed to develop their own models rather than collaborating on or improving existing ones. The result was that the collective effort of researchers around the world produced hundreds of mediocre tools instead of a handful of properly trained and tested ones.

"The models are all very similar: almost all of them use the same techniques with minor tweaks and the same data, and they all make the same mistakes," Wynants explains. "If all these people building new models had instead tested the models that were already available, maybe by now we would have something that could actually help in the clinic."

In a sense, this is an old problem with research. Academic researchers have few career incentives to share work or validate existing results. There are no rewards for doing the work of carrying technology from the lab bench to the bedside, Mateen points out.

To address this problem, the World Health Organization is considering an emergency data-sharing agreement that would kick in during international health crises. It would let researchers move data across borders more easily, Mateen notes. Ahead of the G7 summit in the United Kingdom in June, the leading scientific bodies of the participating nations also called for "data availability" to prepare for future health emergencies.

These initiatives sound a bit vague, and calls for change always carry a whiff of wishful thinking. But Mateen takes what he calls a "naively optimistic" view. Before the pandemic, momentum behind such initiatives had stalled. "It felt like it was too tall a mountain to climb and the view wasn't worth it. COVID-19 has put a lot of this back on the agenda," he says.

"Until we accept the idea that we have to solve the unglamorous problems before the glamorous ones, we are doomed to repeat the same mistakes," he concludes. "It would be unacceptable if that didn't happen. Forgetting the lessons of this pandemic is disrespectful to those who died."