What Will it Take for AI to Live Up to its Hype?

The pharmaceutical industry is expected to spend more than $3 billion on AI by 2025, up from $463 million in 2019. AI clearly adds value, but its proponents say it is not yet fulfilling its potential.

There are many reasons why the reality doesn’t live up to the hype, but limited data sets are a big one.

Given the vast amount of data collected every day, from step counts to electronic medical records, a lack of data is one of the last hurdles one would expect.

Traditional big data/AI approaches use hundreds or even thousands of data points to identify something like a human face. For this training to be reliable, AI needs thousands of data sets to recognize faces regardless of gender, age, ethnicity, or health status.

For face recognition, such examples are readily available. Drug development is a completely different story.

“When you think about how you can modify a drug in different ways … the amount of dense data that covers all the possibilities is reduced,” Adityo Prakash, founder and CEO of Verseon, told BioSpace.

“Small changes can have a medicinal effect on our bodies, so you need detailed information about all kinds of changes.”

This may require millions of sample datasets, which Prakash says even the largest pharmaceutical companies don’t have.

Limited predictability

AI can be quite useful when the “rules of the game” are known, he continued, citing protein folding as an example. Because biology follows certain rules and protein folding is the same across many species, that consistency can be used to predict the likely structure of a functional protein.

Drug design, however, uses entirely new combinations and is less amenable to AI because “you don’t have enough data to cover all the possibilities,” Prakash said.

Even when data sets are used to make predictions about similar things, such as small-molecule interactions, the predictions are limited because negative data is lacking, he said. Negative data is important for AI predictions.

Additionally, “much of what is printed is not reproducible,” he said.

Small data sets, questionable data, and lack of negative data limit AI’s predictive capabilities.

Too much noise

Noise within available, large data sets presents another challenge. PubChem, one of the largest public databases, contains more than 300 million bioactivity data points from high-throughput screens, said Jason Rolfe, co-founder and CEO of Variational AI.

“However, this data is unbalanced and noisy,” he told BioSpace. “In general, more than 99 percent of the compounds tested are inactive.”

Many of the less than 1% of compounds that do appear active in a screen are false positives, Rolfe said. This may be due to aggregation, assay interference, reactivity, or contamination.
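To see why such an imbalance is a problem for training, consider a trivial model that labels every compound inactive. The sketch below is illustrative only: the library size of 100,000 compounds and the function name `majority_baseline` are assumptions, and the 1% active fraction is taken as an upper bound from the figure quoted above.

```python
def majority_baseline(n_compounds: int, active_fraction: float) -> dict:
    """Score a trivial model that labels every compound 'inactive'."""
    n_active = round(n_compounds * active_fraction)
    n_inactive = n_compounds - n_active
    # Every inactive compound is labeled correctly, every active one is missed.
    accuracy = n_inactive / n_compounds
    recall_active = 0.0  # the model never predicts 'active', so it finds none
    return {
        "n_active": n_active,
        "accuracy": accuracy,
        "recall_active": recall_active,
    }


# Hypothetical screen: 100,000 compounds, 1% of them active.
result = majority_baseline(n_compounds=100_000, active_fraction=0.01)
print(result)
```

Under these assumptions the model is right 99% of the time while identifying none of the active compounds, which is why accuracy alone says little about a model trained on screening data.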

X-ray crystallography can be used to train artificial intelligence for drug discovery and to determine the precise spatial positions of ligands and their protein targets. However, despite significant progress in crystal structure prediction, drug-induced protein deformation is not well predicted.

Similarly, molecular docking (which simulates the binding of a drug to a target protein) is inaccurate, Rolfe said.

“Only about 30% predict the correct spatial organization of a drug and its protein target, and predictions of pharmacological activity are unreliable.”

With an astronomically large number of possible drug-like molecules, even AI algorithms that can accurately predict binding between ligands and proteins face a daunting challenge.

A drug has to act against its primary target without disrupting the tens of thousands of other proteins in the human body, so that it does not cause side effects or toxicity, Rolfe said. Currently, AI algorithms are not up to the task.

Physics-based models of drug-protein interactions have been recommended as a way to improve accuracy, but they are computationally intensive: at approximately 100 hours of CPU time per drug, their usefulness for screening large numbers of molecules may be limited.
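A rough calculation shows how quickly that cost adds up. The sketch below uses the figure of approximately 100 CPU hours per molecule; the library size, the core count, and the assumption of perfect parallel scaling are hypothetical and chosen only for illustration.

```python
HOURS_PER_MOLECULE = 100  # approximate figure quoted in the article


def screening_cost(n_molecules: int, n_cores: int) -> tuple[int, float]:
    """Return total CPU hours and wall-clock days for a physics-based screen.

    Assumes the work splits perfectly across cores, which is optimistic.
    """
    total_cpu_hours = n_molecules * HOURS_PER_MOLECULE
    wall_clock_days = total_cpu_hours / n_cores / 24
    return total_cpu_hours, wall_clock_days


# Hypothetical screen: one million molecules on 10,000 cores.
cpu_hours, days = screening_cost(n_molecules=1_000_000, n_cores=10_000)
print(f"{cpu_hours:,} CPU hours, about {days:.0f} days on 10,000 cores")
```

Under these assumptions, a million-molecule screen needs 100 million CPU hours, or more than a year of wall-clock time even on 10,000 cores.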

Prakash notes that computer-based physics simulations are a step toward overcoming the current limitations of artificial intelligence.

“They can give you artificially generated data about how two things interact. But physics-based simulations don’t give you insight into degradation inside the body.”

Disconnected data

Another challenge is related to data systems and disconnected data sets.

“Many facilities are still using paper-based records, so the data they need is not available electronically,” Moira Lynch, senior innovation leader in Thermo Fisher Scientific’s bioprocessing group, told BioSpace.

Compounding the challenge, “data available electronically come from different sources, in different formats, and are stored in different locations,” she said.

According to Jaya Subramaniam, head of life sciences product and strategy at Definitive Healthcare, these datasets are limited in scope and coverage.

The two main reasons are fragmented data and de-identified data. “No single organization has a single complete set of data, be it claims, EMR/EHR or lab diagnostics,” she said.

Furthermore, patient privacy laws require de-identified data, making it difficult to track a person’s journey from diagnosis to final outcome. Pharmaceutical companies are then hampered by slower insights.

Despite the unprecedented amount of data available, relevant and usable data is still quite limited. Only by overcoming these obstacles can the power of artificial intelligence be truly unleashed.

