Although artificial intelligence offers many advantages in drug discovery, challenges remain. One is getting machine learning to follow a process that lets it extract critical information from a large pool of data. In other words, pharmaceutical researchers must find ways to train software to pull out relevant biological data. To address this challenge, scientists at Purdue University have developed a novel framework, known as ‘Lemon’, to support machine learning models in mining data.
Specifically, Lemon can help drug discovery researchers better collect relevant data from the Protein Data Bank (PDB). The PDB is a comprehensive resource with more than 140,000 biomolecular structures.
"PDB is an essential tool for the drug discovery community," said Gaurav Chopra, an assistant professor of analytical and physical chemistry in Purdue's College of Science who works with other researchers in the Purdue Institute for Drug Discovery and led the team that created Lemon. "The problem is that it can take an enormous amount of time to sort through all the accumulated data. Machine learning can help, but you still need a strong framework from which the computer can quickly analyze data to help in the creation of safe and effective drugs."
The Lemon software is built on a fast C++11 library with Python bindings and can mine the data stored in the PDB in minutes. Lemon is available on GitHub at https://github.com/chopralab/lemon and detailed documentation is available at https://chopralab.github.io/lemon/latest/index.html.
The findings were published in the journal Bioinformatics.
"Experimental structures deposited in PDB have resulted in several advances for structural and computational biology scientific and education communities that help advance drug development and other areas," said Jonathan Fine, a PhD student in chemistry who worked with Chopra to develop the platform. "We created Lemon as a one-stop-shop to quickly mine the entire data bank and pull out the useful biological information that is key for developing drugs."
Source: Science Daily