Accelerating Drug Discovery with AWS and Intel
Curated Data Set of Protein Structures from the PDB with Predicted Hydrogen Positions Available for Download
Thanks to the combined computing power of AWS and Intel, we have completed an Intel RISE grant project to provide a curated data set of protein structures from the Protein Data Bank (PDB) with predicted hydrogen positions.
Historically, in collaboration with the pharmaceutical industry, we have developed reliable methods for interpreting the likelihood of given interactions in the binding sites of proteins, using proprietary information that is not publicly available.
We were keen to repeat these studies with structures from the PDB. However, hydrogen positions are not available for the water networks in these proteins. Reliable predictions need databases of augmented protein structures in which these hydrogen positions have been assigned.
Such information can be generated computationally, but the methods are expensive: they have to consider multiple possible models, which requires significant computing power.
To overcome these constraints, the combined computing power of Intel and AWS was used to generate a snapshot of the cavities in PDB proteins that may bind small molecules, with reliable hydrogen positions for all components.
This data set of protein structures with predicted hydrogen positions is now available for all to use in drug discovery research and development. Removing the requirement to repeat the computation saves time, reduces environmental impact, and makes the data available to everyone, regardless of access to large computational resources.
Download the protonated PDB files below.
Download the Intel RISE grant data set
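As a quick sanity check after downloading, it can be useful to confirm that a file really does carry hydrogen atoms, including those on water molecules. The sketch below is only an illustration, not part of the data set's tooling. It assumes the files follow the standard fixed-column PDB format with the element symbol in columns 77-78. The function names and the sample water record are hypothetical, made up for this example.

```python
from collections import Counter


def count_elements(pdb_lines):
    """Count atoms per element across ATOM/HETATM records.

    Assumes standard PDB fixed columns, where the element symbol
    occupies columns 77-78 (string indices 76 and 77).
    """
    counts = Counter()
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            element = line[76:78].strip().upper()
            if element:
                counts[element] += 1
    return counts


def make_atom_line(record, serial, name, res_name, chain, res_seq, x, y, z, element):
    """Build one PDB-style atom record (hypothetical helper for the demo only)."""
    return (
        f"{record:<6}{serial:>5} {name:<4} {res_name:>3} {chain:1}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2}"
    )


# Made-up coordinates for a single water molecule with both hydrogens assigned.
sample = [
    make_atom_line("HETATM", 1, "O", "HOH", "A", 301, 10.000, 10.000, 10.000, "O"),
    make_atom_line("HETATM", 2, "H1", "HOH", "A", 301, 10.957, 10.000, 10.000, "H"),
    make_atom_line("HETATM", 3, "H2", "HOH", "A", 301, 9.760, 10.927, 10.000, "H"),
]

counts = count_elements(sample)
print(dict(counts))
```

To apply the same check to a downloaded file, pass an open file handle in place of `sample`, for example `count_elements(open(path))`, where `path` is wherever the file was saved. A count of zero for `H` would suggest that the file is not protonated or that its element columns are not populated.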