Hi all,

I used to use GoldMine to manage my docking runs, but I have found that the software is simply too outdated and, most importantly, cannot handle large datasets of docked ligands very well (i.e. it crashes constantly).

Can I use the Python API to do everything I wish to do with GoldMine?

Specifically, I want to retrieve the top-scoring 30% of poses from a virtual screening run and write them out as a mol2 file for the next docking run. I would also like to filter out all of the poses that don't satisfy an H-bond constraint.


Dear Jordan,

Yes, you will be able to use the API for this.

The documentation for the docking module of the CSD API, https://downloads.ccdc.cam.ac.uk/documentation/API/descriptive_docs/docking.html, shows how to read ligands in their docked poses, how to get access to the scoring terms of the fitness function you have used, and how to inspect the hydrogen bonds of the docked poses.

There are example programs using docking at https://downloads.ccdc.cam.ac.uk/documentation/API/cookbook_examples/docking_examples.html, which may be helpful.

Best wishes



Thanks Richard,

Unfortunately, I am not well versed in Python. I have managed to read the conf file into the API and do basic things like print the number of molecules and whatnot (based directly on the first link you sent through). What I still need is a way to write out the molecules that satisfy an H-bond constraint, '<Gold.PLP.Constraint>', and to write out molecules based on their score.

Any help is appreciated,




No worries now. For anyone interested, it was as simple as:


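>>> from ccdc.io import MoleculeWriter
>>> molecules = results.ligands
>>> with MoleculeWriter('test.mol2') as mol_writer:
...     for mol in molecules:
...         # keep only poses with a zero constraint term
...         if mol.scoring_term('constraint') == -0.0:
...             mol_writer.write(mol)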


You can change the argument passed to 'scoring_term' as needed.
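
For the other half of the original question, keeping the top 30% of poses by score, a minimal sketch along the following lines may help. The helper `top_fraction` is a hypothetical name and not part of the CSD Python API; it is plain Python and works on any sequence. It assumes that a higher score is better, and the commented-out usage line assumes each docked ligand exposes its score through `fitness()`, which should be checked against the docking documentation linked above.

```python
import math


def top_fraction(items, score, fraction=0.3):
    """Return the best-scoring `fraction` of `items`, highest score first.

    `score` maps an item to a number; higher is assumed to be better.
    The number of items kept is rounded up.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(items, key=score, reverse=True)
    keep = int(math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Stand-in data: (name, score) pairs instead of real docked ligands.
poses = [("a", 61.2), ("b", 48.9), ("c", 75.4), ("d", 52.0), ("e", 66.7),
         ("f", 40.1), ("g", 58.3), ("h", 70.0), ("i", 45.5), ("j", 63.8)]
best = top_fraction(poses, score=lambda p: p[1])

# With the CSD Python API this might look like (untested assumption):
#   best = top_fraction(results.ligands, score=lambda lig: lig.fitness())
```

The selected poses could then be written out with the same `MoleculeWriter` loop as above. Note that sorting needs every pose to be readable, so any corrupt entries would have to be skipped first.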

Hi again,


I have another issue that I'd like to add to this one.

My above code works fine for molecules that can be read. However, there are a few problematic molecules in 'results.ligands' that are giving errors, which stops the whole process before it can write out all of the necessary molecules.

The error I get is:


Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/home/software/apps/ccdc/Python_API_2019/miniconda/lib/python2.7/site-packages/ccdc/io.py", line 756, in entries
    yield self._make_entry(self._enumerator.entry(i))
  File "/home/software/apps/ccdc/Python_API_2019/miniconda/lib/python2.7/site-packages/ccdc/_lib/FileFormatsLib.py", line 2052, in entry
    return _FileFormatsLib.HDatabaseEnumerator_entry(self, i)
RuntimeError: FlatFileDatabase::entry: couldn't retrieve entry 396133
The error message was:

Mol2Entry::read: SUBSTRUCTURE field before MOLECULE.


Is there a way to skip those problematic ligands that cannot be read, and continue on with writing the molecules to file?




Hi Jordan,

I think the following will work:

for i in range(len(results.ligands)):
    try:
        ligand = results.ligands[i]
    except RuntimeError:
        # this entry could not be read, so skip it
        continue
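
The same idea can be wrapped in a small generator so that the filtering loop stays readable. This is only a sketch: `iter_readable` is a hypothetical helper, not part of the CSD Python API, and the `FlakyList` class merely stands in for `results.ligands` so that the example runs without a CCDC installation. It assumes, as in the traceback above, that an unreadable entry raises `RuntimeError` when it is accessed by index.

```python
def iter_readable(entries, skipped=None):
    """Yield each entry that can be read by index.

    Entries whose access raises RuntimeError are skipped. If `skipped`
    is a list, the index of every unreadable entry is appended to it.
    """
    for i in range(len(entries)):
        try:
            entry = entries[i]
        except RuntimeError:
            if skipped is not None:
                skipped.append(i)
            continue
        yield entry


class FlakyList(object):
    """Stand-in for results.ligands: raises RuntimeError at chosen indices."""

    def __init__(self, values, bad_indices):
        self._values = list(values)
        self._bad = set(bad_indices)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        if i in self._bad:
            raise RuntimeError("couldn't retrieve entry %d" % i)
        return self._values[i]


skipped = []
ligands = FlakyList(["lig0", "lig1", "lig2", "lig3"], bad_indices=[1, 3])
readable = list(iter_readable(ligands, skipped))
```

With real results, `for ligand in iter_readable(results.ligands, skipped):` would take the place of the plain loop, and the indices collected in `skipped` show which entries to inspect in the merged file.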

I'm curious as to why the docking results are corrupted, though.  Did you concatenate many files to make the docking result, or did GOLD create corrupted files?

Best wishes

Thanks Richard, I will slot this into the script and report back.

As for the workflow that led to the corrupt docking solutions, I started with a single gold.conf file that was then split up into 60 jobs to run on our HPC using a modified version of this script (modified to be used on a Slurm-based cluster): https://www.ccdc.cam.ac.uk/support-and-resources/support/case/?caseid=b40a45cb-d4a3-4da3-ae72-b7653398895c

Then I used gold_merge.py to concatenate the outputs of the 60 jobs into one. I then tried to filter the solutions with respect to the H-bond constraint, which led to the error that I posted.

Hope that helps.

