I have a working docking setup with a substructure constraint between a protein residue atom and the substrate to be docked. However, I need to add a distance constraint between the cofactor (see below) and the same substrate. Attempts to do this result in the runtime error 'Cannot find appropriate protein.' The code is based on the CCDC docking sample and works properly without the distance constraint.

Information:

 * the 'fixed' cofactor is part of the protein PDB file (Z:COF1)

 * the substrate is the ligand to be docked and is in a separate mol2 file

 * I get the cofactor atom using 

    cofactor_atom = [a for a in self.protein.ligands[0].atoms if a.label=="H24"][0]

  * I get the substrate atom using (after reading the mol2 file into self.substrate; the same approach is used for the substructure constraint, which works)

    substrate_atom = [a for a in self.substrate.atoms if a.label=="O1"][0]

  * I then try calling the add constraint function, which fails (both atoms were found correctly beforehand)

    self.settings.add_constraint(self.settings.DistanceConstraint(cofactor_atom, substrate_atom))
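
Because both lookups above take element `[0]` of a list comprehension, a missing or duplicated label either raises a bare IndexError or silently picks the first match. A minimal sketch of a stricter lookup follows. `find_unique_atom` and `FakeAtom` are hypothetical names used only for illustration, and the sketch assumes nothing about the atoms beyond a `label` attribute, as in the comprehensions above.

```python
from collections import namedtuple


def find_unique_atom(atoms, label):
    """Return the single atom whose label matches, or raise ValueError."""
    matches = [a for a in atoms if a.label == label]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one atom labelled %r, found %d"
            % (label, len(matches))
        )
    return matches[0]


# Stand-in atom type for demonstration only (hypothetical); real code would
# pass self.protein.ligands[0].atoms or self.substrate.atoms instead.
FakeAtom = namedtuple("FakeAtom", ["label"])
demo_atoms = [FakeAtom("O1"), FakeAtom("H24"), FakeAtom("C3")]

cofactor_atom = find_unique_atom(demo_atoms, "H24")
```

With this helper, a label that matches zero atoms or several atoms is reported before the constraint is built, rather than surfacing later during docking.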

 

It looks like the issue is with the cofactor atom being in the protein ligands (if I use a protein residue atom then I can add the constraint). Is there a proper way to define a distance constraint between a cofactor, that is part of the protein pdb file, and the ligand to be docked?

 

Thanks,

Hector

 

Dear Hector,

The way you are setting up a distance constraint looks correct to me. The only thing I'd suggest is to make sure you are calling the right atom label. Please find below a code snippet where I accessed the same atom but in two different ways (from the protein or from the binding site) and the atom label was different, too. This is due to the fact that new objects are created:

    from ccdc.docking import Docker
    from ccdc.io import MoleculeReader
    from ccdc.protein import Protein

    docker = Docker()
    settings = docker.settings

    # Register the protein file (it also contains the cofactor).
    protein_file = 'protein.mol2'
    settings.add_protein_file(protein_file)

    # Register the ligand to be docked, with 10 docking runs.
    ligand_file = 'ligand.mol2'
    settings.add_ligand_file(ligand_file, 10)

    protein = settings.proteins[0]

    settings.fitness_function = 'plp'
    settings.autoscale = 10.
    settings.output_directory = 'output'
    settings.output_file = 'docked_ligands.mol2'

    settings.binding_site = settings.BindingSiteFromPoint(protein, (15.672, -6.014, 18.132), 10.0)

    # The cofactor is reachable from both objects; compare the two identifiers.
    print('%s %s' % (settings.binding_site.ligands[0].identifier, protein.ligands[0].identifier))

    # Option 1: take the cofactor atom from the binding site (label differs here).
    #cofactor_atom = [a for a in settings.binding_site.ligands[0].atoms if a.label=="H4N"][0]
    # Option 2: take the same atom from the protein.
    cofactor_atom = [a for a in protein.ligands[0].atoms if a.label=="H23"][0]

    # Distance constraint between the cofactor atom and a ligand atom.
    ligand_atom = [a for a in MoleculeReader(ligand_file)[0].atoms if a.label=="C32"][0]
    settings.add_constraint(settings.DistanceConstraint(cofactor_atom, ligand_atom))

    results = docker.dock()

 

I hope it helps.

Best regards,

Ilenia

Ilenia,

I took your code above and replaced the protein and ligand file names (also changed the atom labels to search for) and I get the same error message - RuntimeError('Cannot find appropriate protein.'). I tried using a mol2 file for the protein (instead of the PDB just in case) but it did not make any difference. As I previously mentioned, if I use a protein atom instead then I can set the distance constraint just fine and I am also using a substructure constraint that works properly. Let me know if you have any other suggestions.

FYI, I'm using the 2018 CSD API downloaded about a month ago.

Thanks,

Hector

 

Ilenia,

The cofactor we include in the protein's PDB file is defined as just a group of HETATM:

    HETATM15065  O1  COF Z   1      11.702  15.566  16.752  1.00  0.00           O 

    …

    TER        

Is there something else we may need to specify or any special formatting requirements for Gold?
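
One quick sanity check on a record like the one above is to slice it at the fixed column positions defined by the PDB format and confirm that each field lands where expected. The sketch below does only that. `parse_hetatm` is a hypothetical helper, and it says nothing about what GOLD itself requires of the file.

```python
def parse_hetatm(line):
    """Slice one HETATM record using the fixed PDB column positions."""
    if not line.startswith("HETATM"):
        raise ValueError("not a HETATM record")
    return {
        "serial": int(line[6:11]),       # columns 7-11
        "name": line[12:16].strip(),     # columns 13-16
        "res_name": line[17:20].strip(), # columns 18-20
        "chain": line[21],               # column 22
        "res_seq": int(line[22:26]),     # columns 23-26
        "xyz": (
            float(line[30:38]),          # columns 31-38
            float(line[38:46]),          # columns 39-46
            float(line[46:54]),          # columns 47-54
        ),
    }


# The record quoted above, without its leading indentation.
record = "HETATM15065  O1  COF Z   1      11.702  15.566  16.752  1.00  0.00           O"
fields = parse_hetatm(record)
```

If a field fails to parse, or the residue name or chain comes out shifted, the record is misaligned relative to the standard columns.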

Hector

 

Ilenia,

After running your code with my data, it seemed likely that this was a file issue, and it was.

It turned out to be the substrate mol2 file (the ligand to be docked). None of the other utilities had any problems with it, but it seems to cause problems for Gold. It was using an asterisk character for the internal name, which I've seen happen before when working on files across multiple apps. I changed it manually just to try it and I can set the constraint now. The person who generated the file will be fixing it properly later.
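
A check for this kind of placeholder name can be sketched as follows. It assumes the "internal name" is the molecule name line that follows each `@<TRIPOS>MOLECULE` record in the mol2 file, which the post does not state explicitly. Both function names and the sample text are hypothetical.

```python
def mol2_molecule_names(text):
    """Return the name line that follows each @<TRIPOS>MOLECULE record."""
    names = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "@<TRIPOS>MOLECULE" and i + 1 < len(lines):
            names.append(lines[i + 1].strip())
    return names


def suspicious_names(names):
    """Flag names that are empty or consist only of asterisks."""
    return [n for n in names if not n.strip("*")]


# Made-up two-molecule header fragment, for illustration only.
sample = "\n".join([
    "@<TRIPOS>MOLECULE",
    "*****",
    " 2 1 0 0 0",
    "SMALL",
    "NO_CHARGES",
    "",
    "@<TRIPOS>MOLECULE",
    "Substrate_R",
    " 2 1 0 0 0",
])

names = mol2_molecule_names(sample)
flagged = suspicious_names(names)
```

Running this over ligand files before docking would flag entries whose name is blank or only asterisks, so they can be renamed first.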

Thanks for all your help,

Hector

 

Dear Hector,

Thank you for letting us know. I hope you now have a working setup to perform your docking calculation with GOLD.

Best regards,

Ilenia

Ilenia,

I know it's been a few days but I wanted to see if you could look at the 'output/gold.err' log that was generated when you ran the code you previously provided above. Another option would be to provide me with the test files (protein.mol2 and ligand.mol2) you used and I will run the code and check the log.

The reason for wanting to do so is that I just found all the docking logs I've generated contain a warning message in gold.err like:

********************************************************************************
Ligand in file /home/hdelrisco/projects/kreds/Substrate_R_v2.mol2, named Substrate_R,
starting at address 70 raised the following warnings and/or errors
Warning message:
Distance constraint disabled for ligand Substrate_R; no atom 72 in Ligand

---- happens even with your sample SAH.mol2

What this means is that using a different ligand (from the CCDC files) or changing the format of our ligand did not solve the issue; it just turned the error into a warning that is logged somewhere else. It would be simple to find out whether this is really a problem with my files: if running your code with your own files produces no warning, then my files are at fault. It's very important to remember that we only have issues with docking when we try to set the distance constraint between the cofactor (in the protein file) and the ligand. If we use a protein atom instead of the cofactor, it all works perfectly and the API does not have any issues with the ligand file format.
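
Since the docking run itself completes, this warning is easy to miss. A minimal sketch for scanning the text of a gold.err log for it is shown below. The pattern is taken from the single message quoted above and may not cover other wordings, and `disabled_distance_constraints` is a hypothetical helper name.

```python
import re

# Pattern modelled on the one warning quoted above (an assumption that other
# occurrences use the same wording).
WARNING_RE = re.compile(
    r"Distance constraint disabled for ligand (?P<ligand>\S+); "
    r"no atom (?P<atom>\d+) in Ligand"
)


def disabled_distance_constraints(log_text):
    """Return (ligand name, atom index) for each disabled-constraint warning."""
    return [
        (m.group("ligand"), int(m.group("atom")))
        for m in WARNING_RE.finditer(log_text)
    ]


log_text = "\n".join([
    "Warning message:",
    "Distance constraint disabled for ligand Substrate_R; no atom 72 in Ligand",
])
hits = disabled_distance_constraints(log_text)
```

A non-empty result means at least one distance constraint was dropped, so the poses were generated without it.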

Thanks,

Hector

 

 

Dear Hector,

Thank you for your post.

I did look at my output files, and indeed, those generated with the CSD Python API look similar to yours. The protein module currently treats cofactors as ligand molecules, so when the constraint is written out to the api_gold.conf file it refers to the wrong atom indices.

I've informed the development team and they will be working on a fix for this issue.

In the meantime, could you use this rather suboptimal solution?

  • Access the right atom index with the command:

>>> cofactor_atom = protein.atom('H4N')

As before, there must be a unique instance of this label for it to work.

  • Edit the api_gold.conf file to replace the word "ligand" with "protein"

e.g. in my case change: 

constraint distance ligand 5081 ligand 33 3.5000 1.5000 5.0000 off

to 

constraint distance protein 5081 ligand 33 3.5000 1.5000 5.0000 off
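
The manual edit can be scripted. The sketch below rewrites the first `ligand` keyword of each distance constraint line to `protein`. It assumes that every `ligand ... ligand` distance constraint in the file is one of the mis-written cofactor constraints, so it should not be used on a file that also holds genuine ligand-to-ligand constraints. `patch_distance_constraints` is a hypothetical helper name.

```python
import re

# Matches "constraint distance ligand <index> ligand ..." at the start of a
# line and captures the text on either side of the first "ligand" keyword.
_FIRST_ATOM_RE = re.compile(
    r"^([ \t]*constraint[ \t]+distance[ \t]+)ligand([ \t]+\d+[ \t]+ligand[ \t])",
    re.MULTILINE,
)


def patch_distance_constraints(conf_text):
    """Replace the first 'ligand' keyword with 'protein' on matching lines."""
    return _FIRST_ATOM_RE.sub(r"\1protein\2", conf_text)


before = "constraint distance ligand 5081 ligand 33 3.5000 1.5000 5.0000 off\n"
after = patch_distance_constraints(before)
```

The function works on the text of api_gold.conf; reading the file and writing the result back are left to the caller. Lines that already start with `constraint distance protein` do not match and are left unchanged.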

I hope this helps.

Best regards,

Ilenia

 

Ilenia,

Glad you could confirm it. Your suggestion will make it hard to complete the project I'm working on, since I'll be going through thousands of protein files and some do not have unique atom names. I went ahead and tried your suggestion on a small set of protein files where it should have worked, but unfortunately I was not able to get it to work. I was able to get the atom properly using the protein.atom function. I also generated the config file and then modified it as you indicated (and confirmed the change was done properly). Unfortunately, when I ran the docking using the existing settings file, I got an 'index out of range' error when docking.py tried to access settings.ligands[0].atoms[...] for the constraint. It is possible that something is not right with my 'rig up' and re-docking setup (I assume that it actually worked for you).

At this point, I've wasted way too much time on this issue. Can you provide me with a time frame for the fix to be implemented so I can figure out what my options are?

Thanks,

Hector

 

Dear Hector,

First of all, apologies for the inconvenience this is causing you. We appreciate the time you are spending on it and we really value your feedback.


One of the developers on our side fixed the bug you reported and the fix will be available with the upcoming November release. I hope that is not too long a wait for it to remain an option for you.

The fix still requires the atom to be accessed directly from the protein and to be unique. You can perhaps identify it by using other information than the label (e.g. coordinates).
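
Selecting by coordinates could look like the sketch below: it picks the atom closest to a known position and rejects the match if it is further away than a tolerance. It assumes each atom exposes a `coordinates` attribute holding an (x, y, z) triple. `nearest_atom`, `FakeAtom` and the demo data are hypothetical and only stand in for atoms read from the protein.

```python
import math
from collections import namedtuple


def nearest_atom(atoms, target, tolerance=0.01):
    """Return the atom closest to target, if it lies within tolerance."""
    best, best_dist = None, None
    for atom in atoms:
        coords = atom.coordinates
        if coords is None:
            continue  # skip atoms without 3D coordinates
        dist = math.sqrt(sum((c - t) ** 2 for c, t in zip(coords, target)))
        if best_dist is None or dist < best_dist:
            best, best_dist = atom, dist
    if best is None or best_dist > tolerance:
        raise ValueError("no atom within %.3f of %r" % (tolerance, target))
    return best


# Stand-in atoms for demonstration only; two share a label on purpose.
FakeAtom = namedtuple("FakeAtom", ["label", "coordinates"])
demo_atoms = [
    FakeAtom("O1", (11.702, 15.566, 16.752)),
    FakeAtom("O1", (40.000, 2.500, -3.100)),
    FakeAtom("C2", (12.900, 15.100, 17.300)),
]

cofactor_atom = nearest_atom(demo_atoms, (11.702, 15.566, 16.752))
```

The tolerance keeps a wrong atom from being picked silently when the reference position does not belong to the structure being processed.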

Let me know if there is anything else I can help with.

Thank you very much again for all your feedback.

Best regards,
Ilenia

Ilenia,

I'm just going to run the docking without that constraint for now until the new release comes out and then I'll have to re-run it and redo the analysis. I have until the middle of December to decide if we are going to use the docking API or not so hopefully that should be enough time.

What would be the best way to find out when the November release is made available for download?

Thanks,

Hector

 

Dear Hector,

I will send you a download link when the release is out.

Best regards,

Ilenia

 

That will be helpful, thanks.

Dear Hector,

I hope all is well with you. As promised, this is just to let you know that we've released version 2.0 of the CSD Python API which is now available for you to download here.

Best regards,

Ilenia

Ilenia,

Thanks for the update. I went to download it and try it out but I noticed it requires a 2019 license which we have not gotten yet so it will have to wait (we have the 2018 license and are in the final stage of evaluating multiple tools to decide which ones to keep/get next year). We will be making a decision soon.

Thanks for all your help,

Hector
