The Python API 2D structure diagram documentation describes how to generate images of 2D diagrams, and how to run SMARTS queries against the elements in those diagrams in order to highlight them.

 

Question: Is there any programmatic way to extract the different components that are used in these 2D diagrams?  Presumably there is an underlying graph representation, but I can't find any documentation for this in the ccdc.diagram documentation.

 

As a motivating example: the ABEDAM 2D diagram correctly shows the organic component, but the 0-th component of the molecule entry is disordered, and hence has a different total number of carbon atoms.  It would be useful to be able to extract a graph representation (e.g., a SMILES string) for the organic component as displayed in the 2D diagram, even though that component is not clearly resolved in the atomic coordinate data.

Hi Joshua,

 

The reason that we can't present a SMILES representation directly through the API for ABEDAM is that the structure has some unspecified bond types.  It is possible to extract the individual components, and to access the bond information within them, which corresponds to the underlying graph representation.

If bond types are assigned, SMILES will then become available.  To demonstrate:

import ccdc
import ccdc.io

# open the CSD and load the ABEDAM entry as a molecule
r = ccdc.io.MoleculeReader('CSD')
abedam = r.molecule('ABEDAM')

# in the CSD, the bond types are not fully specified for ABEDAM, so SMILES are unavailable
for component in abedam.components:
    print(component.bonds)
    print(component.smiles)

# we can assign plausible bond types based on standard CSD conventions. SMILES can now be generated.
abedam.assign_bond_types()

for component in abedam.components:
    print(component.bonds)
    print(component.smiles)
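
As a rough sketch of what "accessing the bond information" could look like in practice, the helper below turns a component's bonds into a plain adjacency mapping. The function name bond_graph is hypothetical, and it assumes that each bond exposes .atoms (a pair of atoms) and .bond_type, and that each atom exposes .label, as described in the CSD Python API documentation. It is not part of the ccdc package.

```python
from collections import defaultdict


def bond_graph(component):
    """Return {atom label: [(neighbour label, bond type as text), ...]}.

    Hypothetical helper. Assumes bond.atoms is a pair of atoms,
    bond.bond_type can be converted with str(), and atom.label is a string.
    Atoms that take part in no bonds do not appear in the result.
    """
    graph = defaultdict(list)
    for bond in component.bonds:
        first, second = bond.atoms
        kind = str(bond.bond_type)
        # Record the bond in both directions so the graph is undirected.
        graph[first.label].append((second.label, kind))
        graph[second.label].append((first.label, kind))
    return dict(graph)
```

Calling bond_graph(component) for each entry in abedam.components, before and after assign_bond_types(), should make it easy to see which bond types were originally unspecified.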

 

I hope this answers your question,

Stewart.

Stewart:

Thanks for your help.  Your answer makes sense, but perhaps there is a problem with the ABEDAM entry?

When I run your code, component zero returns the SMILES string [N][C][C][N] (which is in agreement with the 3D structure shown on the ABEDAM page).

However, this is missing a carbon atom compared to the systematic name and the chemical diagram shown on the ABEDAM page (which has a 1,3-diaminopropane molecule, NCCCN).

Is there a programmatic way to access the components in the chemical diagram directly, without using the 3D structure?
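
One way to spot this kind of mismatch without reading SMILES strings by eye is to count the non-hydrogen atoms in each component and compare against the expected formula. The sketch below uses a hypothetical helper, heavy_atom_counts, and assumes that each component exposes .atoms and that each atom exposes .atomic_symbol, as described in the CSD Python API documentation.

```python
from collections import Counter


def heavy_atom_counts(component):
    """Return {element symbol: count} for the non-hydrogen atoms.

    Hypothetical helper. Assumes component.atoms is iterable and that
    each atom has an atomic_symbol string such as 'C' or 'N'.
    """
    counts = Counter(atom.atomic_symbol for atom in component.atoms)
    # Hydrogens are dropped because they are often missing or disordered.
    counts.pop('H', None)
    return dict(counts)
```

For the case above, [N][C][C][N] corresponds to two carbons and two nitrogens, whereas 1,3-diaminopropane (NCCCN) has three carbons, so a carbon count of 2 for component zero would flag the discrepancy.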

 

 

 

Joshua,

 

The simple answer is that there is no way to access the chemistry directly from the 2D diagram in our public API.  This is, however, an existing feature request that we will consider implementing in a future release.

 

In the case of this specific entry, we have looked at the 3D structure and agree that this representation is misleading. This entry will be updated in the CSD. On the ABEDAM page, if you show disorder and packing, you can see where the confusion originates.

 

Regards,

Stewart.
