Hi,

Using the CSD Python API, I am trying to extract the coordinates of all the molecules in the unit cell in mol2 format. The problem is with this command:

            (entry.crystal.molecule.to_string('mol2'))

It gives me the coordinates of only one of the molecules. I would appreciate it if you could help me out.

 

Many thanks,

T.

Hi Tahereh,

As long as the mol2 file contains the crystallographic information, i.e. a CRYSIN line, we can do this very easily:

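from ccdc import io

# The mol2 file must carry the cell information (a CRYSIN line)
reader = io.CrystalReader('/path/to/crystals.mol2')

for c in reader:
    # Pack the unit cell: a single molecule object holding every copy found in it
    unit_cell_molecule = c.packing()
    coordinates = [a.coordinates for a in unit_cell_molecule.atoms]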

The packing method of the crystal returns a single molecule object, which may consist of several disconnected structures: all those found in the specified box, which by default is the unit cell. The method takes an optional inclusion parameter that controls which atoms are included. The default, 'CentroidIncluded', includes a whole molecule if its centroid lies in the box. The other options are 'AllAtomsIncluded', 'AnyAtomIncluded' and 'OnlyAtomsIncluded'.
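
To get the mol2 text you originally asked for, the same packing call should also work on a crystal taken straight from a CSD entry. The following is only a minimal sketch under some assumptions: packed_mol2 and main are hypothetical helper names of my own, io.EntryReader('CSD') assumes a local CSD installation, and 'CentroidIncluded' is written out on the assumption that it is the default centroid-based behaviour described above.

```python
def packed_mol2(crystal, inclusion='CentroidIncluded'):
    """Return the packed unit cell of `crystal` as a mol2-format string."""
    # packing() returns one molecule object holding every copy found in the box
    packed = crystal.packing(inclusion=inclusion)
    return packed.to_string('mol2')


def main(identifier='PENCEN03'):
    # Imported here so packed_mol2 can be reused without a CSD installation
    from ccdc import io

    reader = io.EntryReader('CSD')
    crystal = reader.entry(identifier).crystal

    # Default behaviour: whole molecules whose centroid lies in the unit cell
    print(packed_mol2(crystal))

    # One of the alternative inclusion options listed above
    print(packed_mol2(crystal, inclusion='AnyAtomIncluded'))
```

Calling main() prints the mol2 text for both inclusion settings. Which setting is appropriate depends on whether you want whole molecules or only the atoms that fall inside the cell.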

Hope this is helpful.

Best wishes

Richard

Many thanks for this. I really appreciate your kind help.

I have another question, but I don't know whether I should open a new topic or ask it here. I will ask here for now; if it belongs elsewhere, please let me know.

The concepts of "the number of molecules in the unit cell" and "component of a molecule" are a bit confusing to me. For example, in pentacene (entry identifier: PENCEN03) there are two molecules in the unit cell and also two components for each molecule, whereas other structures might have four molecules in the unit cell and only one component. I don't know how to distinguish these two concepts. What does it mean for a molecule to have several components?

Best wishes,

T.

Hi Tahereh,

In the API, a component is a connected set of atoms together with the bonds connecting them. The components of a molecule can be copies of the same fundamental molecular structure, as in PENCEN03, where there are two copies of pentacene, or they can be different structures, as in ABEBUF. Different structures may be solvent molecules, coformers in a cocrystal, or the ions of a salt.

When we pack the unit cell, we get copies of all these components in a single molecule. So for PENCEN03 you will get 10 copies of the pentacene molecule. When we pack ABEBUF, we get eight copies of each of the two structures in the crystal.
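
One way to see this for yourself is to count the components of a molecule, grouped by chemical formula, before and after packing. This is a rough sketch rather than a definitive method: count_components is a hypothetical helper name, and it relies only on the components and formula attributes of the API's molecule objects.

```python
from collections import Counter


def count_components(molecule):
    """Count the components of `molecule`, grouped by chemical formula."""
    # Each component is itself a molecule object: one connected set of atoms
    return Counter(component.formula for component in molecule.components)
```

Comparing count_components(crystal.molecule) with count_components(crystal.packing()) for the same crystal shows how many copies of each structure the packing has added.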

I hope this is clear.  If not, I'll try to expand on it, or ask a crystallographer here to make things clearer.

I really don't mind whether you open another topic or keep it here, but since it naturally follows on from your earlier question, perhaps here is best.

Best wishes

Richard

Hi Richard,

Thanks a lot for the explanations. I have been discussing these points with a colleague, but unfortunately they are still not clear to us. For example, in which situations is a component a copy of the fundamental molecule, and when is it not? How are these components chosen? I would appreciate it if you could expand on this a bit more.

Best Wishes,

T.

 

Hi Tahereh,

In some crystal structures, there can be more than one molecule in the asymmetric unit, i.e. there are molecules that are symmetry independent of each other in the crystal structure. These structures are known as "high Z'" structures, and more information on this topic can be found on these useful pages - https://zprime.co.uk/

In terms of the API, what this means is that the molecule object that you obtain has more than one component to it, although the components can be chemically identical. This is the case for PENCEN03, which has two molecules of pentacene in the asymmetric unit.
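
If it helps to keep the two ideas apart, the crystal object reports Z (molecules per unit cell) and Z' (molecules per asymmetric unit) separately from the component count of its molecule. Below is a small sketch, assuming the z_value and z_prime attributes of the crystal; summarise is a hypothetical helper name, and I have not listed expected values for any particular entry.

```python
def summarise(crystal):
    """Collect Z, Z' and the component count for one crystal."""
    return {
        'z_value': crystal.z_value,  # molecules per unit cell (Z)
        'z_prime': crystal.z_prime,  # molecules per asymmetric unit (Z')
        # connected structures in the molecule object the API hands back
        'components': len(crystal.molecule.components),
    }
```

For an entry read from the database this could be called as summarise(entry.crystal). The component count is a property of the molecule object, while Z and Z' describe the crystal.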

Best wishes,

Andy

Hi Andy,

Thanks a lot for this. I am happy that I could finally understand this concept.

Best wishes,

Tahereh
