the entry for csd number 1827879 will be appearing in the February update to the CSD which should be available to you very soon.
what problems are you having with the script?
the return value of the packing() function on a Crystal is an instance of a Molecule, not a Crystal, and the 'intermolecular' attribute of the HBondCriterion does not apply: all the HBonds are considered to be intramolecular, even if they are in disconnected components of the molecule. Additionally, the HBonds will not be found since the path_length_range attribute if the HBondCriterion is left at its default setting of (4, 999), requiring the HBond to be within a single connected component of the molecule.
You can discover the HBonds by setting the path_length_range to (0, 999), allowing HBonds to be between disconnected components of the molecule, and you can determine which HBonds are between disconnected components by testing the shortest_path() method of the atoms of the HBond, that is to say
criterion.path_length_range = (0, 999)
criterion.intermolecular = 'ANY'
packed = cry.packing()
hbonds = packed.hbonds(hbond_criterion=criterion)
inter_component_hbonds = [hb for hb in hbonds if packed.shortest_path(hb.atoms, hb.atoms[-1]) == 0]
I hope this is helpful, If anything is unclear please get back to us.
the definition of what constitutes an hbond is contained in the class molecule.MoleculeHBondCriterion. Here there are options for the geometry, the presence or absence of the hydrogen, and crucially for you, a means to specify atom types which may form a hydrogen bond.
These are documented in the API in:
I hope this is helpful.
you can use the python API to do this. For example:
from ccdc import io
with io.EntryReader('/path/to/file.sdf') as reader, io.EntryWriter('/path/to/file.pdb') as writer:
for entry in reader:
I hope this is helpful. Please come back to me if anything needs clarification.
I don't have an existing script to do this, but here is a sketch of how you might go about it.
from ccdc import io, descriptors
directory = '/path/to/directory/of/cifs'
cif_files = glob.glob(os.path.join(directory, '*.cif'))
for cif_file in cif_files:
with io.MoleculeReader(cif_file) as rdr:
cif = rdr
for a1 in cif.atoms:
for a2 in cif.atoms:
print a1.label, a2.label, descriptors.MolecularDescriptors.atom_distance(a1, a2)
It is worth noting that if you use an EntryReader to read the CIF file, all of the CIF attributes are available, for example,
cif_entry = io.EntryReader(cif_file)
a1 = cif_entry.attributes['_geom_bond_atom_site_label_1']
a2 = cif_entry.attributes['_geom_bond_atom_site_label_2']
length = cif_entry.attributes['_geom_bond_distance']
for a1, a2, length in zip(a1, a2, length):
print a1, a2, length
Hope this is helpful. Please come back with any questions you have about this, or indeed any other API matter.
in the API a component is a connected set of atoms and the bonds connecting them. These can be copies of the fundamental molecular structure as is the case with PENCEN03 where there are two copies of pentane, or different structures as in the case of ABEBUF. These may be solvent molecules, coformer molecules, cocrystals or salts.
When we pack the unit cell, we can get copies of all these components in a single molecule. So in PENCEN03 you will get 10 copies of the pentane molecule. When we pack ABEBUF, we get eight copies of each of the two structures of the crystal.
I hope this is clear. If not, I'll try to expand on it, or ask a crystallographer here to make things clearer.
I really don't mind whether you open another topic or keep it here, but since it naturally follows on from your earlier question, perhaps here is best.
as long as the mol2 file contains the crystallographic information, i.e. a CRYSIN line, then we can do this very easily:
from ccdc import io
reader = io.CrystalReader('/path/to/crystals.mol2')
for c in reader:
unit_cell_molecule = c.packing()
coordinates = [a.coordinates for a in unit_cell_molecule.atoms]
The packing method of the crystal will return a molecule, which may be several disconnected structures, which are found in the specified box, which by default is the unit cell. There is an optional inclusion paramter to this method which controls whether or not to include atoms. By default this is to include the whole molecule if the centroid is found in the unit cell. Other options are, 'AllAtomsIncluded', 'AnyAtomIncluded' and 'OnlyAtomsIncluded'.
Hope this is helpful.
It looks as if you've spelled csd_licence.dat as csd_licencw.dat, so that may be the source of the problem.
Incidentally, you're a brave man to be running these things as root!
the warning is issued by the underlying C++ code, and is not under the control of the python API. The only thing I can think of is to record the identifiers of the molecules involved before performing the search, possibly using ccdc.utilities.Logger, then inspecting the output.
I'm sorry I can't be any more helpful at the moment.