Thinking about your search requests, I realise that an SQL-derived database really is overengineering a solution to the problem. Since the ReducedCellSearch is very fast and the number of hits returned is very small, a simple filtering of the hits is more than fast enough. I've attached a simple example script which performs a couple of queries of the sort you are describing.
I'm sure you'll have no difficulty adjusting the script for your purposes, but if you do, please raise the issue here.
Okey-doke, Dean, I'll rustle something up. Might take me a couple of days, so please be patient.
We don't have any methods to search by spacegroup symbol or formula, so iterating over hit structures would be the only way to do it. If you have to do many of these searches it would be fairly simple to make an SQLite database containing terms of interest, then to join the results of a ReducedCellSearch with a query of this database.
If you like I can provide a prototype of how to go about this.
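To give a rough idea of the shape this could take, here is a minimal sketch using only Python's built-in sqlite3 module. The table name, column names and sample rows are all made up for illustration (they are not real CSD entries), and the list of hit identifiers simply stands in for whatever a ReducedCellSearch returns.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory table of (identifier, spacegroup, formula) rows."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE entries ('
                 'identifier TEXT PRIMARY KEY, spacegroup TEXT, formula TEXT)')
    conn.executemany('INSERT INTO entries VALUES (?, ?, ?)', rows)
    conn.commit()
    return conn


def filter_hits(conn, hit_identifiers, spacegroup):
    """Return the hit identifiers whose indexed spacegroup matches."""
    hit_identifiers = list(hit_identifiers)
    if not hit_identifiers:
        return []
    # One '?' placeholder per identifier, so values are never pasted into the SQL
    placeholders = ','.join('?' * len(hit_identifiers))
    sql = ('SELECT identifier FROM entries '
           'WHERE spacegroup = ? AND identifier IN (%s) '
           'ORDER BY identifier' % placeholders)
    cursor = conn.execute(sql, [spacegroup] + hit_identifiers)
    return [row[0] for row in cursor]


# Illustrative data only: hypothetical identifiers, not real CSD refcodes
rows = [
    ('AAAAAA', 'P21/c', 'C6 H6'),
    ('BBBBBB', 'P-1', 'C2 H6 O1'),
    ('CCCCCC', 'P21/c', 'C10 H8'),
]
conn = build_index(rows)
hit_ids = ['AAAAAA', 'BBBBBB']   # stand-in for identifiers from a search
matches = filter_hits(conn, hit_ids, 'P21/c')
print(matches)
```

For a long list of hits it would probably be better to load the identifiers into a temporary table and join against that, rather than building a large IN clause.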
I'm not entirely clear what you are trying to do here. Let me know if I've got the wrong end of the stick:
You run a reduced cell search on the CSD, or another database of structures, retrieving some hits. You then wish to filter these results according to further criteria, e.g. chemical formula, or space groups.
You can do a simple filter of the hits, assuming there are not too many of them, simply by iterating over the hits:
for h in hits:
    c = h.crystal
    if c.spacegroup_symbol == ...
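The loop above can be wrapped up as a small reusable filter. This is only a sketch: the Hit and Crystal namedtuples are stand-ins so that the example runs without the API installed, and they mimic nothing beyond the attributes used here. The identifiers and the filter_hits helper are hypothetical names of my own.

```python
from collections import namedtuple

# Stand-ins for real search hits; only the attributes used below are mimicked
Crystal = namedtuple('Crystal', ['spacegroup_symbol'])
Hit = namedtuple('Hit', ['identifier', 'crystal'])


def filter_hits(hits, predicate):
    """Keep the hits whose crystal satisfies the given predicate."""
    return [h for h in hits if predicate(h.crystal)]


# Illustrative data only
hits = [
    Hit('AAAAAA', Crystal('P21/c')),
    Hit('BBBBBB', Crystal('P-1')),
    Hit('CCCCCC', Crystal('P21/c')),
]

monoclinic = filter_hits(hits, lambda c: c.spacegroup_symbol == 'P21/c')
print([h.identifier for h in monoclinic])
```

Passing the test in as a function means the same loop can be reused for a formula check or any other criterion on the crystal.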
Alternatively you can use any of the search classes except TextNumericSearch on an individual crystal structure.
Hope this is helpful; if not please ask again.
The HBond in CATKIT is not found because the default path length range for detecting hydrogen bonds is set to (4, 999), which excludes contacts between separate components of the molecule. You can include such contacts by setting the path_length_range to (-1, 999), i.e.:
from ccdc import io
csd = io.MoleculeReader('csd')
catkit = csd.molecule('CATKIT')
print(catkit.hbonds(path_length_range=(-1, 999)))
The value -1 is used to cope with both options to the 'require_hydrogens' parameter of the hbonds() method. I appreciate that this is not clear from the documentation, and this will be rectified in a forthcoming release.
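To illustrate why the lower bound matters, here is a toy sketch of a path length filter. It is not how the API does it internally: I am assuming, purely for illustration, that two atoms with no bonded path between them are given a path length of -1, and the atom labels, bond list and helper names are all invented.

```python
from collections import deque


def bond_path_length(bonds, start, end):
    """Number of bonds on the shortest path, or -1 if the atoms are not connected."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    if start == end:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    # Breadth-first search, so the first time we reach 'end' is the shortest path
    while queue:
        atom, dist = queue.popleft()
        for nxt in neighbours.get(atom, ()):
            if nxt == end:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1  # assumption: no bonded path is reported as -1


def in_range(length, path_length_range):
    """True if length lies within the inclusive (low, high) range."""
    low, high = path_length_range
    return low <= length <= high


# Two separate components: a chain O1-C1-C2-C3-N1 and an isolated O2-H1
bonds = [('O1', 'C1'), ('C1', 'C2'), ('C2', 'C3'), ('C3', 'N1'), ('O2', 'H1')]
same_component = bond_path_length(bonds, 'O1', 'N1')
separate = bond_path_length(bonds, 'N1', 'O2')
print(same_component, in_range(same_component, (4, 999)))
print(separate, in_range(separate, (4, 999)), in_range(separate, (-1, 999)))
```

Under that assumption, a contact between separate components fails the (4, 999) test but passes once the lower bound is dropped to -1.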
I think the default behaviour is somewhat counterintuitive; I shall discuss with colleagues whether the default should be made more permissive.
Hope this is helpful.
I agree - or rather a friendly chemist agrees - that the structure is a bit rubbish. The first kekulize misassigns the double bonds in the carbon you mentioned, so the second aromatic assignment does not regard these bonds as aromatic, then the second kekulize does not operate on the same structure as the first. I agree that this is not ideal behaviour, but it is comprehensible.
The only solution I can think of is to assign all bond types:
where the double bond to the phosphorus is detected, the five membered ring is no longer aromatic and the kekulisation works as expected.
I have mailed the database group to see if they want to fix the bonds in the structure, but this will be too late for the forthcoming November release.
I'm afraid you have unearthed a genuine bug. Discussions are underway here to see if it can be fixed in the forthcoming API version 1.3 release.
In the meantime you can work around the problem by using the internal API:
mol = Molecule.from_string(...)
mol = Molecule(mol.identifier, _molecule=mol._molecule.create_editable_molecule())
Sorry about this.
Please carry on raising any difficulties you have, and making suggestions for ways in which we may improve the API.
Here's the slightly modified script, testing for 3D coordinates.
I've attached a table of spacegroup, average void space (as a percentage of the unit cell volume), and number of observations from the 673,606 structures of CSD V536 with 3D coordinates. I'll leave it to the crystallographically adept to extract any meaning there is in the table.
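For anyone wanting to build a similar table from their own numbers, the aggregation step might look something like the sketch below. This is not the attached script: it assumes you already have one (spacegroup, void percentage) pair per structure, the summarise helper is a hypothetical name, and the figures are invented.

```python
from collections import defaultdict


def summarise(records):
    """Return (spacegroup, mean void %, count) rows, most populated first."""
    totals = defaultdict(lambda: [0.0, 0])   # spacegroup -> [sum, count]
    for spacegroup, void_percent in records:
        totals[spacegroup][0] += void_percent
        totals[spacegroup][1] += 1
    table = [(sg, total / n, n) for sg, (total, n) in totals.items()]
    # Sort by descending count, then by symbol to keep the order stable
    table.sort(key=lambda row: (-row[2], row[0]))
    return table


# Invented values, for illustration only
records = [('P21/c', 2.0), ('P-1', 1.0), ('P21/c', 4.0)]
table = summarise(records)
for spacegroup, mean_void, count in table:
    print('%-8s %6.2f %6d' % (spacegroup, mean_void, count))
```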