• RE: normalise hydrogen position

    Dear David,

    I'm afraid in my last post I omitted the essential step of normalising hydrogen positions before testing the hits.  This should be done by:

    for h in hits:
        h.molecule.normalise_hydrogens()

    before testing the distance.

    Sorry about this.

    Best wishes

    Richard

     

  • RE: normalise hydrogen position

    Dear David,

    I'm afraid the API does not currently support this feature.  If your search has constraints or measurements which depend on hydrogen positions, these will be taken from the structure without applying hydrogen position normalisation, so may well not find hits of interest.  If, on the other hand, your search is independent of the hydrogen bond positions, then the results of the search may be normalised after the search, when measurements may be taken, or constraints applied.

    For example, you could perform a search with rather looser distance constraints to a hydrogen, then for each hit, normalise the molecule's hydrogens, and test with a more accurate measurement.

    Here is a worked example, searching for COOH dimers:

    from ccdc import search
    searcher = search.SubstructureSearch()
    query = search.SubstructureQuery()
    sub1 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))
    sub2 = searcher.add_substructure(search.SMARTSSubstructure('C(=O)-O-H'))

    Normally we would search for the hydrogen bonds using a vdw corrected distance of (-5, 0) but if we do not trust the positions of the hydrogens we can make the range larger:

    searcher.add_distance_constraint('Dist1', sub1, 1, sub2, 3, (-5, 5))
    searcher.add_distance_constraint('Dist1', sub2, 1, sub1, 3, (-5, 5))

    hits = searcher.search(max_hit_structures=10, max_hits_per_structure=1)

    This gives 10 hits.

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABADIS02', u'ABAFER', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD', u'ABIDAR', u'ABIDAR01']

    Now we can filter using the correct distances:

    from ccdc.descriptors import MolecularDescriptors

    def correct_distance(h):
        atoms = h.match_atoms()
        o1, h2 = atoms[1], atoms[7]
        o2, h1 = atoms[5], atoms[3]
        return (
            MolecularDescriptors.atom_distance(o1, h2) - o1.vdw_radius - h2.vdw_radius < 0 and
            MolecularDescriptors.atom_distance(o2, h1) - o2.vdw_radius - h1.vdw_radius < 0
        )

    hits = [h for h in hits if correct_distance(h)]

    And now we have 6 hits:

    [h.identifier for h in hits]

    [u'AADAMC', u'ABADIS', u'ABEKUO', u'ABENEB', u'ABEQUU', u'ABEXUD']

    I hope this is helpful.

    Best wishes

    Richard

     

     

     

     

     

  • RE: empty result for what seem a reasonable query

    Dear Pascal,

    I'm afraid the API does not support the 'Any' keyword to mean any bond.  Instead this becomes a specification for an explicit 'Unknown' bond.  This is an error, and will be rectified in the next release of the API.  Instead, use the form

    cps_substructure.add_bond(QueryBond(), c1, c2)

    which will do the right search.

    Best wishes

    Richard

  • RE: Filter components of molecules?

    Hi Ryan,

    the simplest way to do this is to use the attribute 'heaviest_component'.  This will work as long as the solvent molecules have molecular weights lower than that of the component of interest.

    All discrete components of the molecule are accessible via the attribute 'components'.  These will be sorted by molecular weight, so can be inspected to find the structure of interest.

    Hope this is clear and helpful.

    Best wishes

    Richard

     

  • RE: finding covalently bonded clusters in crystal structures

     HI Xiwen,

    You are right to remove the atoms with no fractional coordinates, a case I hadn't considered when writing the code.  This occurs where there is some disorder in the structure.

    When I run the code with VOLMAL and IXASUW I get results in under a minute, with the longest step being the calculation of to_be_bonded (around 35 - 40 seconds).  Where did you put your extra line?  I put mine in the loop which adds expansions:

        for mol in expansions:
            mol.remove_atoms(
                a for a in mol.atoms if (a.atomic_symbol, str(a.coordinates)) in present
            )
            mol.remove_atoms(
                a for a in mol.atoms if a.fractional_coordinates is None
            )
            atoms_not_of_interest=[
                a for a in mol.atoms
                if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
                or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
                or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
            ]
            mol.remove_atoms(atoms_not_of_interest)
            present |= set((a.atomic_symbol, str(a.coordinates)) for a in mol.atoms)
            shell.add_molecule(mol)

    though it could be an extra case in the atoms_not_of_interest comprehension.

    By the way, mol.atoms.sort() won't do anything useful.  The atoms property of the molecule is calculated afresh every time you call mol.atoms.  Better would be to sort a list of atoms, for example:

    atoms_not_of_interest.sort(...)
    mol.remove_atoms(atoms_not_of_interest)

    I've attached the script I use for testing these things.  Can you have a look at it and see if I'm doing anything terribly wrong?

    Best wishes

    Richard

  • RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I've had a look at the code, and found that it is the remove_atoms() calls that take a long time - around half an hour to remove 8000 atoms.  I think this is to do with reshuffling atom indices for every removal.  I have tried sorting the atoms in decreaising value if index, and that does make a huge difference (but is an ugly trick) so I've come up with a scheme to do the deletions on the smaller expansion molecules before appending to the shell:

        present = set()
        for mol in expansions:
            mol.remove_atoms(
                a for a in mol.atoms if (a.atomic_symbol, str(a.coordinates)) in present
            )
            atoms_not_of_interest=[
                a for a in mol.atoms
                if a.fractional_coordinates.x<0 or a.fractional_coordinates.x>2
                or a.fractional_coordinates.y<0 or a.fractional_coordinates.y>2
                or a.fractional_coordinates.z<0 or a.fractional_coordinates.z>2
            ]
            mol.remove_atoms(atoms_not_of_interest)
            present |= set((a.atomic_symbol, str(a.coordinates)) for a in mol.atoms)
            shell.add_molecule(mol)

    For LAQNOH this fragment runs in less than four seconds.

    Best wishes

    Richard

  • RE: finding covalently bonded clusters in crystal structures

    HI Xiwen,

    sorry about the delay in answering.  I didn't notice that you had replied.

    Could you please post the code that you are running so I can try to see what's going wrong with it?

    Thanks

    Richard

  • RE: finding covalently bonded clusters in crystal structures

    Hi Xiwen,

    This message occurs when an attempt is made to add a bond that has already been made in the molecule, so either the to_be_bonded_2 list contains duplicates, or it contains a pair that is already bonded in the shell molecule.  You can check for either case when adding the bonds, by adding them one-by-one with an appropriate check:

    for a, c in to_be_bonded_2:
        if a not in c.neighbours:
            shell.add_bond('Single', a, c)

    but it would be better to eliminate the duplicates when constructing this list.

    I'm afraid there was a bug in the function I wrote earlier,  which should read:

    def to_be_bonded(shell, distance, *atom_types):
        '''Pairs of atoms of the correct type with a distance criterion.'''
        return [
            (a, b) for a in shell.atoms for b in shell.atoms
            if set([a.atomic_symbol, b.atomic_symbol]) == set(atom_types)
            and a.index < b.index
            and a not in b.neighbours
            and MolecularDescriptors.atom_distance(a, b) <= distance
        ]


    to_be_bonded_1 = to_be_bonded(shell, 2.6, 'V', 'O')
    to_be_bonded_2 = to_be_bonded(shell, 2.8, 'O', 'Te')

    I have tested it this time!  This works correctly for AXOHOL, IKELOA, YOTHUM, VOLMAL and VOLMEP.

    Have a good weekend.

    Best wishes
    Richard

  • RE: finding covalently bonded clusters in crystal structures

     Dear Xiwen,

    it's a pleasure for me to be able to help people like you to do interesting research with the API.  There are a couple of things that need to be changed in the code:

    Firstly, we don't know in which order the bonds will be defined, so we have to check for both cases O-V and V-O.  This is most easily done with a set comparison.  It's probably worth writing a little function for this, to save copy and paste errors, and to make life easier if you wish to study other crystal structures:


    def to_be_bonded(shell, *atom_types, distance=2.8):
        '''Pairs of atoms of the correct type with a distance criterion.'''
        return [
            (a, c) for a in shell.atoms for b in shell.atoms
            if set([a.atomic_symbol, c.atomic_symbol]) < set(atom_types)
            and a.index < c.index
            and a not in c.neighbours
            and MD.atom_distance(a, c) <= distance
        ]

    to_be_bonded_1 = to_be_bonded(shell, 'V', 'O', distance=2.6)
    to_be_bonded_2 = to_be_bonded(shell, 'O', 'Te', distance=2.8)

    The last line was accidentally left in the code.  When I wrote and tested the code, to check that I had done everything correctly I wrote the molecules out so I could see them in mercury.  You are right, it was a MoleculeWriter:

    with io.MoleculeWriter('/tmp/VOTe.mol2') as writer:
        writer.write(shell)

    I find this useful when developing molecular or crystallographic code.


    Your second expression for the expansions will not apply any translations to the symmetry operators, so is more concisely and clearly written as:

    expansions = [axohol.symmetric_molecule(symmop) for symmop in axohol.symmetry_operators]

    For counting the atoms in the unit cell and the supercell it is probably easier
    and clearer to use a generator expression rather than a for loop:

    max_unit_cell = max(len(c.atoms) for c in unit_cell.components)
    min_unit_cell = min(len(c.atoms) for c in unit_cell.components)

    This, though, is purely a matter of taste.

    May I ask what is the nature of the research you are doing?  

    Best wishes,
    Richard


  • RE: finding covalently bonded clusters in crystal structures

     Hi Xiwen,

    I think the approach I would take here is to identify the atoms of interest in the unit cell, remove those not of interest and place the molecule back into the crystal:

    For definiteness, I'll write an example from the CSD refcode AXOHOL, which is a Vanadium-Tellurium-Oxygen polymeric structure.  I've put extensive comments on the lines of code, so I hope everything will be clear.

    from ccdc import io # import the input-output module
    csd = io.EntryReader('csd') # This is the whole, current CSD
    axohol = csd.crystal('AXOHOL') # This pulls out the crystal given its refcode.  

    unit_mol = axohol.molecule
    heaviest = unit_mol.heaviest_component # This strips out the solvent molecule

    axohol.molecule = heaviest # Replace the crystal's molecule with the desolvated molecule

    Now we can make the symmetry expanded representation:

    from ccdc import molecule
    shell = molecule.Molecule('SHELL') # Create a new, empty molecule

    expansions = [
        axohol.symmetric_molecule(symmop,(i,j,k))
        for symmop in axohol.symmetry_operators
        for i in range(0,2)
        for j in range(0,2)
        for k in range(0,2)
    ] # This is exactly as you wrote

    Now add the molecules to the shell:

    for mol in expansions:
        shell.add_molecule(mol)

    Now the shell contains all the atoms of the super-cell.  However, because there may be atoms lying in special postions, which is the case with AXOHOL, we must take care to remove duplicate atoms from the super-cell.  We will determine uniquness by atoms of the same element lying in the same position:

    from collections import defaultdict
    unique_dict = defaultdict(list)
    for a in shell.atoms:
        unique_dict[(a.atomic_symbol, str(a.coordinates))].append(a) # I use str() on the coordinates to avoid rounding errors failing to catch duplicates

    Now any we can remove the duplicates:

    shell.remove_atoms(
        a for l in unique_dict.values() for a in l[1:]
    )

    Now we can make any extra bonds according to your criterion.  We could remove all bonds first, but I think what is more reasonable is to create a bond between the symmetrically related components, where there are two atoms close enough.

    from ccdc.descriptors import MolecularDescriptors

    to_be_bonded = [
        (a, b) for a in shell.atoms for b in shell.atoms
        if a.index < b.index
        and a not in b.neighbours
        and MolecularDescriptors.atom_distance(a, b) <= 2.21 # This is the ideal bond length for V-O bonds
    ]


    shell.add_bonds(('Single', a, b) for a, b in to_be_bonded)

    And now we can get the heaviest component of the super-cell:

    super_heaviest = shell.heaviest_component

    writer.write(shell)

    Please let me know if this works for you, and do not hesitate to come back with further questions.

    Best wishes
    Richard