• RE: finding covalently bonded clusters in crystal structures

    Dear Xiwen,

    I'm not entirely sure what you mean by 'covalently bonded clusters', so forgive me if I have gone down the wrong track in what follows, and please post some explanation of the term.

    In the API we have a number of methods for finding covalently bonded structures from a crystal.  For example,

    from ccdc import io
    csd = io.EntryReader('csd')
    crystal = csd.crystal('ABACUF')
    molecule = crystal.molecule

    Here the molecule derived from the crystal is all covalently bonded, but in the presence of solvents, ions or co-crystals the resulting molecule will form more than one discretely bonded structure.  These can be separated using the molecule's component attribute:

    components = molecule.components
    heaviest = components[0]

    It is also possible to extract the moiety lying in the asymetric unit, i.e the minimal representation of the crystal:

    asymm_mol = crystal.asymmetric_unit_molecule

    We can build larger representations of the molecule by applying the crystal's symmetry operators:

    from ccdc.molecule import Molecule
    shell = Molecule('SHELL')
    for symmop in crystal.symmetry_operators:
        shell.add_molecule(crystal.symmetric_molecule(symmop))

    And again, if these form non-covalently bonded structures, the heaviest component can be extracted from this shell molecule:

    heaviest_of_shell = shell.heaviest_component

    The bonds of these various molecules have been deduced from the crystallographic data and are generally considered to be very good.  If, however, you wish to use a bespoke distance range to define your clusters, then you will have to analyse the lengths of atom-atom distances according to your criterion.  If this is what you wish to do, please post again and I will post an example for you to work with.

    I hope this is helpful.

    Best wishes

    Richard

  • RE: Z/Density query in Python

    Hi Jan-Joris,

    I'm afraid we have not implemented this sort of search in the API.  The rather clumsy work-around is to scan the whole CSD with the required criteria, for example:

    from ccdc import io

    csd = io.CrystalReader('csd')
    refcodes = []
    for c in csd:
        if c.z_value == required_z and c.z_prime == required_z_prime and c.calculated_density is not None and low_density < c.calculated_density < high_density:
            m = c.molecule
            if len(m.components) == required_components:
                atoms_with_sites = [a for a in m.atoms if a.coordinates is not None]
                if len(atoms_with_sites) == required_sited_atoms:
                    refcodes.append(c.identifier)

    with open('hit_structures.gcd', 'w') as writer:
        writer.write('\n'.join(refcodes))
        writer.write('\n')

    I'm sorry that there is nothing faster available in the API: I shall raise this at the next scheduling meeting to see whether it can be implemented for a future release.

    Thank you for the question: it is feedback like this that helps to make the API better.

    Best wishes
    Richard

  • RE: H-bond propensities

    Hi Geoff,

    the HBond propensity module has not yet been released, and is currently being implemented.  There will need to be substantial scientific and usability testing before it is ready for release, and so I'm afraid I have no release date for it.

    Best wishes

    Richard

  • RE: Combine queries?

     HI Pascal,

    I'm afraid the substructure search over the list of identifiers will be slow, since it will not be able to make use of the screens for the database and so will attempt the substructure match on each of the 39000 structures.

    You may find it faster in this case to perform both searches on the full database and combine them by hand:

    substructure_hits = substructure_search.search()
    text_hits = text_numeric_search.search()
    text_ids = set(h.identifier for h in text_hits)
    both_hits = [h for h in substructure_hits if h.identifier in text_ids]

    Let me know if this is better.

    Best wishes
    Richard

     

  • RE: Combine queries?

     Hi Pascal,

    yes, all the searches except TextNumericSearch will accept a list of identifiers, a molecule, a crystal, an entry or a file.  By default it will search the CSD.  You will find it faster to do the TextNumericSearch first, since that is much faster than the substructure search.

    Best wishes
    Richard

  • RE: Combine queries?

    Hi Pascal,

    It is not currently possible to make a combined search directly, though this is under consideration for a future release.  Instead you can perform a second search on the results of the first search.  For example:

    text_hits = text_numeric_search.search()
    # this gives 3234 hits
    smarts_hits = s.search([h.identifier for h in text_hits])
    # This gives 231 hits, of which 112 are different structures

    I hope this is helpful.  Please get back in touch if anything is unclear.

    Best wishes
    Richard

     

     

  • RE: calculating atom distance across the unit cell boundary

     Hi Xiwen,

    you can generate molecules outside the unit cell with the translate parameter of the symmetric_molecule() method if you want explicit control of the symmety operator, or you can generate expanded representations of the crystal through methods such as packing_shell() and molecular_shell().  So, for example, you could use the molecular shell:

    mol = crystal.molecular_shell()
    atoms_of_interest = [a for a in mol.atoms if a.label == 'Te1']
    min_dist = min(MolecularDescriptors.atom_distance(a, b) for a in atoms_of_interest for b in atoms_of_interest if a != b)

    or by hand you can calculate all translated symmetric molecules:

    expansions = [
        crystal.symmetric_molecule(symmop, (i, j, k))
        for symmop in cry.symmetry_operators
        for i in range(-2, 3)
        for j in range(-2, 3)
        for k in range(-2, 3)
    ]

    min_dist = min(
        MolecularDescriptors.atom_distance(expansions[i].atom('Te1'), expansions[j].atom('Te1'))
        for i in range(len(expansions)) for j in range(len(expansions))
        if i != j
    )

    Is either of these solutions what you are after?

    Best wishes
    Richard

     

     

  • RE: calculating atom distance across the unit cell boundary

    Hi Xiwen,

    for this I think you will need to know the symmetry operators of the crystal, and to know which of the symmetry operators you are interested in using.  For example,

    from ccdc import io, descriptors
    csd = io.EntryReader('csd')
    crystal = csd.crystal('AABHTZ')
    base_mol = crystal.molecule
    symm_mol = crystal.symmetric_molecule(crystal.symmetry_operators[1])
    print descriptors.MolecularDescriptors.atom_distance(base_mol.atom('N2'), symm_mol.atom('N2')

    9.320

    Hope this is helpful; please let me know if you would like any more help.

    Best wishes
    Richard

     

     

  • RE: Strange results from TextNumericSearch

     Thanks for the suggestions, Paul, I'll certainly consider them for the next release.

    Best wishes
    Richard

     

  • RE: Strange results from TextNumericSearch

    Hi Paul,

    firstly you are right to spot that TextNumericSearch doesn't take a settings parameter.  I shall fix this for the next release.

    Secondly, you can exclude entries with specific elements using the search_settings class:

    search_settings.must_not_have_elements = [
        'Ar', 'K', ...
    ]

    or, programatically:
    from ccdc.molecule import Atom
    ats = []
    for i in range(18, 93):
        ats.append(Atom())
        ats[-1].atomic_number = i
    search_settings.must_not_have_elements = [a.atomic_symbol for a in ats]


    This is slightly clumsy because there is no atomic_number keyword for Atom creation.

    Thirdly you don't have to use a bogus search to extract all entries of a database matching specific search criteria.  You can set up the search_settings as above, then iterate over the csd:

    from ccdc import io
    csd = io.EntryReader('csd')
    for e in csd:
        if search_settings.test(e):
            do_something_interesting_with_entry(e)

    Fourthly the year range is a pair of numbers, interpreted as an inclusive range, rather than the list of values you have given.  The first two values of the range have been used as the inclusive range, so you are getting hits from 1970-1971.  The query should be written:

    search.add_citation(year=(1970, 2017))

    It would have been helpful if the API had made this clear.

    Lastly, it is perhaps counter-intuitive that a TextNumericSearch with no criteria returns no hits.  It would probably be better to raise an exception as the other classes do.  I shall consider this for the next release.


    Thank you for your questions; it is feedback like this that helps me to make the API better.

    Best wishes

    Richard