Hi all,

I'm absolutely new to the CSD and pretty overwhelmed by the amount of information available, so I'm hoping I can get some advice here. I have a list of 22,000 small molecules: ~9,000 of those are going to be drugs, and the rest are probably peptide-like small molecules.

I was hoping I could download the 3D structures of all of these molecules that are available in the CSD. For those not available, I can go from SMILES to 3D and then run low-level QM calculations (although at the risk of getting incorrect isomers). Hopefully, there won't be many that aren't available.

Is it possible to download structures? 

Thanks

Anthony 

Dear Anthony,

I'm very sorry about the long delay in getting back to you.

It sounds as if you need to do a substructure search on the CSD, which for 22,000 structures is going to take some time.

This script will do something like what you want. It processes your target file, searches the CSD for structures containing each target molecule, and writes the identifiers of the hits to a results file. The hits include matches where your target molecule is a proper substructure of the matched structure; if you need exact matches, then please let me know.

--


import time

from ccdc import io, search, utilities

target_file = '/path/to/your/file/of/structures.mol2'
hit_file = '/path/to/where/you/want/to/put/results.gcd'

def one_search(m):
    """Return the CSD identifiers of entries containing molecule m as a substructure."""
    searcher = search.SubstructureSearch()
    searcher.add_substructure(search.MoleculeSubstructure(m))
    hits = searcher.search()
    print('CSD contained %d hits for %s' % (len(hits), m.identifier))
    return [h.identifier for h in hits]

reader = io.MoleculeReader(target_file)
start = time.time()
with open(hit_file, 'w') as gcd_writer:
    for i, m in enumerate(reader):
        # One identifier per line, so hits from consecutive searches do not run together
        for identifier in one_search(m):
            gcd_writer.write(identifier + '\n')
        utilities.Timer.progress(start, i+1, len(reader), '')
--

As I say, 22,000 structures is a lot of searching. It may be advisable to split the input file into smaller chunks and process them separately.
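
A minimal sketch of the chunking idea, in plain Python so that it runs without the CSD installed. The helper name `chunked`, the chunk size and the stand-in identifiers are assumptions made for illustration only; they are not part of the CSD Python API.

```python
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError('size must be at least 1')
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

# Hypothetical stand-ins for the molecules read from the target file
identifiers = ['MOL%05d' % i for i in range(1, 11)]

for number, chunk in enumerate(chunked(identifiers, 4), start=1):
    print('chunk %d: %d entries (%s to %s)'
          % (number, len(chunk), chunk[0], chunk[-1]))
```

The same helper could probably be wrapped around the molecule reader in the script, with each chunk's hits written to its own results file, so that an interrupted run only costs one chunk rather than the whole job.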

Once again, I am terribly sorry about the delay in answering.

If you need any more help, please get back in touch.

Best wishes

Richard
