I'm very sorry about the long delay in getting back to you.
It sounds as if you need to do a substructure search on the CSD, which for 22,000 structures is going to take some time.
This script will do something like what you want. It will read your target file and, for each molecule in it, search the CSD for structures containing that molecule and write the identifiers of the hits to a results file. The hits will include structures in which your target molecule is a proper substructure of the matched structure; if you need exact matches only, then please let me know.
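from ccdc import io, search, utilities
import time

target_file = '/path/to/your/file/of/structures.mol2'
hit_file = '/path/to/where/you/want/to/put/results.gcd'

def search_molecule(m):
    # Substructure search of the CSD using the whole molecule as the query
    searcher = search.SubstructureSearch()
    searcher.add_substructure(search.MoleculeSubstructure(m))
    hits = searcher.search()
    print('CSD contained %d hits for %s' % (len(hits), m.identifier))
    return [h.identifier for h in hits]

reader = io.MoleculeReader(target_file)
start = time.time()
with open(hit_file, 'w') as gcd_writer:
    for i, m in enumerate(reader):
        # Write one hit identifier per line
        for identifier in search_molecule(m):
            gcd_writer.write(identifier + '\n')
        utilities.Timer.progress(start, i+1, len(reader), '')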
As I say, searching for more than 20,000 structures is a lot of work. It may be advisable to split the input file into smaller chunks and process them separately, so that a failure part-way through does not cost you the whole run.
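If it helps, here is a rough sketch of one way to do the splitting in plain Python, without the CSD Python API. It assumes your input is a multi-molecule mol2 file in which every entry begins with an @<TRIPOS>MOLECULE record. The function name split_mol2, the chunk size and the output file naming are my own inventions for illustration, so please adjust them to suit.

```python
import os

def split_mol2(path, chunk_size, out_dir):
    """Split a multi-molecule mol2 file into chunks (hypothetical helper).

    Assumes each molecule starts with an '@<TRIPOS>MOLECULE' record.
    Any lines before the first such record are dropped.
    Returns the list of chunk file paths that were written.
    """
    marker = '@<TRIPOS>MOLECULE'
    base = os.path.splitext(os.path.basename(path))[0]
    out_paths = []
    out = None
    count = 0  # molecules seen so far
    try:
        with open(path) as source:
            for line in source:
                if line.startswith(marker):
                    # Start a new chunk file every chunk_size molecules
                    if count % chunk_size == 0:
                        if out is not None:
                            out.close()
                        name = '%s_%04d.mol2' % (base, len(out_paths) + 1)
                        out_paths.append(os.path.join(out_dir, name))
                        out = open(out_paths[-1], 'w')
                    count += 1
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return out_paths
```

Each path returned can then be used as target_file in a separate run of the search script, with a different hit_file for each run so that the results do not overwrite one another.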
Once again, I am terribly sorry about the delay in answering.
If you need any more help, please get back in touch.