I am pulling down an SDFset of drugs from PubChem using the drug CIDs. There isn't an exact match between our experimental drug list and the entries on PubChem: they differ by the inclusion/exclusion of HCl molecules. For now, we would like to do maximum-similarity matches against the root drug, i.e., the structure without the HCl molecule. Is there any way of removing the HCl molecules from each SDF entry in R?
[In place of my earlier email]
Hi Thomas, I would be happy to send the code across once I have finished compiling my CID list. I'll post a small code snippet here.
I am very new to cheminformatics; in your experience, would removing hydrochlorides, dihydrochlorides, bromides, etc. from an SDF entry within R be difficult? Is the SDF format logical in terms of how two separate molecules are organised?
Some CIDs where the mismatch comes from an additional component (e.g., dihydrochloride, bromide, etc.): 6434889 1979 6420038 5281082 245005 5763 76971380 102678 123606 441325 11065 60496 2170 9301 60795 2247 54360 5362123 23705 8478 7699 2346 50088 12456 5702220 31100 2448 64737 2480 23724817 5831 90010 40127 2585 23649704 10206 9571016 71821 80311 5282478
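On how an SDF entry organises two separate molecules: in a V2000 record for a salt, the counterion normally sits in the same atom block as the drug, and nothing in the bond block connects it to the rest. Stripping it therefore comes down to finding the connected fragments and keeping one of them. The following is only a minimal sketch of that idea in plain Python, not ChemmineR code and not an R solution. The function names (`parse_molblock`, `fragments`, `keep_largest_fragment`) and the toy record are made up for illustration. It assumes that the fragment with the most non-hydrogen atoms is the drug, and it drops every property line except `M  END` (so `M  CHG` charge records would be lost), which may not be acceptable for real data.

```python
from collections import defaultdict


def parse_molblock(molblock):
    """Split a V2000 molblock into header, atom lines and bond lines."""
    lines = molblock.splitlines()
    counts = lines[3]  # counts line: atom count in cols 1-3, bond count in cols 4-6
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    return lines[:3], atom_lines, bond_lines


def fragments(n_atoms, bond_lines):
    """Group 1-based atom indices into connected fragments (union-find)."""
    parent = list(range(n_atoms + 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for line in bond_lines:
        a, b = int(line[0:3]), int(line[3:6])
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(1, n_atoms + 1):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=lambda g: g[0])


def keep_largest_fragment(molblock):
    """Return a molblock holding only the fragment with the most heavy atoms."""
    header, atom_lines, bond_lines = parse_molblock(molblock)

    def heavy_atoms(frag):
        # the element symbol occupies columns 32-34 of an atom line
        return sum(atom_lines[i - 1][31:34].strip() != "H" for i in frag)

    keep = max(fragments(len(atom_lines), bond_lines), key=heavy_atoms)
    new_index = {old: new for new, old in enumerate(keep, start=1)}

    new_atoms = [atom_lines[i - 1] for i in keep]
    new_bonds = []
    for line in bond_lines:
        a, b = int(line[0:3]), int(line[3:6])
        if a in new_index and b in new_index:
            # renumber the two atom indices, keep bond type and flags as-is
            new_bonds.append(f"{new_index[a]:3d}{new_index[b]:3d}{line[6:]}")

    counts = f"{len(new_atoms):3d}{len(new_bonds):3d}  0  0  0  0  0  0  0  0999 V2000"
    # simplification: all property lines except the terminator are dropped
    return "\n".join(header + [counts] + new_atoms + new_bonds + ["M  END"])


def atom(symbol, x):
    """Format one atom line for the toy record below (illustrative helper)."""
    return f"{x:10.4f}{0.0:10.4f}{0.0:10.4f} {symbol:<3} 0  0  0  0  0  0  0  0  0  0  0  0"


# Hand-written toy record: a C-N pair plus an unconnected Cl, hydrogens omitted.
toy = "\n".join([
    "toy methylamine hydrochloride", "  hand-written", "",
    "  3  1  0  0  0  0  0  0  0  0999 V2000",
    atom("C", 0.0), atom("N", 1.5), atom("Cl", 5.0),
    "  1  2  1  0",
    "M  END",
])
stripped = keep_largest_fragment(toy)
```

Here `stripped` contains only the carbon and nitrogen atoms and the single bond between them. Choosing the fragment by heavy-atom count is a heuristic; a fixed list of known counterions would be a stricter alternative.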
I will add this to the to-do list and report back when it is available.
That's great! I'll be using your excellent ChemmineR package quite a lot over the next 6 months. If I find anything else, I'll post it to the Bioconductor support forums (in case it's likely an oversight on my part).