How to remove HCl molecules in ChemmineR
Anthony Nash ▴ 20
@anthony-nash-17385
Last seen 5.0 years ago
University of Oxford

I am pulling down an SDFset of drugs from PubChem using the drug CIDs. Our experimental drug list does not match the PubChem entries exactly: the two differ by the inclusion or exclusion of HCl molecules. For now, we would like to run maximum-similarity matches against the root drug, i.e., without the HCl molecule. Is there any way of removing the HCl molecules from each SDF entry in R?

ChemmineR Chemistry • 1.1k views
Thomas Girke ★ 1.7k
@thomas-girke-993
Last seen 7 days ago
United States

Dear Anthony,

Such a feature isn't currently available, at least not out of the box. Since it sounds useful, we could add it to our to-do list and let you know once it becomes available. What could help is sending me an example of a few SDFs containing the HCl contaminations you would like to remove, or even better just the download code for PubChem you are using from R with ChemmineR to generate the corresponding SDFset object.

Best,

Thomas


[In place of my earlier email]

Hi Thomas, I would be happy to send across the code once I have finished compiling my CID list. I'll post a small code snippet here.

I am very new to cheminformatics; in your experience, would removing hydrochlorides, dihydrochlorides, bromides, etc. from an SDF entry within R be difficult? Is the SDF format logical in terms of how two separate molecules are organised?


Some CIDs where the mismatch comes from additional compounds (e.g., dihydrochloride, bromide):
6434889 1979 6420038 5281082 245005 5763 76971380 102678 123606 441325 11065 60496 2170 9301 60795 2247 54360 5362123 23705 8478 7699 2346 50088 12456 5702220 31100 2448 64737 2480 23724817 5831 90010 40127 2585 23649704 10206 9571016 71821 80311 5282478


I will add this to the to-do list and report back when it is available.


That's great! I'll be using your excellent ChemmineR package quite a lot over the next 6 months. If I find anything else, I'll post it to the Bioconductor support forum (in case it's a likely oversight on my part).

khoran • 0
@khoran-10774
Last seen 3.5 years ago

There is a new function in the development version (3.35.6) called "largestComponent", which takes an SDFset and returns a new SDFset in which each SDF contains only the largest connected component of the original SDF. All other components are removed from each SDF object. Hopefully this will help with your task.
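
For anyone landing here later, a minimal sketch of how this might be used. It assumes a ChemmineR release that ships both largestComponent() and the PubChem downloader pubchemCidToSDF() (older releases used a different download function), and it needs network access. The three CIDs are simply taken from the list posted earlier in the thread; the variable names are only illustrative.

```r
library(ChemmineR)

# A few CIDs from the list posted above (illustrative subset)
cids <- c(6434889, 1979, 5763)

# Download the PubChem records as an SDFset (requires network access;
# assumes a ChemmineR release that provides pubchemCidToSDF())
sdfset <- pubchemCidToSDF(cids)

# Keep only the largest connected component of each entry, which
# drops smaller disconnected fragments such as counter-ions
parent <- largestComponent(sdfset)

# Compare atom counts per entry before and after stripping
atom_counts <- data.frame(
  id     = sdfid(sdfset),
  before = sapply(atomblock(sdfset), nrow),
  after  = sapply(atomblock(parent), nrow)
)
print(atom_counts)
```

Entries whose atom count drops had a smaller disconnected fragment removed. Note that this keeps the largest fragment by connectivity, so it is worth spot-checking entries where the counter-ion might be larger than the drug itself.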
