I am pulling down an SDFset of drugs from PubChem using the drug CIDs. Our experimental drug list does not match the PubChem entries exactly: they differ by the inclusion or exclusion of HCl molecules. For now, we would like to do maximum-similarity matching against the root drug, i.e., the structure without the HCl molecule. Is there any way of removing the HCl molecules from each SDF entry in R?
Such a feature isn't currently available, at least not out of the box. Since it sounds useful, we could add it to our to-do list and let you know once it becomes available. It would help if you could send me a few example SDFs containing the HCl components you would like to remove or, even better, the R code you are using with ChemmineR to download the compounds from PubChem and generate the corresponding SDFset object.
There is a new function in the development version of ChemmineR (3.35.6) called "largestComponent", which takes an SDFset and returns a new SDFset in which each SDF contains only the largest connected component of the original SDF entry. All other components are removed from each SDF object. Since an HCl counterion is normally stored as a separate, disconnected component, it should be dropped along with any other small fragments. Hopefully this will help with your task.
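A minimal sketch of how the function described in the reply above might be called. It assumes a ChemmineR installation at version 3.35.6 or later, and that "largestComponent" accepts an SDFset as its only required argument, as the reply states; any extra dependencies it may need have not been checked here. The sample data set `sdfsample` that ships with ChemmineR stands in for the SDFset downloaded from PubChem, so most of its entries may already consist of a single component.

```r
# Assumption: ChemmineR >= 3.35.6, where largestComponent() is reported
# to be available (see the reply above).
library(ChemmineR)

# Sample SDFset shipped with ChemmineR, used here only as a stand-in
# for the SDFset retrieved from PubChem.
data(sdfsample)
sdfset <- sdfsample[1:4]

# Keep only the largest connected component of each entry; smaller
# disconnected fragments (e.g. an HCl counterion) should be removed.
stripped <- largestComponent(sdfset)

# Compare the number of atoms per entry before and after stripping.
before <- sapply(atomblock(sdfset), nrow)
after <- sapply(atomblock(stripped), nrow)
data.frame(id = sdfid(sdfset), before = before, after = after)
```

Entries whose atom count drops are the ones that carried an extra component. The stripped SDFset can then be passed to the usual similarity search workflow in place of the original.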