Question

MCS & ChemineR

0

Entering edit mode

Thomas Girke ★ 1.7k

@thomas-girke-993

Last seen 4 weeks ago

United States

Dear David, Below is some code that addresses most of your needs as I understand them, especially under (B). Please note, the alternative solutions an MCS/FMCS search can return are alternative optimal solutions of identical size, and not suboptimal solutions of variable size. All those solutions are stored in the MCS object class and their structures (SDFset connection tables) can be returned with the mcs2sdfset function. The usage of the MCS class is explained in section 5.3 of the fmcsR vignette. An important consideration of your approach is the number of compounds you are trying to analyze. It should work fine if you deal with a few hundred to a few thousand compounds but it would be computationally very challenging if it were many thousands compounds, which is mainly due to the computational complexity of the MCS/FMCS search process (NP- complete problem). In addition, before you construct your network you may want think about a reasonable way to reduce the redundancy in the MCS/FMCS results which will be quite pronounced. Clustering based on atom pair fingerprint or MCS similarity may be an option as shown under (B2). Best, Thomas library(ChemmineR) library(fmcsR) ## Import sample compound set provided by ChemmineR library ## To import your own compounds use the read.SDFset function data(sdfsample) sdf <- sdfsample[1:6] # Sample set with 6 compound only ## (A.1) Compute similarity matrix with sim <- sapply(cid(sdf), function(x) fmcsBatch(sdf[x], sdf, au=0, bu=0)[,"Tanimoto_Coefficient"]) sim CMP1 CMP2 CMP3 CMP4 CMP5 CMP6 CMP1 1.0000000 0.2291667 0.2040816 0.1607143 0.3333333 0.3333333 CMP2 0.2291667 1.0000000 0.3000000 0.3181818 0.2564103 0.2500000 CMP3 0.2040816 0.3000000 1.0000000 0.1836735 0.2250000 0.3636364 CMP4 0.1607143 0.3181818 0.1836735 1.0000000 0.1956522 0.2439024 CMP5 0.3333333 0.2564103 0.2250000 0.1956522 1.0000000 0.3548387 CMP6 0.3333333 0.2500000 0.3636364 0.2439024 0.3548387 1.0000000 ## (A.2) Hierarchical clustering with MCS scores as similarity/distance matrix d <- 1-sim hc <- hclust(as.dist(1-d), method="complete") plot(as.dendrogram(hc), edgePar=list(col=4, lwd=2), horiz=TRUE) ## (B.1) Function to obtain MCS/FMCS for all compound pairs in a library allpairMCS <- function(sdf) { sdfset <- SDFset() for(i in cid(sdf)){ for(j in cid(sdf)){ f <- mcs2sdfset(fmcs(sdf[j], sdf[i]))[[1]] cid(f) <- paste(i, j, gsub("^.*_fmcs", "fmcs", cid(f)), sep="_") sdfset <- c(sdfset, f) } } return(sdfset) } allmcs <- allpairMCS(sdf=sdf) cid(allmcs)[1:12] [1] "CMP1_CMP1_fmcs_1" "CMP1_CMP2_fmcs_1" "CMP1_CMP3_fmcs_1" "CMP1_CMP3_fmcs_2" [5] "CMP1_CMP3_fmcs_3" "CMP1_CMP4_fmcs_1" "CMP1_CMP4_fmcs_2" "CMP1_CMP4_fmcs_3" [9] "CMP1_CMP4_fmcs_4" "CMP1_CMP4_fmcs_5" "CMP1_CMP4_fmcs_6" "CMP1_CMP4_fmcs_7" ## (B.2) Cluster MCS/FMCS result to identify/eliminate identical or very similar ones ## This could be done again using the FMCS algorithm or atom pair fingerprints sim <- sapply(cid(allmcs), function(x) fmcsBatch(allmcs[x], allmcs, au=0, bu=0)[,"Tanimoto_Coefficient"]) as.dist(sim[1:4,1:4]) MP1_CMP1_fmcs_1 CMP1_CMP2_fmcs_1 CMP1_CMP3_fmcs_1 CMP1_CMP2_fmcs_1 0.3333333 CMP1_CMP3_fmcs_1 0.3030303 0.4000000 CMP1_CMP3_fmcs_2 0.3030303 0.4000000 1.0000000 On Fri, Mar 15, 2013 at 03:14:29PM +0000, David Covell wrote: > Dear Thomas, > > As a follow-up to my previous e-mail, I would like to ask your advice about > using ChemmineR to complete this task. The goal is to generate all- to-all > MCS that include all alternative MCS's for a set of compounds. In order to do > this correctly, I have attached a test set of 50 compounds. > > First, from the ChemmineR Manual, I believe the all-to-all MCS > calculation should > have a form similar to that used for calculating a distance matrix > > > dist <- sapply(cid(sdf), function(x) > + fmcsBatch(sdf[x], sdf, au=0, bu=0)[,"Overlap_Coefficient"]) > > However, I am not able to determine the proper syntax for this > calculation. Is there > a compact way to do this? > > Second, I am not sure how to get the alternative MCS solutions. Can you > provide some suggestions. I believe the best MCS is often selected, however, > all alterantives would be needed here. > > Note that my goal is to generate a map that associates all MCSs to each test > compound. These associations will be used to generate a network interaction > map with the edges weighted according to compound bioactivity in a molecular > screen. > > Thanks for your help, > Sincerely, > David > > >

Network Clustering PROcess ChemmineR Network Clustering PROcess ChemmineR • 1.1k views

ADD COMMENT • link 11.1 years ago Thomas Girke ★ 1.7k