building enzyme centric metabolic network
2
0
Entering edit mode
anupam sinha ▴ 270
@anupam-sinha-3207
Last seen 10.2 years ago
Hi all, I am working on the evolutionary aspects of metabolic net works(enzyme centric). This network will consists of nodes that are enzymes and any two enzymes are linked if they share a metabolite (or in the same pathway). Can anyone suggest me a way to build these networks using KEGGgraph or KEGGSOAP packages ? Thanks in advance. Regards, Anupam Sinha [[alternative HTML version deleted]]
Network KEGGSOAP Network KEGGSOAP • 1.5k views
ADD COMMENT
0
Entering edit mode
@saroj-mohapatra-1446
Last seen 10.2 years ago
Hi Anupam: Using your second criterion, i.e., two enzymes are linked if they are in the same pathway, one strategy would be to use the information from org. packages for creating a network for each species. # For mouse, use the package org.Mm.eg.db to get the list of all entrez ids, associated KEGG id library("org.Mm.eg.db") egKEGGids = as.list(org.Mm.egPATH) # many entrez ids have no KEGG id; discard these sumis.na(egKEGGids)) egKEGGids = egKEGGids[!is.na(egKEGGids)] # For all possible entrez id pairs, find a KEGG id; if not available, drop that pair keggnet = matrix(ncol=3, nrow=0) # Empty matrix with 3 columns colnames(keggnet) = c("EG1", "EG2", "KEGG") mycounter = 1 # to count the current entrez id pair numEg = length(egKEGGids) # number of entrez ids for(i in 1:(numEg-1)) { for(j in (i+1):numEg) { eg1 = names(egKEGGids)[i] eg2 = names(egKEGGids)[j] kegg1 = as.character(unlist(egKEGGids[i])) # kegg id for the first gene kegg2 = as.character(unlist(egKEGGids[j])) # kegg id for the 2nd gene common = intersect(kegg1, kegg2) # common between two numCommon = length(common) # how many shared if(numCommon>0) { for(k in 1:numCommon) { keggnet = rbind(keggnet, c(eg1, eg2, common[k])) cat(i, "\t", j, "\t", numEg,"\t | ") cat(eg1,"\t", eg2, "\t", common[k], "\n") # monitor progress mycounter = mycounter+1 # move the counter } } } } # now write the data to a file for future use write.table(keggnet, file="mouse_KEGG_network.txt", col.names=T, row.names=F, sep="\t", quote=F) The resulting file has three columns for two entrez ids (EG1, EG2) and KEGG (ID) shared by the two entrez ids. There might be a faster way of using it though (in stead of the for loops). Also, there might already be such networks available within bioconductor. I imagine that you would need one such file for each species of your interest (to study "evolutionary aspects"). Best, Saroj ----- Original Message ----- From: "anupam sinha" <anupam.contact@gmail.com> To: Bioconductor at stat.math.ethz.ch Sent: Sunday, April 19, 2009 4:35:12 AM GMT -05:00 US/Canada Eastern Subject: [BioC] building enzyme centric metabolic network Hi all, I am working on the evolutionary aspects of metabolic net works(enzyme centric). This network will consists of nodes that are enzymes and any two enzymes are linked if they share a metabolite (or in the same pathway). Can anyone suggest me a way to build these networks using KEGGgraph or KEGGSOAP packages ? Thanks in advance. Regards, Anupam Sinha [[alternative HTML version deleted]] _______________________________________________ Bioconductor mailing list Bioconductor at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/bioconductor Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Computational Biology VBI @ Virginia Tech
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 4 months ago
United States
Hi Saroj -- All those 'for' loops and the idea that you'd have to 'monitor progress' make me think there's a better way to do this. Without commenting on the merits of the approach, if the goal is to get an 'incidence matrix' listing which genes (rows) appear in which pathways (columns) I might load the library library(org.Mm.eg.db) get a data frame of gene and path ids (probably there's a more AnnotationDbi way of doing this...) tbl <- toTable(org.Mm.egPATH) get the unique gene names ugenes <- unique(tbl$gene_id) For each path_id, create a logical vector with 'TRUE' when a gene_id appears in the path. This is a little tricky here; get("%in%") gets the "%in%" function, usually one could just use a function name; x=ugenes provides the 'x' (first) argument to %in%, forcing the genes in each pathway into the 'table' (second) argument to %in%, ordering ugenes %in% genes means that every pathway ends up with an identical length and order logical vector occur <- with(tbl, tapply(gene_id, path_id, get("%in%"), x=ugenes)) convert the result into a matrix by unlisting the tapply result m <- matrix(unlist(occur), ncol=dim(occur), dimnames=list(gene=ugenes, path=names(occur))) and voila > m[1:5,1:10] path gene 00010 00020 00030 00031 00040 00051 00052 00053 00061 00062 11298 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 11303 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 11304 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 11305 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 11306 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE some sanity checks > dim(tbl) ## 11039 gene / path combinations [1] 11039 2 > sum(m) [1] 11039 > colSums(m)[1:5] ## number of genes in the first 5 pathways 00010 00020 00030 00031 00040 54 27 25 2 22 > nrow(toTable(revmap(org.Mm.egPATH)["00010"])) [1] 54 > nrow(toTable(revmap(org.Mm.egPATH)["00020"])) [1] 27 > max(rowSums(m)) # maximum paths a gene belongs to [1] 31 > which.max(rowSums(m)) # gene 26413, row 2354 26413 2354 > nrow(toTable(org.Mm.egPATH["26413"])) [1] 31 It would be possible to change the structure of m into a more space-efficient one, but as rowSums and colSums indicate above it's very useuful in its current form. Martin Saroj K Mohapatra <smohapat at="" vbi.vt.edu=""> writes: > Hi Anupam: > > Using your second criterion, i.e., two enzymes are linked if they are in the same pathway, one strategy would be to use the information from org. packages for creating a network for each species. > > # For mouse, use the package org.Mm.eg.db to get the list of all entrez ids, associated KEGG id > library("org.Mm.eg.db") > egKEGGids = as.list(org.Mm.egPATH) > > # many entrez ids have no KEGG id; discard these > sumis.na(egKEGGids)) > egKEGGids = egKEGGids[!is.na(egKEGGids)] > > # For all possible entrez id pairs, find a KEGG id; if not available, drop that pair > keggnet = matrix(ncol=3, nrow=0) # Empty matrix with 3 columns > colnames(keggnet) = c("EG1", "EG2", "KEGG") > mycounter = 1 # to count the current entrez id pair > numEg = length(egKEGGids) # number of entrez ids > for(i in 1:(numEg-1)) { > for(j in (i+1):numEg) { > eg1 = names(egKEGGids)[i] > eg2 = names(egKEGGids)[j] > kegg1 = as.character(unlist(egKEGGids[i])) > # kegg id for the first gene > kegg2 = as.character(unlist(egKEGGids[j])) > # kegg id for the 2nd gene > common = intersect(kegg1, kegg2) # common between two > numCommon = length(common) # how many shared > if(numCommon>0) { > for(k in 1:numCommon) { > keggnet = rbind(keggnet, c(eg1, eg2, common[k])) > cat(i, "\t", j, "\t", numEg,"\t | ") > cat(eg1,"\t", eg2, "\t", common[k], "\n") > # monitor progress > mycounter = mycounter+1 # move the counter > } > } > } > } > > # now write the data to a file for future use > write.table(keggnet, file="mouse_KEGG_network.txt", > col.names=T, row.names=F, sep="\t", quote=F) > > The resulting file has three columns for two entrez ids (EG1, EG2) and KEGG (ID) shared by the two entrez ids. There might be a faster way of using it though (in stead of the for loops). Also, there might already be such networks available within bioconductor. > > I imagine that you would need one such file for each species of your interest (to study "evolutionary aspects"). > > Best, > > Saroj > > ----- Original Message ----- > From: "anupam sinha" <anupam.contact at="" gmail.com=""> > To: Bioconductor at stat.math.ethz.ch > Sent: Sunday, April 19, 2009 4:35:12 AM GMT -05:00 US/Canada Eastern > Subject: [BioC] building enzyme centric metabolic network > > Hi all, > I am working on the evolutionary aspects of metabolic net > works(enzyme centric). This network will consists of nodes that are enzymes > and any two enzymes are linked if they share a metabolite (or in the same > pathway). Can anyone suggest me a way to build these networks using > KEGGgraph or KEGGSOAP packages ? Thanks in advance. > > Regards, > > Anupam Sinha > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Hi Martin: Thanks a lot. This _is_ a much better way of creating the incidence matrix. Saroj Martin Morgan wrote: > Hi Saroj -- > > All those 'for' loops and the idea that you'd have to 'monitor > progress' make me think there's a better way to do this. Without > commenting on the merits of the approach, if the goal is to get an > 'incidence matrix' listing which genes (rows) appear in which pathways > (columns) I might > > load the library > > library(org.Mm.eg.db) > > get a data frame of gene and path ids (probably there's a more > AnnotationDbi way of doing this...) > > tbl <- toTable(org.Mm.egPATH) > > get the unique gene names > > ugenes <- unique(tbl$gene_id) > > For each path_id, create a logical vector with 'TRUE' when a gene_id > appears in the path. This is a little tricky here; get("%in%") gets > the "%in%" function, usually one could just use a function name; > x=ugenes provides the 'x' (first) argument to %in%, forcing the genes > in each pathway into the 'table' (second) argument to %in%, ordering > ugenes %in% genes means that every pathway ends up with an identical > length and order logical vector > > occur <- with(tbl, tapply(gene_id, path_id, get("%in%"), x=ugenes)) > > convert the result into a matrix by unlisting the tapply result > > m <- matrix(unlist(occur), ncol=dim(occur), > dimnames=list(gene=ugenes, path=names(occur))) > > and voila > > >> m[1:5,1:10] >> > path > gene 00010 00020 00030 00031 00040 00051 00052 00053 00061 00062 > 11298 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 11303 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 11304 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 11305 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > 11306 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE > > some sanity checks > > >> dim(tbl) ## 11039 gene / path combinations >> > [1] 11039 2 > >> sum(m) >> > [1] 11039 > > >> colSums(m)[1:5] ## number of genes in the first 5 pathways >> > 00010 00020 00030 00031 00040 > 54 27 25 2 22 > >> nrow(toTable(revmap(org.Mm.egPATH)["00010"])) >> > [1] 54 > >> nrow(toTable(revmap(org.Mm.egPATH)["00020"])) >> > [1] 27 > > >> max(rowSums(m)) # maximum paths a gene belongs to >> > [1] 31 > >> which.max(rowSums(m)) # gene 26413, row 2354 >> > 26413 > 2354 > >> nrow(toTable(org.Mm.egPATH["26413"])) >> > [1] 31 > > It would be possible to change the structure of m into a more > space-efficient one, but as rowSums and colSums indicate above it's > very useuful in its current form. > > Martin > > > Saroj K Mohapatra <smohapat at="" vbi.vt.edu=""> writes: > > >> Hi Anupam: >> >> Using your second criterion, i.e., two enzymes are linked if they are in the same pathway, one strategy would be to use the information from org. packages for creating a network for each species. >> >> # For mouse, use the package org.Mm.eg.db to get the list of all entrez ids, associated KEGG id >> library("org.Mm.eg.db") >> egKEGGids = as.list(org.Mm.egPATH) >> >> # many entrez ids have no KEGG id; discard these >> sumis.na(egKEGGids)) >> egKEGGids = egKEGGids[!is.na(egKEGGids)] >> >> # For all possible entrez id pairs, find a KEGG id; if not available, drop that pair >> keggnet = matrix(ncol=3, nrow=0) # Empty matrix with 3 columns >> colnames(keggnet) = c("EG1", "EG2", "KEGG") >> mycounter = 1 # to count the current entrez id pair >> numEg = length(egKEGGids) # number of entrez ids >> for(i in 1:(numEg-1)) { >> for(j in (i+1):numEg) { >> eg1 = names(egKEGGids)[i] >> eg2 = names(egKEGGids)[j] >> kegg1 = as.character(unlist(egKEGGids[i])) >> # kegg id for the first gene >> kegg2 = as.character(unlist(egKEGGids[j])) >> # kegg id for the 2nd gene >> common = intersect(kegg1, kegg2) # common between two >> numCommon = length(common) # how many shared >> if(numCommon>0) { >> for(k in 1:numCommon) { >> keggnet = rbind(keggnet, c(eg1, eg2, common[k])) >> cat(i, "\t", j, "\t", numEg,"\t | ") >> cat(eg1,"\t", eg2, "\t", common[k], "\n") >> # monitor progress >> mycounter = mycounter+1 # move the counter >> } >> } >> } >> } >> >> # now write the data to a file for future use >> write.table(keggnet, file="mouse_KEGG_network.txt", >> col.names=T, row.names=F, sep="\t", quote=F) >> >> The resulting file has three columns for two entrez ids (EG1, EG2) and KEGG (ID) shared by the two entrez ids. There might be a faster way of using it though (in stead of the for loops). Also, there might already be such networks available within bioconductor. >> >> I imagine that you would need one such file for each species of your interest (to study "evolutionary aspects"). >> >> Best, >> >> Saroj >> >> ----- Original Message ----- >> From: "anupam sinha" <anupam.contact at="" gmail.com=""> >> To: Bioconductor at stat.math.ethz.ch >> Sent: Sunday, April 19, 2009 4:35:12 AM GMT -05:00 US/Canada Eastern >> Subject: [BioC] building enzyme centric metabolic network >> >> Hi all, >> I am working on the evolutionary aspects of metabolic net >> works(enzyme centric). This network will consists of nodes that are enzymes >> and any two enzymes are linked if they share a metabolite (or in the same >> pathway). Can anyone suggest me a way to build these networks using >> KEGGgraph or KEGGSOAP packages ? Thanks in advance. >> >> Regards, >> >> Anupam Sinha >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > >
ADD REPLY

Login before adding your answer.

Traffic: 601 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6