Question: "topGOdata" object: How to supply gene scores with a predefined list of genes
9.7 years ago by
Scott Ochsner300 wrote:
Hi,

I would like to attach gene "score" info to a predefined list of interesting genes to generate a topGOdata object. The predefined list of genes was obtained by:

> library(limma)
> library(topGO)
> input<-cbind(FC=fit$coefficients[,1],pval=p.adjust(fit$p.value[,1],method="BH"))
> selectFUN<-function(x){return(abs(x[,1]) >=1 & x[,2] < 0.05)}
> diffgenes<-selectFUN(input)
> myInterestedGenes<-names(which(diffgenes==T))
> geneNames<-rownames(input)
> geneList<-factor(as.integer(geneNames %in% myInterestedGenes))
> names(geneList)<-geneNames
> str(geneList)
 Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...

Unfortunately, the predefined list does not contain any DE "score" information. I would greatly appreciate any help in attaching the score information to a predefined list or incorporating p.value as well as fold change cutoffs into a geneSel function when creating a topGOdata object.

Thanks for any help,

Scott

Scott A. Ochsner, PhD
One Baylor Plaza BCM130, Houston, TX 77030
Voice: (713) 798-6227 Fax: (713) 790-1275

> sessionInfo()
R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] topGO_1.12.0 SparseM_0.80 GO.db_2.2.11 RSQLite_0.7-1 DBI_0.2-4 AnnotationDbi_1.6.1 Biobase_2.4.1 graph_1.22.2 limma_2.18.2

loaded via a namespace (and not attached):
[1] grid_2.9.0 lattice_0.17-25 tools_2.9.0
go • 1.1k views
modified 9.7 years ago by Adrian Alexa400 • written 9.7 years ago by Scott Ochsner300
Answer: "topGOdata" object: How to supply gene scores with a predefined list of genes
9.7 years ago by
Hi Scott,

I'm not sure I totally understand your question, but if you want to build a "topGOdata" object from a list of genes for which you have scores (quantifying differential expression), there is a simple way to do it.

The first thing you need is a named numeric vector, where the gene identifiers are stored in the names attribute of the vector and the numeric values are the respective gene scores. The set of genes found in the names attribute defines the gene universe. For example, the following should work for you:

geneList <- p.adjust(fit$p.value[,1], method="BH")
names(geneList) <- geneNames

Then you will need to define a function for specifying the list of interesting genes based on the scores (in your case the adjusted p-values). The function must return a logical vector specifying which genes are selected and which are not. The function must have one argument, named allScore, and must not depend on any attributes of this object. If, for example, you want to select all genes with an adjusted p-value lower than 0.01, then the function should look like:

topDiffGenes <- function(allScore) {
  return(allScore < 0.01)
}

Now you can build a "topGOdata" object as follows (in the code below I assume you are using a Bioconductor annotation package, for example "hgu133a"):

## build the topGOdata class
GOdata <- new("topGOdata",
              ontology = "BP",
              allGenes = geneList,
              geneSel = topDiffGenes,
              annot = annFUN.db,
              affyLib = "hgu133a")

## display the GOdata object
GOdata

I hope this answers your question. Please let me know if you have further problems.

Regards,
Adrian

On Tue, Aug 25, 2009 at 10:04 PM, Ochsner, Scott A <sochsner at bcm.tmc.edu> wrote:
> I would like to attach gene "score" info to a predefined list of
> interesting genes to generate a topGOdata object.
> [...]
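The two ingredients described in the answer (a named numeric score vector and a one-argument selection function) can be tried out in base R before any "topGOdata" object is built. The sketch below is only an illustration: the raw p-values are made up and stand in for fit$p.value[,1], and the five probe IDs are borrowed from the str() output in the question.

```r
# Made-up raw p-values standing in for fit$p.value[, 1] (an assumption for illustration).
geneNames <- c("10338001", "10338003", "10338004", "10338017", "10338025")
rawP <- c(0.0001, 0.2, 0.001, 0.8, 0.03)

# Named numeric vector: names are gene identifiers, values are the gene scores.
geneList <- p.adjust(rawP, method = "BH")
names(geneList) <- geneNames

# Selection function with a single argument; it only looks at the score values.
topDiffGenes <- function(allScore) {
  allScore < 0.01
}

# Logical vector with one entry per gene in the universe.
selected <- topDiffGenes(geneList)
print(geneList)
print(names(geneList)[selected])
```

If the printed vector and selection look right, the same geneList and topDiffGenes can be passed as allGenes and geneSel in the new("topGOdata", ...) call from the answer.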
written 9.7 years ago by Adrian Alexa400

Hi Adrian,

Thanks for the response. You have confirmed for me that at the moment it is not possible to create a geneSel function which utilizes more than one argument. Unfortunately, I want to utilize a fold change cutoff in addition to a p.value cutoff. The only way I can do this is to provide a predefined list with the structure below, where the factor level determines the genes of interest and the universe. Unfortunately, it does not appear possible to also give a gene score (p.value) to the structure below. I guess in situations where one wishes to utilize more than one selection criterion it will not be possible to use the KS test. Don't get me wrong. I still like the added value of being able to compare the classic, elim, and weight algorithms.

> str(geneList)
 Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "names")= chr [1:34760] "10338001" "10338003" "10338004" "10338017" ...

Thanks,

Scott

Scott A. Ochsner, PhD
One Baylor Plaza BCM130, Houston, TX 77030
Voice: (713) 798-6227 Fax: (713) 790-1275

-----Original Message-----
From: Adrian Alexa [mailto:adrian.alexa@gmail.com]
Sent: Thursday, August 27, 2009 11:56 AM
To: Ochsner, Scott A
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] "topGOdata" object: How to supply gene scores with a predefined list of genes

> [...]
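One possible workaround for combining a fold change cutoff with a p-value cutoff, not proposed anywhere in this thread and not a topGO feature, is to fold the fold-change filter into the score itself before handing it over: genes that fail the fold-change cutoff get their score reset to 1, so a one-argument selection function on the score alone reproduces both criteria. The sketch below uses made-up FC and pval values in place of fit$coefficients[,1] and the BH-adjusted p-values; the name selFun is hypothetical.

```r
# Made-up values standing in for fit$coefficients[, 1] and the BH-adjusted
# p-values (assumptions for illustration only).
geneNames <- c("10338001", "10338003", "10338004", "10338017")
FC   <- c(2.1, 0.3, -1.5, 1.2)
pval <- c(0.001, 0.002, 0.04, 0.30)

# Genes failing the fold-change cutoff get score 1, so they can never pass
# a "score < cutoff" selection; all other genes keep their adjusted p-value.
geneList <- ifelse(abs(FC) >= 1, pval, 1)
names(geneList) <- geneNames

# One-argument selection function (hypothetical name), using the score only.
selFun <- function(allScore) {
  allScore < 0.05
}

# Same genes as abs(FC) >= 1 & pval < 0.05 applied to the original columns.
interesting <- names(geneList)[selFun(geneList)]
print(interesting)
```

Note that resetting scores to 1 changes the score distribution seen by score-based tests such as KS, so results from those tests would need to be interpreted with that in mind.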