topGO issues


Guest User ★ 13k

@guest-user-4897

Last seen 9.6 years ago

Hi List,

I have Illumina expression data from a mouse chip. I was using the topGO package with probe IDs and p-values. I am not sure if I can use hgu95av2.db for annotation. I encountered the following error. I appreciate your help. Thank you.

-- output of sessionInfo():

    data1=read.csv ("go.csv", header=T)
    data1
      ProbeID           p
    1 2810273 0.022920083
    2 5720014 0.042961586
    3 2600424 0.025965275
    4 3520176 0.032705434
    5 1240739 0.000668000
    6 1190735 0.016157765
    ...
    rn<-paste(data1[,1], sep="")
    P_values=data1[,-1]
    names(P_values)<-rn   #### creating named vectors
    P_values              ### this contains probeID with corresponding p-values

        2810273     5720014     2600424     3520176     1240739     1190735
    0.022920083 0.042961586 0.025965275 0.032705434 0.000668000 0.016157765
        4280164     2480441     5870504     5390019     5220576     4060465
    0.002103902 0.017717950 0.034700550 0.021078015 0.012964631 0.014807984
        4050653     1470619     4610131     2650730     2690035     4670288
    0.036829505 0.001426032 0.010399277 0.010104929 0.003046091 0.005084754
        4900064     6400377     5560204     2120377     4150037     1240370
    0.004906018 0.020934649 0.020285547 0.009254484 0.038157470 0.001123237
        2350068     3400246     4730630     3850167     4860048     2000674
    ...
    affyLib <- paste(annotation (ALL), "db", sep = ".")
    library(package = affyLib, character.only = TRUE)
    [1] "hgu95av2.db"
    sum(topDiffGenes(P_values))
    75

    sampleGOdata <- new("topGOdata",
        description = "Simple session", ontology = "BP",
        allGenes = P_values, geneSel = topDiffGenes,
        nodeSize = 10,
        annot = annFUN.db, affyLib = affyLib)

    Building most specific GOs ..... ( 0 GO terms found. )

    Build GO DAG topology .......... ( 0 GO terms and 0 relations. )
    Error in if (is.na(index) || index < 0 || index > length(nd)) stop("vertex is not in graph: ", :
      missing value where TRUE/FALSE needed

-- Sent via the guest posting facility at bioconductor.org.
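For anyone reproducing this, the named p-value vector can be built in one step, as sketched below in base R. `topDiffGenes` is not defined in the post; the version here is a threshold-style selector similar to the one used in the topGO vignette, and the 0.01 cutoff is an assumption. Only the six rows printed above are used.

```r
# The six rows shown in the post; IDs are kept as character so that they
# are never reformatted as numbers.
data1 <- data.frame(
  ProbeID = c("2810273", "5720014", "2600424", "3520176", "1240739", "1190735"),
  p = c(0.022920083, 0.042961586, 0.025965275,
        0.032705434, 0.000668000, 0.016157765),
  stringsAsFactors = FALSE
)

# Named numeric vector: names are probe IDs, values are p-values.
P_values <- setNames(data1$p, data1$ProbeID)

# Assumed selector: TRUE for scores below the cutoff (0.01 is an assumption).
topDiffGenes <- function(allScore) allScore < 0.01

sum(topDiffGenes(P_values))  # 1 for these six rows
```

Keeping the IDs as character matters because topGO matches the names of this vector against the identifiers in the annotation source.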

Annotation GO probe • 3.6k views

updated 10.6 years ago by Marc Carlson ★ 7.2k • written 10.6 years ago by Guest User ★ 13k


Marc Carlson ★ 7.2k

@marc-carlson-2264

Last seen 7.7 years ago

United States

Hi Raj,

The problem seems to be that instead of using the appropriate annotation package you are grabbing the one used by the ALL example object. As a 1st step, you might want to look at our website and see if there is a different annotation package that better matches your array. You can see the list of packages on our website here:

http://www.bioconductor.org/packages/release/BiocViews.html#___AnnotationData

Marc

On 09/11/2013 08:15 AM, kaushal Raj Chaudhary [guest] wrote:
> Hi List,
>
> I have illumina expression data from mouse chip. I was using topGO package with probe id and P values .I am not sure if I can use hgu95av2.db for annotation. [...]
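A quick way to confirm this kind of mismatch is to check what fraction of your IDs occur among the keys of the annotation source before building the topGOdata object. The sketch below is base R only; `idOverlap` is a hypothetical helper, and `affyKeys` is a three-element stand-in for the probe IDs of an Affymetrix human package (with hgu95av2.db loaded, `keys(hgu95av2.db)` would give the full set).

```r
# Hypothetical helper: fraction of IDs that are found among the annotation keys.
idOverlap <- function(ids, annotationKeys) {
  ids <- as.character(ids)
  mean(ids %in% as.character(annotationKeys))
}

# Probe IDs from the original post (Illumina mouse array).
probeIds <- c("2810273", "5720014", "2600424", "3520176", "1240739", "1190735")

# Stand-in for the keys of an Affymetrix human chip package (assumption:
# a tiny sample, not the full key set).
affyKeys <- c("1000_at", "1001_at", "1002_f_at")

idOverlap(probeIds, affyKeys)  # 0: none of the IDs are known to the package
```

An overlap of zero is consistent with the "0 GO terms found" message in the post: no probe could be mapped, so there was nothing to build the GO graph from.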

10.6 years ago • Marc Carlson ★ 7.2k


Hello,

Here is some code I am using to run topGO. It starts from text files that each contain one gene list (no header; the list name is taken from the file name). This may be a bit excessive, but it seems to work for me. You need a feature data (fd) matrix / data frame that has at least the columns Entrez (Entrez IDs for your array) and Symbol (gene symbols corresponding to the Entrez IDs) for your entire array.

BR, Pekka

    ## Reading in gene lists, forming a glist
    readGeneList <- function(folder, pattern, recursive=TRUE) {
      # full.names=TRUE so the files can be read when 'folder' is not the working directory
      gl.files <- list.files(path=folder, pattern=pattern, recursive=recursive,
                             full.names=TRUE)
      names <- sub("\\.txt$", "", basename(gl.files))
      glist <- lapply(gl.files, function(f) {
        cat("Reading:", f, "\n")
        read.delim(f, stringsAsFactors=FALSE, header=FALSE)[,1]
      })
      names(glist) <- names
      glist
    }

    # Removing white-spaces and other problematic characters
    cleanIds <- function(ids) {
      ids <- gsub("[ ,;'\"]", "", ids)
      ids <- ids[!is.na(ids)]
      ids <- ids[ids != "0"]
      as.character(unique(ids))
    }

    ## Function assumes that gene symbols are used.
    # fd = feature data with at least the columns Symbol and Entrez
    # GObranch = "BP", "MF" or "CC"
    mapGenesToTerms <- function(fd, GObranch="BP") {
      # Human annotation; for mouse data, swap in org.Mm.eg.db here and in select()
      require(org.Hs.eg.db)
      keys <- cleanIds(as.character(unique(fd$Entrez)))
      cols <- c("GO", "SYMBOL")
      mapping <- select(org.Hs.eg.db, keys, cols, keytype="ENTREZID")
      # which() drops rows whose ONTOLOGY is NA (genes without GO annotation)
      mapping.bp <- mapping[which(mapping$ONTOLOGY == GObranch), ]
      keep <- which(mapping.bp$SYMBOL != "")
      go.list <- by(mapping.bp[keep, ], INDICES=mapping.bp$SYMBOL[keep],
                    function(x) x$GO)
      go.list
    }

    ## Function assumes that gene symbols are used.
    # deg  = differentially expressed genes, as gene symbols
    # allg = all genes that were measured or are left after independent filtering
    # go.list = output of mapGenesToTerms()
    setUpGOdata <- function(deg, allg, go.list, GObranch="BP") {
      require(topGO)
      allg <- cleanIds(allg)
      deg <- cleanIds(deg)
      geneList <- factor(as.integer(allg %in% deg))
      names(geneList) <- allg
      cat("Starting mapping of identifiers to GO-DAG... \n")
      GOdata <- new("topGOdata", ontology=GObranch, allGenes=geneList,
                    annot=annFUN.gene2GO, nodeSize=5, gene2GO=go.list)
      cat("Done \n")
      GOdata
    }

    # Running the tests. The result table is ordered by weight01 and cut at
    # 'nnodes' rows, so only the weight01 column is complete; the other methods
    # are there for comparison. ParentChild and Classic in particular can rank
    # terms highly that are not among the top 'nnodes' of weight01.
    runTopGO <- function(godata, nnodes=500) {
      require(topGO)
      cat("Running Classic \n")
      go.test <- new("classicCount", testStatistic=GOFisherTest, name="Fisher test")
      resultFis <- getSigGroups(godata, go.test)
      cat("Running weight \n")
      go.weight <- new("weightCount", testStatistic=GOFisherTest,
                       name="Fisher test", sigRatio="ratio")
      resultWeight <- getSigGroups(godata, go.weight)
      cat("Running parent-child \n")
      go.parentChild <- new("parentChild", testStatistic=GOFisherTest,
                            name="Fisher test")
      resultparentChild <- getSigGroups(godata, go.parentChild)
      cat("Running weight01 \n")
      go.weight01 <- new("weight01Count", testStatistic=GOFisherTest,
                         name="Fisher test")
      resultweight01 <- getSigGroups(godata, go.weight01)
      cat("Running elim \n")
      go.elimCount <- new("elimCount", testStatistic=GOFisherTest,
                          name="Fisher test")
      resultelimCount <- getSigGroups(godata, go.elimCount)
      allres <- GenTable(godata, classic=resultFis, p.c=resultparentChild,
                         weight=resultWeight, weight01=resultweight01,
                         elim=resultelimCount, orderBy="weight01",
                         ranksOf="classic", topNodes=nnodes)
      allres
    }

    # Cycles through a list of gene lists; assumes everything is in SYMBOL format
    doTopGO <- function(glist, allg, fd, GObranch="BP", nnodes=500) {
      require(topGO)
      glnames <- names(glist)
      go.list <- mapGenesToTerms(fd=fd, GObranch=GObranch)
      resultslist <- lapply(seq_along(glnames), function(n) {
        cat("############## \n")
        cat("Running:", glnames[n], "\n")
        g <- glist[[n]]
        cat("Number of genes:", length(g), "\n")
        gd <- setUpGOdata(deg=g, allg=allg, go.list=go.list, GObranch=GObranch)
        tg <- runTopGO(godata=gd, nnodes=nnodes)
        tg
      })
      names(resultslist) <- names(glist)
      resultslist
    }

    ### Using the functions ###
    ## Reading in the data (f is the folder that holds the gene list files):
    glist <- readGeneList(folder=f, pattern=".txt")
    allg <- fd$Symbol
    ## Running the topGO analysis on all the lists.
    ## Gives, for each list, the first 250 "weight01" topGO results:
    tgl <- doTopGO(glist=glist, allg=allg, fd=fd, GObranch="BP", nnodes=250)

2013/9/13 Marc Carlson <mcarlson@fhcrc.org>

> Hi Raj,
>
> The problem seems to be that instead of using the appropriate annotation package you are grabbing the one used by the ALL example object. [...]
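To make the inputs to `setUpGOdata()` more concrete, here is a small base-R sketch of the 0/1 gene factor and of a one-sided Fisher test for a single term, which is roughly what the "classic" method computes per GO term. The gene names, the term labels `termX` and `termY`, and the toy `go.list` are all invented for illustration; they are placeholders, not real GO identifiers.

```r
# Toy universe and selected genes (illustrative names only).
allg <- c("A", "B", "C", "D", "E", "F", "G", "H")
deg  <- c("A", "B", "C")

# Same construction as in setUpGOdata(): 1 = selected, 0 = background.
geneList <- factor(as.integer(allg %in% deg))
names(geneList) <- allg

# Toy gene-to-term list: names are gene IDs, values are term IDs.
go.list <- list(A = "termX", B = c("termX", "termY"), C = "termX",
                D = "termY", E = "termY", F = "termY",
                G = "termY", H = "termY")

# Genes annotated to termX.
hasTerm <- vapply(go.list, function(terms) "termX" %in% terms, logical(1))
inTerm  <- allg %in% names(go.list)[hasTerm]

# 2x2 table: annotated to the term vs. selected.
tab <- table(inTerm   = factor(inTerm, levels = c(TRUE, FALSE)),
             selected = factor(geneList == "1", levels = c(TRUE, FALSE)))

# One-sided test for over-representation of selected genes in the term.
p <- fisher.test(tab, alternative = "greater")$p.value
p  # 1/56 here, since all three termX genes are selected
```

The weight, elim and parent-child methods adjust this per-term test using the GO graph structure, which is why their rankings can differ from the classic one.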

10.6 years ago • Pekka Kohonen ▴ 190
