topGO error: allGenes should be a factor or a numeric vector
ccha ▴ 10

Hello, I'm trying to perform GO-term analysis with some differentially expressed genes using the topGO package and following the steps in the booklet. Currently I am having trouble trying to prepare the input data.

This is my code taken from the booklet, but with my particular set of genes:

sampleGOdata <- new("topGOdata", ontology = "BP", allGenes = cluster2TOPgo_GeneList, geneSel = topDiffGenes, nodeSize = 10, annot = org.Mm.eg.db)

However, when I run this, I get: "Error in .local(.Object, ...) : allGenes should be a factor or a numeric vector"

This is what my gene list looks like (Entrez IDs as row names and adjusted p-values down the column), compared with the example gene list from the booklet.

Can someone explain to me what I've done wrong and how I can fix this issue?

topGO topgo GO term analysis • 6.6k views
Kevin Blighe ★ 4.0k

The allGenes parameter should be a named numeric vector: the values are [typically] p-values, and the names are the gene identifiers. It should look like this:

library(topGO)
library(ALL)
data(geneList)
head(geneList)
1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at 
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000

So, please verify the content of your cluster2TOPgo_GeneList variable.

Kevin


Hi Kevin, thanks for your reply. When I run head(geneList), I get exactly the same output as in your code. However, when I run head() on my own gene list, it comes up with this. Does this mean I have to transpose my data frame so that the genes are columns and the p-values are the first row?


It looks like it is in the incorrect format, yes. A named vector is created like this:

x <- c(1,2,3,4)
names(x) <- c('A','B','C','D')
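
For a results table like the one described in the question (Entrez IDs as row names, adjusted p-values in a column), the same idea could be sketched roughly as follows. The data frame `res`, its values, and the column name `padj` are made up for illustration. No transposing should be needed.

```r
# Hypothetical toy table: Entrez IDs as row names, adjusted p-values in `padj`
res <- data.frame(padj = c(0.001, 0.20, 0.04),
                  row.names = c("11287", "11298", "11302"))

# Pull the column out as a plain numeric vector and attach the row names
gene_vec <- setNames(res$padj, rownames(res))

is.numeric(gene_vec)   # should be TRUE before passing to allGenes
names(gene_vec)        # the gene identifiers
```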

I've turned my dataframe into a named vector, but still seem to get the same error. Here's the head of my vector (cluster2genelistv) and what it looks like when I view it.


Thanks. Looks like the values in cluster2genelistv should be numeric. Currently, they are encoded as characters.
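
A quick way to check this, and to convert, might look like the sketch below. The vector `v` and its values are invented for illustration and are not taken from the screenshots.

```r
# Hypothetical vector whose p-values were stored as text
v <- c("11287" = "0.001", "11298" = "0.20", "11302" = "NA")

class(v)               # "character", which allGenes does not accept

# as.numeric() drops the names, so re-attach them with setNames()
v_num <- setNames(as.numeric(v), names(v))

class(v_num)           # "numeric"
sum(is.na(v_num))      # how many values are missing after conversion
```

Any entries that come out as NA are worth inspecting before running the analysis.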


Yep, changing as.character to as.numeric seems to have worked:

cluster2genelistv <- setNames(as.numeric(cluster2TOPgo_GeneList$padj), rownames(cluster2TOPgo_GeneList))

I think the input for allGenes is all good now (as indicated by the green bar when I run the code) but unfortunately it comes up with another error.

sampleGOdata <- new("topGOdata", ontology = "BP", allGenes = cluster2genelistv, geneSel = topDiffGenes, nodeSize = 10,  annot = annFUN.org, mapping="org.Mm.eg.db", ID = "SYMBOL")

Building most specific GOs ..... ( 0 GO terms found. )

Build GO DAG topology .......... ( 0 GO terms and 0 relations. )
Nothing to do:
Error in split.default(names(sort(nl)), f.index) : first argument must be a vector


The next problem is that you have specified ID = "SYMBOL", but you are using [I assume] Entrez IDs as the names of your input vector, cluster2genelistv. No gene names match, which is why 0 GO terms are found. So, you could try ID = "ENTREZ".

Actually, I usually run topGO in a different way - see here on Biostars: https://www.biostars.org/p/350710/#350712
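
One rough way to spot this kind of ID mismatch before building the topGOdata object is to look at what the names of the vector actually contain. The sketch below uses only base R. The example vector is made up, and the all-digits test is just a heuristic for Entrez IDs, not something topGO itself provides.

```r
# Hypothetical named p-value vector keyed by Entrez IDs
gene_vec <- c("11287" = 0.001, "11298" = 0.20, "11302" = 0.04)

# Entrez IDs are integers, so names made only of digits suggest Entrez
looks_like_entrez <- all(grepl("^[0-9]+$", names(gene_vec)))

# Pick the value to pass as ID to annFUN.org accordingly
id_type <- if (looks_like_entrez) "entrez" else "symbol"
id_type
```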


Thank you kindly - changing ID to ENTREZ seemed to do the trick! The code from the Biostars post worked like a charm too, and I was even able to get the graph.
