Hello, I'm trying to perform a GO term enrichment analysis on some differentially expressed genes using the topGO package, following the steps in the booklet. Currently I'm having trouble preparing the input data.
This is my code, taken from the booklet but adapted to my own set of genes:
sampleGOdata <- new("topGOdata", ontology = "BP", allGenes = cluster2TOPgo_GeneList, geneSel = topDiffGenes, nodeSize = 10, annot = org.Mm.eg.db)
However, when I run this, I get the following error: "Error in .local(.Object, ...) : allGenes should be a factor or a numeric vector"
This is what my gene list looks like (Entrez IDs as row names and adjusted p-values in a single column), next to their example gene list.
Can someone explain to me what I've done wrong and how I can fix this issue?
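The error message names the two input types that topGO accepts for allGenes. As a minimal base-R sketch (the gene IDs, p-values, the 0.05 cut-off and the function name selectSignificant are all made up for illustration; in the booklet, topDiffGenes plays the geneSel role), the two forms look like this: a named numeric vector of scores paired with a geneSel function that returns TRUE for the genes of interest, or a named 0/1 factor that marks the genes of interest directly. The booklet's factor example omits geneSel.

```r
# Hypothetical adjusted p-values keyed by made-up Entrez IDs (illustration only)
padjScores <- c("11287" = 0.001, "11298" = 0.20, "11302" = 0.03, "11303" = 0.75)

# Option 1: named numeric vector + a geneSel function.
# geneSel receives the numeric scores and returns a logical vector
# (TRUE = gene of interest). The 0.05 cut-off is an assumption.
selectSignificant <- function(allScore) {
  allScore < 0.05
}

# Option 2: named factor with levels "0"/"1" marking genes of interest directly.
interesting <- c("11287", "11302")  # hypothetical DE genes
universe <- names(padjScores)
geneFactor <- factor(as.integer(universe %in% interesting))
names(geneFactor) <- universe

print(selectSignificant(padjScores))
print(geneFactor)
```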
Hi Kevin, thanks for your reply. When I run head(geneList), I get exactly the same output as your code. However, when I run head() on my own gene list, I get this instead. Does this mean I have to transpose my data frame so that the genes are columns and the p-values are the first row?
It looks like it is in the incorrect format, yes. allGenes needs to be a named vector: one numeric value per gene, with the gene IDs as the names.
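A minimal sketch of that conversion in base R, assuming a results table shaped like the one described in the question (Entrez IDs as row names, adjusted p-values in a column named padj; the values here are made up):

```r
# Toy stand-in for a results table: Entrez IDs as row names, adjusted
# p-values in a "padj" column (values are made up for illustration).
resultsTable <- data.frame(
  padj = c(0.001, 0.20, 0.03),
  row.names = c("11287", "11298", "11302")
)

# A data frame is not what allGenes expects; pull out the column and name it.
geneList <- setNames(resultsTable$padj, rownames(resultsTable))

is.numeric(resultsTable)  # FALSE: a data.frame is a list, not a numeric vector
is.numeric(geneList)      # TRUE
str(geneList)
```

Using setNames() on the column keeps the row names as the vector's names; extracting a single column with resultsTable[, "padj"] would instead return an unnamed vector.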
I've turned my data frame into a named vector, but I still get the same error. Here's the head of my vector (cluster2genelistv) and what it looks like when I view it.
Thanks. It looks like the values in cluster2genelistv should be numeric; currently, they are encoded as characters.
Yep, that seems to have worked, by using as.numeric instead of as.character:
cluster2genelistv <- setNames(as.numeric(cluster2TOPgoGeneList$padj), rownames(cluster2TOPgoGeneList))
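The fix works because is.numeric() is FALSE for a character vector, even when the strings look like numbers. A small base-R sketch with made-up values, which also shows a related pitfall: if the padj column had been read in as a factor, as.numeric() would return the factor's internal level codes instead of the p-values.

```r
# Made-up p-values stored as text, as happens when a column was
# converted with as.character() (illustration only).
padjText <- c("0.001", "0.2", "0.03")
ids <- c("11287", "11298", "11302")

badList <- setNames(as.character(padjText), ids)
goodList <- setNames(as.numeric(padjText), ids)

is.numeric(badList)   # FALSE: a character vector is rejected as allGenes
is.numeric(goodList)  # TRUE

# Pitfall: if the column is a factor, as.numeric() returns the level codes,
# not the values. Go through as.character() first.
padjFactor <- factor(padjText)
as.numeric(padjFactor)                # level codes, not p-values
as.numeric(as.character(padjFactor))  # recovers 0.001, 0.2, 0.03
```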
I think the input for allGenes is fine now (as indicated by the green bar when I run the code), but unfortunately I get another error:
Building most specific GOs ..... ( 0 GO terms found. )
Build GO DAG topology .......... ( 0 GO terms and 0 relations. )
Nothing to do:
Error in split.default(names(sort(nl)), f.index) : first argument must be a vector
The next problem is that you have specified ID = "SYMBOL", but you are (I assume) using Entrez IDs in your input named vector, cluster2genelistv. So, you could try ID = "ENTREZ".
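Getting 0 GO terms is consistent with this mismatch: if the annotation is looked up by gene symbol, none of the numeric Entrez IDs in the vector will match. As a quick sanity check before building the topGOdata object, a hypothetical helper (guessIdType is not part of topGO) can inspect the names of the gene list. It assumes that Entrez Gene IDs are purely numeric strings and treats anything else as a symbol.

```r
# Hypothetical helper: guess which gene-ID type the names of a gene list use,
# so the matching ID value can be passed to topGO's annFUN.org().
# Assumption: Entrez Gene IDs are purely numeric strings; anything else
# (e.g. "Trp53", "Actb") is treated as a gene symbol.
guessIdType <- function(geneList) {
  ids <- names(geneList)
  if (is.null(ids)) stop("gene list has no names")
  if (all(grepl("^[0-9]+$", ids))) "entrez" else "symbol"
}

# Made-up examples for illustration only
entrezList <- c("11287" = 0.001, "11298" = 0.20)
symbolList <- c("Trp53" = 0.001, "Actb" = 0.20)

guessIdType(entrezList)  # "entrez"
guessIdType(symbolList)  # "symbol"
```

The result could then be passed as the ID argument, e.g. annot = annFUN.org, mapping = "org.Mm.eg.db", ID = guessIdType(cluster2genelistv). annFUN.org() also accepts other ID types (such as "ensembl"), which this two-way check does not try to detect.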
Actually, I usually run topGO in a different way; see here on Biostars: https://www.biostars.org/p/350710/#350712
Thank you kindly, changing ID to ENTREZ did the trick! The code from the Biostars post worked like a charm too, and I was even able to get the graph.