How to create geneselectionfunction in TopGO: Error in .local(.Object, ...) : allGenes should be a factor or a numeric vector
1
0
Entering edit mode
AP • 0
@ap-14194
Last seen 20 months ago

Hello everyone,

I am trying to do gene enrichment analysis using TopGO by giving predefined set of differentially expressed genes. so far my code looks like this:

> gene.go <- read.csv("GO_total_upregulated_genes.csv",header = T, stringsAsFactors = F, sep = ",")
GeneID     Goterm
1 FUN_004018 GO:0016491
2 FUN_004018 GO:0046872
3 FUN_004018 GO:0055114
4 FUN_003797 GO:0000287
5 FUN_003797 GO:0030976
6 FUN_003797 GO:0030976
> gene2GO <- tapply(gene.go$Goterm, gene.go$GeneID, function(x)x)
$FUN_000109 [1] "GO:0005515" "GO:0005515" "GO:0005515" "GO:0005515"$FUN_000399
[1] "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374" "GO:0008374"

$FUN_000424 [1] "GO:0006950" "GO:0016021"$FUN_000451
[1] "GO:0035556" "GO:0035556" "GO:0035556" "GO:0035556"

$FUN_000481 [1] "GO:0000166" "GO:0000166" "GO:0016021" "GO:0016021" "GO:0006812" "GO:0016021" [7] "GO:0019829" "GO:0006812" "GO:0016021" "GO:0019829" "GO:0000166" "GO:0006812" [13] "GO:0016021" "GO:0019829" "GO:0016021" "GO:0016021" "GO:0000166" "GO:0000166" [19] "GO:0000166" "GO:0016021" "GO:0016021" "GO:0006812" "GO:0016021" "GO:0019829" [25] "GO:0006812" "GO:0016021" "GO:0019829" "GO:0000166" "GO:0006812" "GO:0016021" [31] "GO:0019829" "GO:0016021" "GO:0016021" "GO:0000166" "GO:0006812" "GO:0016021" [37] "GO:0019829" "GO:0000166" "GO:0006812" "GO:0016021" "GO:0019829" "GO:0016021" [43] "GO:0016021" "GO:0000166" "GO:0000166" "GO:0000166" "GO:0016021" "GO:0016021" [49] "GO:0006812" "GO:0016021" "GO:0019829"$FUN_000534
[1] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[7] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[13] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"
[19] "GO:0015079" "GO:0016020" "GO:0071805" "GO:0015079" "GO:0016020" "GO:0071805"

> infile <- "Total_upregulated_genes_pvalue.csv"
> pcutoff <- 0.05 # cutoff for defining significant genes
> DE <- read.delim(infile, stringsAsFactors = F, header = T, sep=",")
> tmp <- (DE$padj < pcutoff) > geneList <- tmp > names(geneList) <- (DE$GeneID)
FUN_000233 FUN_000410 FUN_000424 FUN_000451 FUN_000511 FUN_000590
TRUE       TRUE       TRUE       TRUE       TRUE       TRUE
> GOdata_UP_BP <- new("topGOdata",ontology = "BP",allGenes = geneList,
+               geneSelectionFun = function(x)(x == 1),
+               annot = annFUN.gene2GO, gene2GO = gene2GO)
Error in .local(.Object, ...) :
allGenes should be a factor or a numeric vector
>


I think I am doing something wrong while calling allGenes vector with p-values. Could you please help me out.

Thank you!!

R TopGO Geneenrichment • 449 views
0
Entering edit mode
@james-w-macdonald-5106
Last seen 20 hours ago
United States

Your geneList is not what it should be. What you want is a named vector of p-values for all of the genes you have tested, where the names correspond to the gene IDs. The help pages and vignettes for topGO aren't that great, so it's sometimes hard to figure that out. But you can inspect the data that come with topGO to sort of infer things.

> library(topGO)
> data(geneList)
1095_s_at   1130_at   1196_at 1329_s_at 1340_s_at 1342_g_at
1.0000000 1.0000000 0.6223795 0.5412240 1.0000000 1.0000000


In this case the IDs are from an Affy array, but you see the point?

0
Entering edit mode

James,

I don't think I followed you well, however I tried a couple of things to figure out, but still stuck.

tmp <- ifelse(DE$padj < pcutoff, 1, 0) geneList <- tmp names(geneList) <- (DE$GeneID)

FUN_000233 FUN_000410 FUN_000424 FUN_000451 FUN_000511 FUN_000590
1          1          1          1          1          1


I was able to get 1 for all of my significant p-values but was not able to get the exact p-values. Any suggestions please!!

0
Entering edit mode

What does the first line of the code you present do? Why are you doing that? If you want your geneList to be p-values why are you not just extracting the p-values?

0
Entering edit mode

Its assigning all the p-values that are significant (<0.05) as 1 and rest as 0. I was not able to make a code that could return me the actual p-values that's why.

Can I do something like: if the values are less than pcutoff return all scores

1
Entering edit mode

OK, let's try this another way.

You need a vector of the actual p-values, and that vector has to have a certain set of names. What you are doing is extracting the actual p-values that you say you can't get and then testing to see if they larger or smaller than your p-cutoff.

What you need to do is extract the p-values, and then assign names to them. If you can extract the p-values to test if they are larger or smaller than some value, how is it possible that you can't return the p-values themselves?

0
Entering edit mode

Hi James, Sorry if I am not able to articulate the problem I am having. I was able to extract the p-values to see if they are larger or smaller than the cutoff. after this I was able make Godata object and run Fishers test to get the table of GO terms for my set of genes. However I don't know what went wrong, my results look like this:

My code:

tmp <- ifelse(DE$padj < pcutoff, 1, 0) geneList <- tmp names(geneList) <- (DE$GeneID)
GOdata_UP_BP <- new("topGOdata",ontology = "BP",allGenes = geneList,
geneSelectionFun = function(x)(x == 1),
annot = annFUN.gene2GO, gene2GO = gene2GO)

resultFis_UP_BP <- runTest(GOdata_UP_BP,algorithm = "elim", statistic = "fisher")

tab <- GenTable(GOdata_UP_BP, raw.p.value = resultFis_UP_BP, topNodes = length(resultFis_UP_BP@score),
numChar = 120)


My results:

GO.ID                                           Term Annotated Significant
1 GO:0008213                             protein alkylation         1           1
2 GO:0046903                                      secretion         1           1
3 GO:0042219 cellular modified amino acid catabolic process         1           1
4 GO:0006979                   response to oxidative stress         1           1
5 GO:0019438         aromatic compound biosynthetic process        77          77
6 GO:0015672          monovalent inorganic cation transport         2           2
Expected raw.p.value
1        1           1
2        1           1
3        1           1
4        1           1
5       77           1
6        2           1


If you see here all that I am getting in p-value column is the number 1 not the actual p-values. So I thought may be I did something wrong when I assigned the p-values as 1 and 0.