Problem when using topGO for RNAseq data: The number of annotated genes for GO terms changes according to the number of genes in the gene-background-LIst.
0
0
Entering edit mode
colaneri ▴ 30
@colaneri-7770
Last seen 5.7 years ago
United States

I’am running topGO, to perform GO analysis in a RNA-seq data set. I have an small data set of 104 significant genes that I called “sigG”.

Firstly, I used genefilter to find genes that have similar level of expression than my “sigG”.  For each sigGene I got 10 genes which make a background set of 1040 genes (backG).

I run topGO but I found results that make suspect that something is going wrong. See the head of these table:

Using 10 background genes per sig gene:

1-For example in my last row I see 127 sig genes mapping to GO:0016740, but I only use a set of 104 genes.

2-When I include more backG (50 instead of 10 per sigG) I notice that the number of annotated genes for each GO term changed (increased for most of them)

Finally, if I took my sigGenes and I analyzed them with a variety of online GO tools I always get nice enrichment in very specific terms, independently of the tool. However not of those terms appear in this topGO analysis. I cannot get an explanation for these results. I will appreciate if someone can give me a hint of what is going one. Below you can find the R script used for this analysis

The code I used to generate these tables is:

 

# resSxT is a result DESeq2 object
# > class(resSxT)
# [1] "DESeqResults"
#attr(,"package")
#[1] "DESeq2"


#get the list of genes considered to be significant for the interaction test
sigGenesSxT <-rownames(subset(resSxT, pvalue < 0.01))
#get average expressions
overallBaseMeanSxT <- as.matrix(resSxT[, "baseMean", drop = F])
backG_SxT <- genefinder(overallBaseMeanSxT, sigGenesSxT, 50, method = "manhattan")

# get identified similar genes
backG_SxTwithSIG <- rownames(overallBaseMeanSxT)[as.vector(sapply(backG_SxT, function(x)x$indices))]

## remove Diff Expressed genes from background
backG_SxT <- setdiff(backG_SxTwithSIG, sigGenesSxT)

##RUNNING topGO on resSxT

onts = c( "MF", "BP", "CC" )
geneIDs = rownames(overallBaseMeanSxT)
inUniverse = geneIDs %in% c(sigGenesSxT, backG_SxT)
inSelection = geneIDs %in% backG_SxT
algSxT <- factor( as.integer( inSelection[inUniverse] ) )
names(algSxT) <- geneIDs[inUniverse]

tab = as.list(onts)
names(tab) = onts
### test all three top level ontologies
for(i in 1:3){
        ## prepare data
        tgdSxT <- new( "topGOdata", ontology=onts[i], allGenes = algSxT, nodeSize=5, annot=annFUN.org, mapping="org.Hs.eg.db", ID = "SYMBOL" )
        ## run tests
        resultTopGO_SxT.elim <- runTest(tgdSxT, algorithm = "elim", statistic = "Fisher" )
        resultTopGO_SxT.classic <- runTest(tgdSxT, algorithm = "classic", statistic = "Fisher" )
        ## look at results
        tab[[i]] <- GenTable( tgdSxT, Fisher.elim = resultTopGO_SxT.elim,
                              Fisher.classic = resultTopGO_SxT.classic,
                              orderBy = "Fisher.classic" , topNodes = 200)
}    
topGOResultsSxT <- as.data.frame(do.call(rbind,tab))
write.csv(topGOResultsSxT, file = "SxT_topGOResults.csv")        

 

topGO gene ontolology rnaseq • 1.9k views
ADD COMMENT

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6