Filtering Tophat/cufflinks data for GSEA with topGO
@aaronrosenstein-13457
Hello all,

I am currently doing some GO enrichment using topGO on an RNA-seq experiment that was analyzed using tophat/cufflinks. 

Having used topGO before, I am fairly familiar with its proper use. I would like to do GSEA using the KS test with the elim algorithm, but I want to be sure I am processing the data from the tophat pipeline correctly. I have no experience with tophat, as this part of the pipeline was done by someone else. The tophat output I received comprises ~13000 genes, but I notice that a fair number of them are lncRNAs, miRNAs, and so on. In addition, near the bottom of the list there are ~2500 genes with q value > 0.9 whose logFC is negligible. When I run my enrichment protocol in topGO, most of my top hits are terms high in the GO hierarchy, such as GO:0008150 (biological_process), and are not all that meaningful. Should I be filtering my tophat data in any way before passing it to topGO, for example by keeping only protein-coding genes? Or will this obscure the data? Just to be sure, here is my topGO code:

Here, all_genes is my named vector of q values, with Entrez IDs as names.

GO_data_BP <- new("topGOdata",
                  ontology = "BP",
                  allGenes = all_genes,
                  geneSelectionFun = function(p) p < p_value,
                  description = "GO enrichment analysis",
                  annot = annFUN.org,
                  mapping = "org.Mm.eg.db",
                  ID = "entrez",
                  nodeSize = 5)

result_KS_elim_BP <- runTest(GO_data_BP, algorithm = "elim", statistic = "ks")
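
To make the filtering question concrete, here is a minimal base R sketch of how a vector like all_genes could be built after restricting to protein-coding genes. The table de_table, its column names (entrez, biotype, q_value), the placeholder IDs, and the 0.05 cutoff are all assumptions for illustration; they are not the actual tophat/cufflinks output, and whether this filter is appropriate is exactly what is being asked.

```r
# Hypothetical differential-expression table. Column names and values are
# made up for illustration and do not reflect real cuffdiff output.
de_table <- data.frame(
  entrez  = c("11287", "11298", "11302", "11303", "11304"),
  biotype = c("protein_coding", "lncRNA", "protein_coding",
              "miRNA", "protein_coding"),
  q_value = c(0.001, 0.20, 0.95, 0.04, 0.03),
  stringsAsFactors = FALSE
)

# One possible filter: keep only rows annotated as protein coding.
coding <- de_table[de_table$biotype == "protein_coding", ]

# Named numeric vector of q values with Entrez IDs as names,
# i.e. the shape passed as allGenes above.
all_genes <- setNames(coding$q_value, coding$entrez)

# Selection function of the kind passed as geneSelectionFun.
# The cutoff is an assumed value.
p_value <- 0.05
select_sig <- function(p) p < p_value

# Number of genes that would count as significant after filtering.
n_sig <- sum(select_sig(all_genes))
```

Note that the KS test uses the full ranking of scores in all_genes, so removing genes changes the background distribution as well as the significant set.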


What do you want to learn by using the KS test with the elim algorithm? What are the functions of your 13k genes?
