Using GAGE to Analyze Pathway Enrichment Directly from Fold Change Data
JMallory

I have been using the following tutorial by Stephen Turner and Will Bush to look at some RNA-seq data. 

http://www.gettinggeneticsdone.com/2015/12/tutorial-rna-seq-differential.html

After reading GAGE's documentation, I think this tutorial uses it in a somewhat non-standard way. Specifically, the authors appear to use it for a GSEA-like analysis: they feed it a vector of fold changes named by Entrez ID and look for enrichment in the pathways contained in the `kegg.sets.hs` object.

In a standard GSEA, I would sort transcripts by log2 fold change before the analysis. Should transcripts also be rank-ordered before running GAGE this way? Running it both ways gives very different results, at least with my data.

Tags: GAGE, Gene Ontology, rna-seq, pathview
Luo Weijun

The order of genes in the data should make no difference to a GAGE analysis. You can randomly shuffle the rows of an example dataset (e.g. gse16873) and rerun GAGE; the results will be the same.
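To illustrate why row order alone should not matter, here is a minimal Python sketch. It uses a plain pooled-variance two-sample t statistic (gene-set members vs. all other genes) as a simplified stand-in for GAGE's per-set test; it is not GAGE's actual implementation, and the gene IDs, fold changes, and pathway are made up. Because the statistic depends only on which genes belong to the set, shuffling the input rows leaves it unchanged.

```python
import math
import random
import statistics

def set_vs_background_t(fold_changes, gene_set):
    """Pooled-variance two-sample t statistic: fold changes of genes in
    gene_set versus all other genes. Simplified stand-in, not GAGE's code."""
    in_set = [fc for gid, fc in fold_changes.items() if gid in gene_set]
    rest = [fc for gid, fc in fold_changes.items() if gid not in gene_set]
    n1, n2 = len(in_set), len(rest)
    v1, v2 = statistics.variance(in_set), statistics.variance(rest)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    diff = statistics.mean(in_set) - statistics.mean(rest)
    return diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

# Hypothetical data: 200 made-up gene IDs with random log2 fold changes.
random.seed(0)
fold_changes = {str(1000 + i): random.gauss(0.0, 1.0) for i in range(200)}

# Hypothetical pathway: the first 20 IDs, shifted upward by 1.
pathway = {str(1000 + i) for i in range(20)}
for gid in pathway:
    fold_changes[gid] += 1.0

# Same genes and values, but the rows are in a random order.
items = list(fold_changes.items())
random.shuffle(items)
shuffled = dict(items)

t_original = set_vs_background_t(fold_changes, pathway)
t_shuffled = set_vs_background_t(shuffled, pathway)
print(math.isclose(t_original, t_shuffled))  # True
```

Note that a Python dict cannot hold duplicate keys, so this sketch sidesteps the duplicate-ID situation entirely.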

Most likely, your data contain multiple rows with the same gene ID. In that case the row order does make a difference, because only the first row for a given gene ID is mapped. GAGE assumes that genes/rows are independent, so gene IDs should be unique in both the user data and the gene sets. You will need to merge the repeated rows for each gene ID into a single row. See the data preparation tutorial for details:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
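A minimal Python sketch of the merging step follows. Averaging the log2 fold changes per Entrez ID is only one possible choice (keeping the value with the largest absolute change is another); the function name `collapse_by_gene` is hypothetical, the fold changes are made up, and the data preparation vignette should be consulted for the approach it actually recommends. After collapsing, each ID appears once, so the row order no longer affects the per-gene values.

```python
import statistics
from collections import defaultdict

def collapse_by_gene(rows, how="mean"):
    """Collapse (entrez_id, log2_fc) rows so that each Entrez ID appears once.
    how="mean" averages duplicates; how="max_abs" keeps the largest |log2 FC|."""
    grouped = defaultdict(list)
    for gene_id, fc in rows:
        grouped[gene_id].append(fc)
    if how == "mean":
        return {gid: statistics.mean(fcs) for gid, fcs in grouped.items()}
    if how == "max_abs":
        return {gid: max(fcs, key=abs) for gid, fcs in grouped.items()}
    raise ValueError(f"unknown method: {how}")

# Made-up transcript-level rows; two IDs occur twice (e.g. two isoforms).
rows = [("7157", 1.5), ("672", -0.5), ("7157", -0.5), ("672", 0.25), ("1956", 2.0)]

collapsed = collapse_by_gene(rows)
print(collapsed)  # {'7157': 0.5, '672': -0.125, '1956': 2.0}

# Reversing the row order gives the same per-gene values.
print(collapse_by_gene(rows[::-1]) == collapsed)  # True
```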


I see. Yes, my data contain multiple transcripts/splice variants of the same gene that map to a single Entrez ID. If I understand correctly, that is the source of the issue: the algorithm simply takes the first fold change associated with a given gene ID for further computation and ignores the other isoforms. Please correct me if I am wrong, and thank you for your response.
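The first-match behavior described in the answer can be sketched in Python as follows: only the first value seen for an ID is kept and later duplicates are ignored, in the spirit of R's `match()`, which returns the position of the first match. This mimics the described behavior rather than reproducing GAGE's internal code; the helper name `first_match_lookup` is hypothetical and the values are made up. Reversing the rows changes which isoform's fold change is used.

```python
def first_match_lookup(rows):
    """Map each ID to the first value seen for it; later duplicates
    (e.g. other isoforms of the same gene) are silently ignored."""
    lookup = {}
    for gene_id, fc in rows:
        lookup.setdefault(gene_id, fc)
    return lookup

# Made-up rows: two isoforms of the same gene share Entrez ID 7157.
rows = [("7157", 1.5), ("7157", -0.5), ("1956", 2.0)]

print(first_match_lookup(rows)["7157"])        # 1.5
print(first_match_lookup(rows[::-1])["7157"])  # -0.5
```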
