Question: Using GAGE to Analyze Pathway Enrichment Directly from Fold Change Data
JMallory wrote (7 weeks ago):

I have been using the following tutorial by Stephen Turner and Will Bush to look at some RNA-seq data. 

http://www.gettinggeneticsdone.com/2015/12/tutorial-rna-seq-differential.html

Looking at GAGE's documentation, this tutorial seems to use it in a somewhat non-standard way. It runs a GSEA-like analysis: it feeds GAGE a vector of fold changes named by Entrez IDs and tests for enrichment in the pathways stored in the `kegg.sets.hs` object.
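
To make the "fold changes in, pathway scores out" idea concrete, here is a minimal Python sketch. It is not GAGE's code (GAGE is an R/Bioconductor package). It is a simplified stand-in that scores one gene set with a plain two-sample (Welch-form) t-statistic, comparing the fold changes of genes inside the set against all other genes. The function name `gene_set_t`, the toy IDs, and the toy fold changes are hypothetical. Note that the input is a dict keyed by gene ID, so each ID can carry only one value.

```python
import math
from statistics import mean, variance


def gene_set_t(fold_changes, gene_set):
    """Hypothetical sketch: t-statistic for one gene set vs. the background.

    fold_changes: dict mapping Entrez ID (str) -> log2 fold change.
    gene_set: iterable of Entrez IDs, e.g. one pathway from a KEGG set list.
    """
    members = set(gene_set)
    inside = [fc for gid, fc in fold_changes.items() if gid in members]
    outside = [fc for gid, fc in fold_changes.items() if gid not in members]
    if len(inside) < 2 or len(outside) < 2:
        raise ValueError("need at least two genes inside and outside the set")
    # Welch-form standard error of the difference in means
    se = math.sqrt(variance(inside) / len(inside) + variance(outside) / len(outside))
    return (mean(inside) - mean(outside)) / se


# Toy data: genes "1"-"3" are up-regulated and form a made-up pathway.
fold_changes = {"1": 2.0, "2": 1.5, "3": 2.5, "4": -0.5, "5": 0.0, "6": 0.5}
hypothetical_pathway = {"1", "2", "3"}
print(round(gene_set_t(fold_changes, hypothetical_pathway), 3))  # prints 4.899
```

Because the score depends only on which values fall inside or outside the set, nothing in it refers to the position of a gene in the input.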

Were this a standard GSEA analysis, I would order transcripts by log2 fold change before running it. In this use of GAGE, should transcripts also be rank-ordered beforehand? Running it both ways makes a large difference, at least with my data.

Luo Weijun (United States) wrote (7 weeks ago):

Gene order in the data should make no difference to a GAGE analysis. You can randomly shuffle the rows of one of the example datasets (e.g. gse16873), rerun GAGE, and get the same result.
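
The order-invariance argument can be checked on a toy example. The Python sketch below does not call GAGE. It uses a hypothetical set score (mean fold change inside the set minus mean outside) computed from a list of unique `(entrez_id, log2fc)` rows, then compares the original order, a random shuffle, and a GSEA-style descending sort by fold change. Pre-ranking is just another permutation of the rows, so with unique IDs it cannot change a score that ignores position.

```python
import random
from statistics import mean


def set_score(rows, gene_set):
    """Hypothetical score: mean log2FC inside the set minus mean outside.

    rows: list of (entrez_id, log2fc) tuples with unique IDs.
    """
    members = set(gene_set)
    inside = [fc for gid, fc in rows if gid in members]
    outside = [fc for gid, fc in rows if gid not in members]
    return mean(inside) - mean(outside)


rows = [("1", 2.0), ("2", 1.5), ("3", 2.5), ("4", -0.5), ("5", 0.0), ("6", 0.5)]
pathway = {"1", "2", "3"}  # made-up gene set

original = set_score(rows, pathway)

shuffled = rows[:]
random.shuffle(shuffled)  # random row order

# GSEA-style pre-ranking: sort rows by fold change, largest first
ranked = sorted(rows, key=lambda r: r[1], reverse=True)

print(original, set_score(shuffled, pathway), set_score(ranked, pathway))
```

In general, floating-point sums can differ in the last bits between orders; here the values are small and exact, so all three scores match.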

Most likely your data contain multiple rows with the same gene ID. In that case the row order does make a difference, because only the first row for each gene ID is mapped. GAGE also assumes independence between genes (rows), so the gene IDs in both the user data and the gene sets should be unique. You will need to merge the duplicate rows for each gene ID into a single row. See the data preparation tutorial for details:

http://bioconductor.org/packages/release/bioc/vignettes/gage/inst/doc/dataPrep.pdf
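
The effect of duplicate IDs can be illustrated with a small Python sketch (not GAGE code). `first_match` mimics the behavior described above, keeping only the first row seen for each gene ID, so reordering the rows changes which isoform is used. `collapse_by_mean` merges duplicates into one row by averaging, which makes the result order-independent. Both function names and the fold-change values are hypothetical, and averaging is only one assumed merging rule; others (for example, keeping the isoform with the largest absolute change) are possible.

```python
from collections import defaultdict
from statistics import mean


def first_match(rows):
    """Keep only the first row seen for each gene ID; later duplicates are ignored."""
    mapped = {}
    for gid, fc in rows:
        mapped.setdefault(gid, fc)
    return mapped


def collapse_by_mean(rows):
    """Merge duplicate gene IDs into a single value by averaging their fold changes."""
    grouped = defaultdict(list)
    for gid, fc in rows:
        grouped[gid].append(fc)
    return {gid: mean(fcs) for gid, fcs in grouped.items()}


# Two isoforms share Entrez ID "7157" and have opposite-sign, made-up fold changes.
rows = [("7157", 3.0), ("1956", 0.4), ("7157", -1.0)]
reordered = list(reversed(rows))

# First-match mapping depends on row order (3.0 vs. -1.0)
print(first_match(rows)["7157"], first_match(reordered)["7157"])
# Collapsed values do not (1.0 in both cases)
print(collapse_by_mean(rows)["7157"], collapse_by_mean(reordered)["7157"])
```

Summarizing transcript-level counts to the gene level before the differential expression step is another way to end up with one row per gene ID.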

JMallory replied (6 weeks ago):

I see. Yes, my data contain several transcripts/splice variants of a single gene that map to the same Entrez ID. If I understand correctly, this is the source of the issue: the algorithm simply takes the first fold change associated with a given gene ID for further computation and disregards the other isoforms. Correct me if I am wrong, and thank you for your response.