Entering edit mode
Hi Aric,
You mapped your reads to Ensembl genes instead of Entrez Gene.
Therefore, you gene ID looks like ENSG00000204248. The gene set data
provided or generated by gage package use Entrez Gene IDs (or KEGG
gene IDs for minor speices) by default. Therefore, none of your genes
mapped to the pathways, hence you got NA?s.
You may mapped that to Entrez Gene as I did in the step 1, i.e. using
TxDb.Hsapiens.UCSC.hg19.knownGene as your gene models. Or you may
convert your Ensembl gene ID to Entrez Gene ID. pathview package has
two functions, id2eg and mol.sum, help you with the ID conversion and
redundant ID merging. You may check the help info by:
?id2eg
HTHs.
Weijun
--------------------------------------------
On Tue, 11/12/13, Aric wrote:
Subject: DESeq2 with GAGE
Date: Tuesday, November 12, 2013, 8:42 PM
Weijun,
I am attempting to use GAGE for kegg analysis with RNA-seq
data that has
been analyzed with DESeq2. I am following the workflow
example from
October 7, 2013, Section 6.1, and am having some problems.
The problem
is that there are no results with GAGE and I believe it is a
problem
with the input file.
The input to `gage` is just a list of genes with their
associated
log2FoldChange.
# create DESeq2 results:
...
dds <- DESeq(dds)
# extract results alone
deseq2.res <- results(dds)
# extract just log2FoldChange
deseq2.fc <- deseq2.res$log2FoldChange
names(deseq2.fc) <- rownames(deseq2.res)
exp.fc = res.fc
Example result of exp.fc:
> head(exp.fc)
ENSG00000204248 ENSG00000256195 ENSG00000108797
ENSG00000225964
? ? ???5.745541? ? ?
? 5.589081? ? ? ? 4.727203?
? ? ? 5.253303
I am then to run gage as such,
> fc.kegg.p <- gage(exp.fc, gsets = kegg.gs, ref =
NULL, samp = NULL)
and this is where it is confusing. I have already given
`gage` the results
of the analysis, just a list of genes with their associated
log2FoldChanges. It seems that `gage` is looking for
expression of genes
across samples, but clearly the above workflow does not
provide
that. Here is an example of the output from the above `gage`
command:
> fc.kegg.p
$greater
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? p.geomean stat.mean p.val q.val set.size exp1
hsa00010 Glycolysis / Gluconeogenesis? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
NA? ? ???NaN? ? NA?
? NA? ? ? ? 0???NA
hsa00020 Citrate cycle (TCA cycle)? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
???NA? ?
???NaN? ? NA? ? NA?
? ? ? 0???NA
...
$less
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ???p.geomean stat.mean p.val q.val
set.size exp1
hsa00010 Glycolysis / Gluconeogenesis? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
NA? ? ???NaN? ? NA?
? NA? ? ? ? 0???NA
...
$stats
? ? ? ? ? ? ? ?
? ? ? ? ? ? ? ?
? ? ? ? ???stat.mean
exp1
hsa00010 Glycolysis / Gluconeogenesis? ? ?
? ? NaN???NA
hsa00020 Citrate cycle (TCA cycle)? ? ?
? ? ???NaN???NA
...
I don't really understand how it can generate any p-values
from the
input data. Clearly I am missing something, but I don't see
any `gage`
input that does not want gene counts as exprs and a list of
controls
versus conditions. Any help would be greatly appreciated.
Thanks, Aric