Entering edit mode
Hi Valentina,
Well it's not all that surprising, because 6666 is a much bigger
number
than 0! What you mean of course, is that the large count 6666
represents
some sort of outlier process that you don't want to exclude.
The quick solution, as Mark has already suggested, is simply to reduce
the
prior.df in the call to estimateTagwiseDisp(). If you make prior.df
sufficiently small, then the offending genes will drop out of your DE
list.
However we have recently implemented a robust empirical Bayes
procedure
into edgeR and limma that I think you will find far more effective.
It
identifies outlier genes with very high variances while maintaining
and
even increasing power for the remaining non-outlier genes. To use it,
you
will need to switch to the quasi-likelihood routines of edgeR. You
will
also need to update to the Bioconductor 2.12 versions of edgeR and
limma
rather than the older Bioconductor release 2.11 versions that you are
currently using.
Assuming you have executed the code you give, you can continue:
design <- model.matrix(~group)
y <- estimateGLMTrendedDisp(y,design)
fit <- glmQLFTest(y,design,robust=TRUE)
topTags(fit,coef=2)
You could also try
glmQLFTest(y,design,robust=TRUE,plot=TRUE)
to get a diagnostic plot.
This methodology uses the quasi-likelihood approach
http://www.ncbi.nlm.nih.gov/pubmed/23104842
http://www.statsci.org/smyth/pubs/QuasiSeqPreprint.pdf
combined with robust empirical Bayes:
http://www.statsci.org/smyth/pubs/RobustEBayesPreprint.pdf
Note that the quasi-likelihood approach is (deliberately) somewhat
more
conservative than the classic edgeR exact test, but it should still
have
plenty of power to get good results for a 3 vs 3 experiment.
Best wishes
Gordon
On Wed, 19 Jun 2013, Valentina Indio wrote:
> I have a question about you edgeR, in order to fix a problem that
may be
> very simple for you.
>
> I'm using `edgeR` to perform differential expression analysis from
RNA-seq
> experiment.
>
> I have 6 samples of tumor cell, same tumor and same treatment: 3
patient
> with good prognosis and 3 patient with bad prognosis. I want to
compare
> the gene expression among the two groups.
>
> I ran the `edgeR` pakage like follow:
>
> x <- read.delim("my_reads_count.txt", row.names="GENE")
> group <- factor(c(1,1,1,2,2,2))
> y <- DGEList(counts=x,group=group)
> y <- calcNormFactors(y)
> y <- estimateCommonDisp(y)
> y <- estimateTagwiseDisp(y)
> et <- exactTest(y)
>
> I obtained a very odd results: in some cases I had a very low
p-value
> and FDR but looking at the raw data it is obvious that the
difference
> between the two groups can't be significant.
>
> This is an example for `my_reads_count.txt`:
>
> GENE sample1_1 sample1_2 sample1_3 sample2_1 sample2_2 sample2_3
> ENSG00000198842 0 3 2 2 6666 3
> ENSG00000257017 3 3 25 2002 29080 4
>
> And for `my_edgeR_results.txt`:
>
> GENE logFC logCPM PValue FDR
> ENSG00000198842 9.863211e+00 5.4879462930 5.368843e-07 1.953612e-04
> ENSG00000257017 9.500927e+00 7.7139869397 8.072384e-10 7.171947e-07
>
> It seems very odd that "0 3 2" versus "2 6666 3" is significant....
> I would like that the variance within the group is considered. Can
you
> help me? Some suggestion?
> Are there some alternative function in edgeR that is capable to fix
my
> problem??
-- output of sessionInfo():
> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.0.8 limma_3.12.3
tweeDEseqCountData_1.0.8 Biobase_2.16.0 BiocGenerics_0.2.0
BiocInstaller_1.8.3
loaded via a namespace (and not attached):
[1] tools_2.15.0
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}