Hello everyone,
I thought of conducting a parallel DE analysis with EdgeR and DESeq
using a
dataset that I have been working on lately.
The dataset has two conditions with two biological replicates each.
Let's say: Wild Type, Wild Type, Knock Out, Knock Out.
It's a smallRNA-Seq dataset, mapped to miRNAs.
I have tried various analyses using both programs and I have noticed
this.
There are very large differences in fold changes for some miRNAs
between the
two programs, even when using "RLE" for EdgeR normalization.
Example:
DESeq Code:
countDataSet = newCountDataSet (DATA, condition)
countDataSet = estimateSizeFactors(countDataSet)
countDataSet = estimateDispersions(countDataSet)
difexp = nbinomTest (countDataSet, "WildType", "KnockOut")
one of the results is:
id baseMean baseMeanA baseMeanB foldChange
log2FoldChange pval padj
100 623.8597966 349.3576527 898.3619406 2.57146776
1.362592066 0.001303353 0.182310802
And the size factors for DESeq are:
sizeFactors(countDataSet)
KO1 KO2 WT1 WT2
1.2969960 1.052 0.8850 0.84442
OK. So far so good.
EdgeR now.
dge <- DGEList(counts=DATA, group=condition)
dge<- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, verbose=TRUE)
dge <-estimateTagwiseDisp(dge, verbose=TRUE)
et<- exactTest(dge)
Which results in:
logFC logCPM PValue FDR
100 -2.750 5.814103 9.40E-09 1.20E-05
With:
dge$samples
group lib.size norm.factors
WT1 1 2796302 0.9922204
WT2 1 2610244 0.9928183
KO1 2 3999488 1.0248098
KO2 2 3349646 0.9905555
We have logFC 1.3 for DESeq and 2.75 in EdgeR
And these results remain practically the same even by using:
dge<- calcNormFactors(dge.RLE, method="RLE")
logFC logCPM PValue FDR
210 -2.775952 5.823856 8.047631e-09 1.030902e-05
group lib.size norm.factors
KnockOut 3999488 1.0147156
KnockOut 3349646 0.9830163
WildType 2796302 0.9903844
WildType 2610244 1.0122579
Any thoughts?
This entry has (raw tags):
KO KO WT WT
131 123 287 195
Any thoughts on why I get 1.3 lFC vs 2.7lFC?
Thanks a lot,
Best Regards,
Ioannis
Hi Ioanni,
Quick thought:
On Thu, Nov 29, 2012 at 5:06 PM, Ioannis Vlachos <iv at="" on.gr=""> wrote:
> Hello everyone,
>
> I thought of conducting a parallel DE analysis with EdgeR and DESeq
using a
> dataset that I have been working on lately.
>
> The dataset has two conditions with two biological replicates each.
>
> Let's say: Wild Type, Wild Type, Knock Out, Knock Out.
>
> It's a smallRNA-Seq dataset, mapped to miRNAs.
>
> I have tried various analyses using both programs and I have noticed
this.
>
> There are very large differences in fold changes for some miRNAs
between the
> two programs, even when using "RLE" for EdgeR normalization.
[snip]
Have you tried doing a scatterplot of the DESeq logFC vs the edgeR
logFC for the same genes?
Is the range systematically compressed in one comparison vs. the
other, or?
Maybe its a function of the `baseMean` values?
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
Dear Ioannis,
Questions like this require reproducible code examples that other
people
can run for themselves to confirm the results. See the posting guide.
As it is, the inconsistencies in the three sets of output you give
(samples in different orders, groups with different names, same miR
with
different IDs, logFC of different signs, etc) do not give a reader any
confidence that the analyses are actually comparable. According to
the
DESeq output you give, this particular miR is much lower in conditionA
(wildtype) than in conditionB (knockout), but the raw data you give
show
very clearly that the opposite is true.
It may be that you have simply given different data to the two
packages or
else have aligned the results incorrectly.
Best wishes
Gordon
---------------- original message -----------------
[BioC] DESeq and EdgeR log fold differences
Ioannis Vlachos iv at on.gr
Thu Nov 29 23:06:17 CET 2012
Hello everyone,
I thought of conducting a parallel DE analysis with EdgeR and DESeq
using
a dataset that I have been working on lately.
The dataset has two conditions with two biological replicates each.
Let's say: Wild Type, Wild Type, Knock Out, Knock Out.
It's a smallRNA-Seq dataset, mapped to miRNAs.
I have tried various analyses using both programs and I have noticed
this.
There are very large differences in fold changes for some miRNAs
between
the two programs, even when using "RLE" for EdgeR normalization.
Example:
DESeq Code:
countDataSet = newCountDataSet (DATA, condition)
countDataSet = estimateSizeFactors(countDataSet)
countDataSet = estimateDispersions(countDataSet)
difexp = nbinomTest (countDataSet, "WildType", "KnockOut")
one of the results is:
id baseMean baseMeanA baseMeanB foldChange
log2FoldChange pval padj
100 623.8597966 349.3576527 898.3619406 2.57146776
1.362592066 0.001303353 0.182310802
And the size factors for DESeq are:
sizeFactors(countDataSet)
KO1 KO2 WT1 WT2
1.2969960 1.052 0.8850 0.84442
OK. So far so good.
EdgeR now.
dge <- DGEList(counts=DATA, group=condition)
dge<- calcNormFactors(dge)
dge <- estimateCommonDisp(dge, verbose=TRUE)
dge <-estimateTagwiseDisp(dge, verbose=TRUE)
et<- exactTest(dge)
Which results in:
logFC logCPM PValue FDR
100 -2.750 5.814103 9.40E-09 1.20E-05
With:
dge$samples
group lib.size norm.factors
WT1 1 2796302 0.9922204
WT2 1 2610244 0.9928183
KO1 2 3999488 1.0248098
KO2 2 3349646 0.9905555
We have logFC 1.3 for DESeq and 2.75 in EdgeR
And these results remain practically the same even by using:
dge<- calcNormFactors(dge.RLE, method="RLE")
logFC logCPM PValue FDR
210 -2.775952 5.823856 8.047631e-09 1.030902e-05
group lib.size norm.factors
KnockOut 3999488 1.0147156
KnockOut 3349646 0.9830163
WildType 2796302 0.9903844
WildType 2610244 1.0122579
Any thoughts?
This entry has (raw tags):
KO KO WT WT
131 123 287 195
Any thoughts on why I get 1.3 lFC vs 2.7lFC?
Thanks a lot,
Best Regards,
Ioannis
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}