Question: Fwd: Re: edgeR and sagenhaft


Naomi Altman wrote:
>To: "Mark Robinson" <mrobinson at wehi.edu.au>
>From: Naomi Altman <naomi at stat.psu.edu>
>Subject: Re: [BioC] edgeR and sagenhaft
>Date: Sun, 15 Feb 2009 11:21:54 -0500
>
>Dear Mark,
>Thanks for your feedback.
>
>Here are some comments on the comments:
>
>These are biological replicates. There is also a technical
>replicate that was sequenced using another method, but I did not use
>it for this analysis.
>(It also has some interesting behavior.)
>
>For lib.size I use the total number of reads. This is RNA-seq data,
>so the number is large. The library sizes vary by about 20%.
>I did the batch tests pairwise, exactly as if they were different
>genotypes.
>
>edgeR reported about 4x as many differentially expressed genes as
>sage.test. But there was almost no overlap with any of the genes
>selected as significant by sage.test.
>
>I am going to resort to my usual advice and look at some of the
>counts for genes tagged by each method as differentially
>expressed. I will report back if I solve the mystery.
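The follow-up described here, inspecting counts for genes flagged by one method but not the other, starts by splitting the two significant-gene lists into shared and method-specific sets. A minimal Python sketch; the gene IDs and per-library counts are hypothetical placeholders, not data from this experiment.

```python
def compare_calls(edger_genes, sage_genes):
    """Split two significant-gene lists into shared and method-specific sets."""
    edger, sage = set(edger_genes), set(sage_genes)
    return {
        "both": sorted(edger & sage),
        "edgeR_only": sorted(edger - sage),
        "sage_only": sorted(sage - edger),
    }


# Hypothetical raw counts per library, in the order (A1, A2, B1, B2).
counts = {
    "gene1": (120, 95, 30, 41),
    "gene2": (8, 60, 7, 55),
    "gene3": (500, 480, 510, 470),
}

calls = compare_calls(["gene1", "gene2"], ["gene1", "gene3"])
for group, genes in calls.items():
    for gene in genes:
        # Print raw counts so each method's calls can be eyeballed side by side.
        print(f"{group:<11} {gene}: {counts[gene]}")
```

With edgeR making about four times as many calls and almost no overlap, the two `_only` groups are where the raw counts are most informative, for instance to see whether they track batch rather than genotype.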
>
>Thanks,
>Naomi
>
>At 07:35 PM 2/13/2009, you wrote:
>>Hi Naomi.
>>
>>Curious. A bit difficult to diagnose without digging into it. There is
>>probably a reasonable explanation for all of this.
>>
>>For what it's worth, a few comments/queries below.
>>
>>
>> > I have 4 large tag datasets A1, A2 and B1, B2. The purpose of the
>> > experiment was to determine differences in gene expression between A and
>> > B.
>> > A1 and B1 were done together as batch 1, and A2 and B2 were done
>> > together as batch 2.
>>
>>First question: are these technical replicates or biological? If
>>technical, you may consider the 'doPoisson=TRUE' option of deDGE() since
>>that effectively sets r large (dispersion small), making it a Poisson
>>calculation.
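Mark's remark rests on the negative binomial variance function. Assuming the usual parameterization in which r is the size parameter and the dispersion is 1/r, a tag with mean count mu has variance mu + mu^2/r, which shrinks to the Poisson variance mu as r grows. A small Python sketch (the mean of 100 is arbitrary):

```python
def nb_variance(mu, r):
    """Negative binomial variance for mean mu and size r (dispersion 1/r)."""
    return mu + mu ** 2 / r


mu = 100.0  # arbitrary illustrative mean count
for r in (2, 10, 1_000, 1_000_000):
    # Ratio of the NB variance to the Poisson variance (which equals mu).
    print(f"r = {r:>9}: variance = {nb_variance(mu, r):8.2f}, "
          f"{nb_variance(mu, r) / mu:.4f} x Poisson")
```

At r = 2, the value most tags clustered around in this analysis, a tag averaging 100 counts has about 51 times the Poisson variance, so edgeR and Poisson-based tests such as sage.test may rank genes very differently.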
>>
>>
>> > I ran several analyses and am completely puzzled.
>> >
>> > First I ran sage.test (Fisher's exact test) on A1, B1 and on A2,
>> > B2. The results were strongly concordant in that there was a lot of
>> > overlap in the significant gene list,
>> > and the same genes were up/down regulated (on the whole).
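For one tag in two libraries, an exact test of this kind conditions on the tag's total count: under no difference, the count in library 1 is Binomial(y1 + y2, N1 / (N1 + N2)), where N1 and N2 are the library sizes. statmod's sage.test is described as conditioning on the total count in this way; with library sizes in the millions, Fisher's exact test on the 2x2 table (tag vs. other reads, library 1 vs. library 2) is practically the same calculation. The Python sketch below is an illustration under that formulation, not sage.test's code, and the counts are made up.

```python
import math


def binom_logpmf(k, n, p):
    """log P(X = k) for X ~ Binomial(n, p), computed in log space to avoid underflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def conditional_exact_test(y1, y2, n1, n2):
    """Two-sided exact p-value for one tag in two libraries.

    y1, y2: the tag's counts; n1, n2: library sizes (total reads).
    Under H0 the tag's share of reads is the same in both libraries, so
    conditional on the total, y1 ~ Binomial(y1 + y2, n1 / (n1 + n2)).
    """
    total = y1 + y2
    p = n1 / (n1 + n2)
    logpmf = [binom_logpmf(k, total, p) for k in range(total + 1)]
    # Sum the probabilities of all outcomes no more likely than the observed
    # one (the convention R's binom.test uses), with a small tolerance for ties.
    cutoff = logpmf[y1] + 1e-7
    return min(1.0, sum(math.exp(lp) for lp in logpmf if lp <= cutoff))


# Made-up counts for one tag in batch 1 (A1 vs B1) and batch 2 (A2 vs B2).
print(conditional_exact_test(40, 85, 2_000_000, 2_400_000))
print(conditional_exact_test(52, 97, 2_300_000, 2_100_000))
```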
>> >
>> > Then I ran edgeR on all 4 samples. A large number of genes were
>> > declared significantly differentially expressed, but the list was almost
>> > completely disjoint from the genes "found" by sage.test. (Fewer than
>> > 10 out of 4000). The $r$ values were strongly clustered around 2,
>> > although some were huge. Incidentally, the "exact" component of the
>> > output does not seem to be described in ?edgeR, but I understand it
>> > to be the p-value from the test.
>>
>>'r' values around 2 suggest there is significant variation over and above
>>Poisson. But maybe this is due to batch effects.
>>
>>Indeed, the 'exact' element is the p-value from the exact test proposed in
>>the paper.
>>
>>What do you use for 'lib.size' -- total number of reads? Are they
>>drastically different from batch-to-batch/sample-to-sample? How do the
>>batch effects manifest -- more total reads giving higher overall counts,
>>or something different?
>>
>>
>> > Then I tested for batch effects by using sage.test on A1, A2 and on
>> > B1, B2 and finally on A1 U B1 and A2 U B2. A fairly large number of
>> > genes showed strong batch effects. These overlapped more with the
>> > genotype-within-batch sage.test results than with the edgeR results.
>>
>>
>>Strong batch effects that aren't explained by total counts would result in
>>higher dispersion estimates (lower values of 'r') in edgeR, thus giving
>>fewer DE genes. So, maybe this explains some of the lower overlap here.
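Mark's point, that batch shifts not explained by total counts get absorbed into the dispersion, can be illustrated with a crude method-of-moments estimate of r from the two replicates of one genotype. This is only a sketch under that simplifying assumption; edgeR's own estimator is likelihood-based and shares information across tags, and the counts below are invented.

```python
import math
from statistics import mean, variance


def moment_r(counts, lib_sizes):
    """Crude method-of-moments estimate of the NB size parameter r for one tag.

    Counts from replicate libraries are rescaled to the average library size,
    then var = mu + mu**2 / r is solved for r. Returns math.inf when the sample
    variance does not exceed the mean (no apparent extra-Poisson variation).
    """
    common = mean(lib_sizes)
    scaled = [c * common / n for c, n in zip(counts, lib_sizes)]
    mu = mean(scaled)
    excess = variance(scaled) - mu  # sample variance beyond the Poisson part
    return math.inf if excess <= 0 else mu ** 2 / excess


# Made-up counts for one tag in A1 and A2, with equal library sizes.
libs = [1_000_000, 1_000_000]
print(moment_r([100, 110], libs))  # replicates agree: no overdispersion detected
print(moment_r([100, 160], libs))  # batch 2 shifted upward: r drops to about 10
```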
>>
>>
>> > Just to make things more confusing, the grad student who ran the
>> > samples used the normal approximation to the Poisson to test genotype
>> > effects within batch. These were highly concordant between batches as
>> > well, but did not match the sage.test results. I thought the p-values
>> > would be similar at least for genes with large counts, but they were not.
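The thread does not say which normal approximation the student used. One common choice (an assumption here) conditions on the tag's total count T = y1 + y2, as the exact test does, and standardizes y1 by its binomial mean T p and variance T p (1 - p), with p = N1 / (N1 + N2). A Python sketch comparing it with the exact conditional p-value on made-up counts:

```python
import math


def normal_approx_p(y1, y2, n1, n2):
    """Two-sided p-value from a normal approximation to the conditional binomial."""
    total = y1 + y2
    p = n1 / (n1 + n2)
    z = (y1 - total * p) / math.sqrt(total * p * (1 - p))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def exact_p(y1, y2, n1, n2):
    """Exact two-sided conditional binomial p-value (binom.test convention)."""
    total = y1 + y2
    p = n1 / (n1 + n2)
    logpmf = [math.lgamma(total + 1) - math.lgamma(k + 1) - math.lgamma(total - k + 1)
              + k * math.log(p) + (total - k) * math.log1p(-p)
              for k in range(total + 1)]
    cutoff = logpmf[y1] + 1e-7  # tolerance so tied outcomes are included
    return min(1.0, sum(math.exp(lp) for lp in logpmf if lp <= cutoff))


# Made-up tags: one low-count, one high-count, with equal library sizes.
for y1, y2 in [(2, 9), (1000, 1100)]:
    print(y1, y2, normal_approx_p(y1, y2, 1e6, 1e6), exact_p(y1, y2, 1e6, 1e6))
```

On these invented numbers the approximation is poor for the low-count tag and close for the high-count tag, which is the pattern expected above. If the two methods still disagree for high-count genes in the real data, that might point to a difference in inputs, such as the library sizes used, rather than to the approximation itself.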
>> >
>> > I am inclined to go with combining the sage.test results, but any
>> > advice would be very welcome.
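The thread does not say how the two within-batch sage.test results would be combined. If it means merging the two p-values for each gene, one standard option (an assumption here) is Fisher's method: X = -2(ln p1 + ln p2) follows a chi-squared distribution with 4 degrees of freedom under the null, and for even degrees of freedom the upper tail has a closed form. A Python sketch, assuming the two batches give independent p-values:

```python
import math


def fisher_combine(p_values):
    """Fisher's method: combine k independent p-values into one.

    X = -2 * sum(ln p) ~ chi-squared with 2k df under H0. For even df the
    upper tail is exp(-X/2) * sum_{i<k} (X/2)**i / i!, so no stats library
    is needed.
    """
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    return math.exp(-half_x) * sum(half_x ** i / math.factorial(i) for i in range(k))


# Made-up within-batch p-values for one gene (batch 1, batch 2).
print(fisher_combine([0.01, 0.02]))
```

Fisher's method ignores direction, so a gene that goes up in one batch and down in the other can still get a small combined p-value; checking that the two batches agree on the sign of the change is a separate step.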
>>
>>
>>Not sure I've really contributed much, but there must be a reasonable
>>explanation.
>>
>>Mark
>>
>>
>>
>>
>> >
>> > Thanks,
>> >
>> > Naomi S. Altman 814-865-3791 (voice)
>> > Associate Professor
>> > Dept. of Statistics 814-863-7114 (fax)
>> > Penn State University 814-865-1348 (Statistics)
>> > University Park, PA 16802-2111
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>