Fwd: Re: edgeR and sagenhaft
Naomi Altman ★ 6.0k
@naomi-altman-380
United States
>To: "Mark Robinson" <mrobinson at wehi.edu.au>
>From: Naomi Altman <naomi at stat.psu.edu>
>Subject: Re: [BioC] edgeR and sagenhaft
>Date: Sun, 15 Feb 2009 11:21:54 -0500
>
>Dear Mark,
>Thanks for your feedback.
>
>Here are some comments to the comments:
>
>These are biological replicates. There is also a technical
>replicate that was sequenced using another method, but I did not use
>it for this analysis.
>(It also has some interesting behavior.)
>
>For lib.size I use the total number of reads. This is RNA-seq data,
>so the number is large. The lib sizes do vary by about 20%.
>I did the batch tests pairwise exactly as if they were different genotypes.
>
>edgeR reported about 4x as many differentially expressed genes as
>sage.test. But there was almost no overlap with any of the genes
>selected as significant by sage.test.
>
>I am going to resort to my usual advice and look at some of the
>counts for genes tagged by each method as differentially
>expressed. I will report back if I solve the mystery.
>
>Thanks,
>Naomi
>
>At 07:35 PM 2/13/2009, you wrote:
>>Hi Naomi.
>>
>>Curious. A bit difficult to diagnose without digging into it. There is
>>probably a reasonable explanation for all of this.
>>
>>For what it's worth, a few comments/queries below.
>>
>>
>> > I have 4 large tag datasets A1, A2 and B1, B2. The purpose of the
>> > experiment was to determine differences in gene expression between A and
>> > B.
>> > A1 and B1 were done together as batch 1, and A2 and B2 were done
>> > together as batch 2.
>>
>>First question: are these technical replicates or biological? If
>>technical, you may consider the 'doPoisson=TRUE' option of deDGE() since
>>that effectively sets r large (dispersion small), making it a Poisson
>>calculation.
>>
>>
>> > I ran several analyses and am completely puzzled.
>> >
>> > First I ran sage.test (Fisher's exact test) on A1, B1 and on A2,
>> > B2. The results were strongly concordant in that there was a lot of
>> > overlap in the significant gene list,
>> > and the same genes were up/down regulated (on the whole).
>> >
>> > Then I ran edgeR on all 4 samples. A large number of genes were
>> > declared significantly differentially expressed, but it was almost
>> > completely disjoint from the genes "found" by sage.test. (Fewer than
>> > 10 out of 4000). The $r$ values were strongly clustered around 2,
>> > although some were huge. Incidentally, the "exact" component of the
>> > output does not seem to be described in ?edgeR, but I understand it
>> > to be the p-value from the test.
>>
>>'r' values around 2 suggest there is significant variation over and above
>>Poisson. But, maybe this is due to batch effects.
>>
>>Indeed, the 'exact' element is the p-value from the exact test proposed in
>>the paper.
>>
>>What do you use for 'lib.size' -- total number of reads? Are they
>>drastically different from batch-to-batch/sample-to-sample? How do the
>>batch effects manifest -- more total reads giving higher overall counts,
>>or something different?
>>
>>
>> > Then I tested for batch effects by using sage.test on A1, A2 and on
>> > B1, B2 and finally on A1 U B1 and A2 U B2. A fairly large number of
>> > genes showed strong batch effects. These overlapped more with the
>> > genotype within batch sage.test results than with the edgeR results.
>>
>>
>>Strong batch effects that aren't explained by total counts would result in
>>higher dispersion estimates (lower values of 'r') in edgeR, thus giving
>>fewer DE genes. So, maybe this explains some of the lower overlap here.
>>
>>
>> > Just to make things more confusing, the grad student who ran the
>> > samples used the normal approximation to the Poisson to test genotype
>> > effects within batch. These
>> > were highly concordant between batches as well, but did not match the
>> > sage.test results. I thought the p-values would be similar at least
>> > for genes with large counts, but they were not.
>> >
>> > I am inclined to go with combining the sage.test results, but any
>> > advice would be very welcome
>>
>>
>>Not sure I've really contributed much, but there must be a reasonable
>>explanation.
>>
>>Mark
>>
>>
>> > Thanks,
>> >
>> > Naomi S. Altman 814-865-3791 (voice)
>> > Associate Professor
>> > Dept. of Statistics 814-863-7114 (fax)
>> > Penn State University 814-865-1348 (Statistics)
>> > University Park, PA 16802-2111
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives:
>> > http://news.gmane.org/gmane.science.biology.informatics.conductor

Naomi S. Altman 814-865-3791 (voice)
Associate Professor
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
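
The thread compares three per-gene tests: an exact two-library test, a normal approximation to the Poisson, and a negative binomial model with dispersion. A minimal Python sketch of the arithmetic may make the disagreement easier to inspect by hand. It is not the code behind sage.test or edgeR. The function names are hypothetical, and it assumes that 'r' is the negative binomial size parameter, so that the variance is mu + mu^2/r (consistent with "r large (dispersion small)" above). The exact test here conditions on the gene's total count, which makes it a binomial test with success probability n1/(n1+n2).

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability of observing k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def exact_two_library_p(x1, x2, n1, n2):
    """Two-sided exact p-value for one gene (hypothetical helper).

    Under the Poisson null, x1 given x1 + x2 is Binomial(x1 + x2, n1/(n1+n2)),
    where n1 and n2 are the library sizes.
    """
    total = x1 + x2
    if total == 0:
        return 1.0
    p0 = n1 / (n1 + n2)
    log_obs = log_binom_pmf(x1, total, p0)
    pval = 0.0
    for k in range(total + 1):
        lp = log_binom_pmf(k, total, p0)
        # Sum every outcome that is no more likely than the observed one.
        if lp <= log_obs + 1e-9:
            pval += math.exp(lp)
    return min(1.0, pval)


def normal_approx_p(x1, x2, n1, n2):
    """Two-sided normal-approximation p-value for the same comparison."""
    total = x1 + x2
    if total == 0:
        return 1.0
    p0 = n1 / (n1 + n2)
    mean = total * p0
    sd = math.sqrt(total * p0 * (1.0 - p0))
    z = (x1 - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


def nb_variance(mu, r):
    """Negative binomial variance, ASSUMING r is the size parameter."""
    return mu + mu * mu / r


if __name__ == "__main__":
    # Made-up counts for one gene in two equally sized libraries.
    print(exact_two_library_p(600, 400, 1_000_000, 1_000_000))
    print(normal_approx_p(600, 400, 1_000_000, 1_000_000))
    # With mean 100 and r = 2, compare against the Poisson variance of 100.
    print(nb_variance(100.0, 2.0))
```

Under these assumptions the two Poisson-based p-values should be close for large counts, so a mismatch of the kind described above would point to differences in inputs (for example library sizes) and not to the approximation itself. The negative binomial variance with r near 2 is far larger than the Poisson variance at high counts, which is one way the edgeR and sage.test gene lists could diverge.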
SAGE GO edgeR