Fwd: Re: edgeR and sagenhaft


Naomi Altman

>To: "Mark Robinson" <mrobinson at wehi.edu.au>
>From: Naomi Altman <naomi at stat.psu.edu>
>Subject: Re: [BioC] edgeR and sagenhaft
>Date: Sun, 15 Feb 2009 11:21:54 -0500
>
>Dear Mark,
>Thanks for your feedback.
>
>Here are some comments on your comments:
>
>These are biological replicates. There is also a technical
>replicate that was sequenced using another method, but I did not use
>it for this analysis. (It also has some interesting behavior.)
>
>For lib.size I use the total number of reads. This is RNA-seq data,
>so the number is large. The library sizes vary by about 20%.
>I did the batch tests pairwise, exactly as if the batches were different genotypes.
>
>edgeR reported about 4x as many differentially expressed genes as
>sage.test, but there was almost no overlap with the genes
>selected as significant by sage.test.
>
>I am going to resort to my usual advice and look at some of the
>counts for genes flagged as differentially expressed by each
>method. I will report back if I solve the mystery.
>
>Thanks,
>Naomi
>
>At 07:35 PM 2/13/2009, you wrote:
>>Hi Naomi.
>>
>>Curious. A bit difficult to diagnose without digging into it. There is
>>probably a reasonable explanation for all of this.
>>
>>For what it's worth, a few comments/queries below.
>>
>>
>> > I have 4 large tag datasets A1, A2 and B1, B2. The purpose of the
>> > experiment was to determine differences in gene expression between A and
>> > B.
>> > A1 and B1 were done together as batch 1, and A2 and B2 were done
>> > together as batch 2.
>>
>>First question: are these technical replicates or biological? If
>>technical, you may consider the 'doPoisson=TRUE' option of deDGE() since
>>that effectively sets r large (dispersion small), making it a Poisson
>>calculation.
>>
>>
>> > I ran several analyses and am completely puzzled.
>> >
>> > First I ran sage.test (Fisher's exact test) on A1, B1 and on A2, B2.
>> > The results were strongly concordant in that there was a lot of
>> > overlap in the significant gene lists,
>> > and the same genes were up/down regulated (on the whole).
>> >
>> > Then I ran edgeR on all 4 samples. A large number of genes were
>> > declared significantly differentially expressed, but the list was almost
>> > completely disjoint from the genes "found" by sage.test (fewer than
>> > 10 out of 4000). The 'r' values were strongly clustered around 2,
>> > although some were huge. Incidentally, the "exact" component of the
>> > output does not seem to be described in ?edgeR, but I understand it
>> > to be the p-value from the test.
>>
>>'r' values around 2 suggest there is significant variation over and above
>>Poisson. But maybe this is due to batch effects.
>>
>>Indeed, the 'exact' element is the p-value from the exact test proposed in
>>the paper.
>>
>>What do you use for 'lib.size' -- total number of reads? Are they
>>drastically different from batch to batch or sample to sample? How do the
>>batch effects manifest -- more total reads giving higher overall counts,
>>or something different?
>>
>>
>> > Then I tested for batch effects by using sage.test on A1 vs A2, on
>> > B1 vs B2, and finally on A1 U B1 vs A2 U B2. A fairly large number of
>> > genes showed strong batch effects. These overlapped more with the
>> > genotype-within-batch sage.test results than with the edgeR results.
>>
>>
>>Strong batch effects that aren't explained by total counts would result in
>>higher dispersion estimates (lower values of 'r') in edgeR, thus giving
>>fewer DE genes. So maybe this explains some of the lower overlap here.
>>
>>
>> > Just to make things more confusing, the grad student who ran the
>> > samples used the normal approximation to the Poisson to test genotype
>> > effects within batch. These
>> > were highly concordant between batches as well, but did not match the
>> > sage.test results.
>> > I thought the p-values would be similar, at least
>> > for genes with large counts, but they were not.
>> >
>> > I am inclined to go with combining the sage.test results, but any
>> > advice would be very welcome.
>>
>>
>>Not sure I've really contributed much, but there must be a reasonable
>>explanation.
>>
>>Mark
>>
>> >
>> > Thanks,
>> >
>> > Naomi S. Altman 814-865-3791 (voice)
>> > Associate Professor
>> > Dept. of Statistics 814-863-7114 (fax)
>> > Penn State University 814-865-1348 (Statistics)
>> > University Park, PA 16802-2111
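
To make the sage.test versus normal-approximation comparison concrete, here is a minimal Python sketch (not statmod's code) of both kinds of test for a single tag. It assumes x and y are the tag's counts in two libraries of sizes n1 and n2 (for example, total reads, as used for lib.size in the thread). The exact test conditions on the tag total m = x + y, so that with no differential expression x is binomial with success probability n1/(n1 + n2); the two-sided p-value sums the probabilities of all outcomes no more likely than the observed one, which may differ in detail from statmod's implementation. The normal test is an assumed pooled-rate z-test on the Poisson rates x/n1 and y/n2, because the thread does not say exactly which approximation the grad student used. All function names are hypothetical.

```python
import math


def binom_logpmf(k: int, m: int, p: float) -> float:
    """Log of the Binomial(m, p) probability of observing k successes."""
    return (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
            + k * math.log(p) + (m - k) * math.log1p(-p))


def exact_conditional_test(x: int, y: int, n1: float, n2: float) -> float:
    """Two-sided exact test for one tag, conditioning on the total x + y.

    Under H0 (same expression in both libraries), x | x + y is
    Binomial(x + y, n1 / (n1 + n2)). The p-value sums the probabilities of
    every outcome that is no more likely than the observed count x.
    """
    m = x + y
    if m == 0:
        return 1.0
    p = n1 / (n1 + n2)
    log_obs = binom_logpmf(x, m, p)
    total = 0.0
    for k in range(m + 1):
        log_pk = binom_logpmf(k, m, p)
        if log_pk <= log_obs + 1e-7:  # small tolerance for rounding ties
            total += math.exp(log_pk)
    return min(1.0, total)


def normal_approx_test(x: int, y: int, n1: float, n2: float) -> float:
    """Two-sided z-test of equal Poisson rates x/n1 and y/n2 (pooled variance)."""
    m = x + y
    if m == 0:
        return 1.0
    pooled_rate = m / (n1 + n2)
    z = (x / n1 - y / n2) / math.sqrt(pooled_rate * (1 / n1 + 1 / n2))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


if __name__ == "__main__":
    # Hypothetical counts for one tag in two libraries of one million reads each.
    print(exact_conditional_test(300, 200, 1e6, 1e6))
    print(normal_approx_test(300, 200, 1e6, 1e6))
```

With the pooled rate, the z statistic simplifies algebraically to (x*n2 - y*n1)/sqrt(m*n1*n2), which is also the normal approximation to the conditional binomial above, so under these assumptions the two p-values should be close for tags with large counts.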

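Mark's reading of the 'r' values can be made concrete with the negative binomial variance function. The sketch below assumes the parameterisation Var(Y) = mu + mu^2/r (dispersion 1/r), under which a very large r, as with doPoisson=TRUE, gives back the Poisson variance mu, while r = 2 at a mean of 100 gives a variance of 5100, about 51 times the Poisson value. The helper moment_estimate_r is hypothetical and meant only for intuition: it is not how edgeR estimates dispersion, and it assumes the replicate counts are already on a common scale (equal library sizes).

```python
import math
import statistics
from typing import Sequence


def nb_variance(mu: float, r: float) -> float:
    """Negative binomial variance with mean mu and size r (dispersion 1/r)."""
    return mu + mu * mu / r


def moment_estimate_r(counts: Sequence[float]) -> float:
    """Crude method-of-moments estimate of r for one tag across replicates.

    Solves var = mean + mean**2 / r for r. Returns math.inf when the sample
    variance does not exceed the mean (no sign of extra-Poisson variation).
    """
    mean = statistics.fmean(counts)
    var = statistics.variance(counts)  # sample variance (n - 1 denominator)
    excess = var - mean
    if excess <= 0:
        return math.inf
    return mean * mean / excess


if __name__ == "__main__":
    mu = 100.0
    for r in (2.0, 20.0, 1e9):  # 1e9 stands in for "effectively Poisson"
        print(f"r={r:g}: variance {nb_variance(mu, r):.1f} vs Poisson {mu:.1f}")
    # Hypothetical replicate counts for one tag with a roughly 3-fold spread.
    print(moment_estimate_r([50, 150]))
```

Under these assumptions, two replicates at 50 and 150 already give an estimate of r close to 2, which illustrates how moderate replicate-to-replicate or batch-to-batch differences can pull r down into the range reported in the thread.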