edgeR: mixing technical replicates from Illumina HiSeq and MiSeq


Nicolas Delhomme ▴ 320

@nicolas-delhomme-6252

Last seen 5.5 years ago

Sweden

Really nice approach, Ryan, I'll keep it in mind :-)

Nick, no, I usually don't use cutoffs for that. If I'm unsure (e.g. the effect is not obvious or is minimal, i.e. the technical variance on the PCA is much smaller than the biological one), I would run the analysis under each of the different approaches, look at how they influence the results, and then select the more conservative approach. I know this sounds vague, but the decision frequently depends on how the other samples behave. Hence, whenever we make our data and analysis public, we also make the analysis code public; i.e. we use knitr / pandoc to create an HTML document that details every decision we've made.

Nico

---------------------------------------------------------------
Nicolas Delhomme
The Street Lab
Department of Plant Physiology
Umeå Plant Science Center
Tel: +46 90 786 5478
Email: nicolas.delhomme at umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------

On 29 Aug 2014, at 11:29, Nick N <feralmedic at gmail.com> wrote:

> Thanks Ryan and Nicolas!
>
> I was wondering whether there is some sort of decision tree that can be formalised.
>
> Nicolas, you would consider 3 options - merging, ignoring or adding a factor. Could you recommend some sort of cut-offs for each choice or is it more of a qualitative decision by looking at plots and PCA? By the way, my data is RNA-Seq - I forgot to mention it.
>
> Ryan, I would basically ask you the same question.
>
> On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:
>
> Hi Nick,
>
> Thanks to the underlying theory behind dispersion estimation, you can easily test whether your "technical replicates" really do represent technical replicates. Specifically, read counts in technical replicates should follow a Poisson distribution, which is a special case of the negative binomial with zero dispersion. So, simply fit a model using edgeR or DESeq2 with a separate coefficient for each group of technical replicates. Thus all the experimental variation will be absorbed into the model coefficients and the only thing left will be the technical variability of the replicates. For true technical replicates, the dispersion should be zero for all genes. So if you estimate dispersions using this model, and plotBCV/plotDispEsts shows the dispersion very near to zero, then you can be confident that you really have technical replicates. If the dispersion is nonzero, then there is some additional source of unaccounted-for variation.
>
> I have used this method on a pilot dataset with several technical replicates for each condition. edgeR said the dispersion was something like 10^-3 or less for all genes except for the very low-expressed genes.
>
> -Ryan
>
> On 8/28/14, 9:23 AM, Nick N wrote:
>
> Hi,
>
> I have a study where a fraction of the samples have been replicated on 2 Illumina platforms (HiSeq and MiSeq). These are technical replicates - the library preparation is the same using the same biological replicates - it's only the sequencing which is different.
>
> My hunch was that I shall introduce the platform as an additional (blocking) factor in the analysis. Then I stumbled upon this post:
>
> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>
> It recommends pooling the replicates. The post seems to apply to a different case ("pure" technical replicates, i.e. no differences in the sequencing platform used) so I probably shall ignore it. But I still feel a bit uncertain of the best way to treat the technical replicates. Can you, please, advise me on this?
>
> many thanks!
> Nick
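The Poisson check Ryan describes in the quoted message can be illustrated outside edgeR or DESeq2 with a rough, dependency-free sketch. This is not the dispersion estimator either package uses; it swaps in a plain Pearson chi-square statistic per degree of freedom, computed for one gene across the technical replicates of a single biological sample, with library sizes as offsets. Under pure Poisson sampling the statistic is roughly 1 on average, and values well above 1 point to extra-Poisson variation. The function names and the input layout are assumptions made for illustration only.

```python
from statistics import median


def pearson_dispersion_stat(counts, lib_sizes):
    """Pearson chi-square per degree of freedom for one gene.

    counts    -- read counts for the gene in each technical replicate
    lib_sizes -- total reads sequenced in each replicate (same order)

    Returns None when the gene has no reads, since nothing can be tested.
    """
    if len(counts) != len(lib_sizes) or len(counts) < 2:
        raise ValueError("need at least two replicates with matching library sizes")
    total = sum(counts)
    if total == 0:
        return None
    # Shared per-read rate under the Poisson model: y_i ~ Poisson(N_i * rate)
    rate = total / sum(lib_sizes)
    chi2 = sum((y - n * rate) ** 2 / (n * rate) for y, n in zip(counts, lib_sizes))
    return chi2 / (len(counts) - 1)


def median_dispersion_stat(count_rows, lib_sizes):
    """Median of the per-gene statistic, skipping genes with no reads."""
    stats = [pearson_dispersion_stat(row, lib_sizes) for row in count_rows]
    stats = [s for s in stats if s is not None]
    return median(stats) if stats else None


if __name__ == "__main__":
    # Hypothetical counts: one sample sequenced on HiSeq and on MiSeq
    lib_sizes = [20_000_000, 2_000_000]
    genes = [[1000, 100], [540, 46], [0, 0], [300, 90]]
    for row in genes:
        print(row, pearson_dispersion_stat(row, lib_sizes))
    print("median:", median_dispersion_stat(genes, lib_sizes))
```

With only two replicates per sample (one per platform) each per-gene value has a single degree of freedom and is very noisy, so a summary across many genes is more informative than any individual gene. As in the thread, no particular cutoff is implied.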

Sequencing edgeR DESeq2 • 1.1k views

updated 9.7 years ago by Ryan C. Thompson ★ 7.9k • written 9.7 years ago by Nicolas Delhomme ▴ 320


Ryan C. Thompson ★ 7.9k

@ryan-c-thompson-5618

Last seen 8 months ago

Scripps Research, La Jolla, CA

Personally, I see only two possibilities. Either you have true technical replicates with Poisson variance (zero dispersion on technical replicates, as I described earlier) or you don't. In the former case, you merge the technical replicates. In the latter case, you then need to figure out whether the differences between replicates are a consistent effect dependent on sequencing platform, or just random (using PCA plots, etc.). For a consistent effect, you can adjust for it by including a blocking factor in the model. Otherwise I would probably just do a separate analysis for each sequencing platform.

As for the threshold between "zero" and "non-zero" dispersion, I'm not really sure what a reasonable threshold is. You have to try it and see.
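For the first branch of the decision above (true technical replicates, so merge them), merging means summing the raw counts of all sequencing runs that belong to the same biological sample; a sum of independent Poisson counts is again Poisson, so nothing is lost. The sketch below assumes a simple dictionary layout and a hypothetical function name, and is only meant to show the operation. In R, edgeR's `sumTechReps` performs the equivalent step on a DGEList.

```python
def merge_technical_replicates(counts_by_run, run_to_sample):
    """Sum raw counts over technical replicates of each biological sample.

    counts_by_run -- dict: run id -> list of counts, one per gene (same gene order)
    run_to_sample -- dict: run id -> biological sample id
    Returns a dict: sample id -> list of summed counts.
    """
    merged = {}
    for run_id, counts in counts_by_run.items():
        sample = run_to_sample[run_id]
        if sample not in merged:
            merged[sample] = list(counts)  # copy so the input is not modified
            continue
        if len(counts) != len(merged[sample]):
            raise ValueError(f"run {run_id!r} has a different number of genes")
        merged[sample] = [a + b for a, b in zip(merged[sample], counts)]
    return merged


if __name__ == "__main__":
    # Hypothetical runs: sample S1 was sequenced on both platforms, S2 only on HiSeq
    counts_by_run = {
        "S1_HiSeq": [100, 0, 25],
        "S1_MiSeq": [10, 1, 3],
        "S2_HiSeq": [80, 2, 30],
    }
    run_to_sample = {"S1_HiSeq": "S1", "S1_MiSeq": "S1", "S2_HiSeq": "S2"}
    print(merge_technical_replicates(counts_by_run, run_to_sample))
```

Summing must be done on raw counts, before any normalisation, so that the merged library size reflects the total sequencing depth of the sample.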

written 9.7 years ago by Ryan C. Thompson ★ 7.9k
