Really nice approach Ryan, I'll keep it in mind :-)
Nick, no, I usually don't use cutoffs for that. If I'm unsure (e.g. the
effect is not obvious or is minimal, i.e. the technical variance
on the PCA is much smaller than the biological one), I would run
the analyses for the different approaches, look at how they influence
the results, and then select the more conservative approach. I
know this sounds vague, but the decision frequently
depends on how the other samples behave. Hence, whenever
we make our data and analysis public, we also make the
analysis code public; i.e. we use knit / pandoc to create an HTML
document that details every decision we've made.
Nico
---------------------------------------------------------------
Nicolas Delhomme
The Street Lab
Department of Plant Physiology
Umeå Plant Science Center
Tel: +46 90 786 5478
Email: nicolas.delhomme at umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------
On 29 Aug 2014, at 11:29, Nick N <feralmedic at gmail.com> wrote:
> Thanks Ryan and Nicolas!
>
> I was wondering whether there is some sort of decision tree that can
> be formalised.
>
> Nicolas, you would consider 3 options - merging, ignoring or adding
> a factor. Could you recommend some sort of cut-offs for each choice or
> is it more of a qualitative decision by looking at plots and PCA? By
> the way, my data is RNA-Seq - I forgot to mention it.
>
> Ryan, I would basically ask you the same question.
>
>
> On Fri, Aug 29, 2014 at 9:42 AM, Ryan <rct at thompsonclan.org> wrote:
> Hi Nick,
>
> Thanks to the underlying theory behind dispersion estimation, you
> can easily test whether your "technical replicates" really do
> represent technical replicates. Specifically, read counts in technical
> replicates should follow a Poisson distribution, which is a special
> case of the negative binomial with zero dispersion. So, simply fit a
> model using edgeR or DESeq2 with a separate coefficient for each group
> of technical replicates. All the experimental variation will then be
> absorbed into the model coefficients, and the only thing left will be
> the technical variability of the replicates. For true technical
> replicates, the dispersion should be zero for all genes. So if you
> estimate dispersions using this model, and plotBCV/plotDispEsts shows
> the dispersion very near to zero, then you can be confident that you
> really have technical replicates. If the dispersion is nonzero, then
> there is some additional source of unaccounted-for variation.
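
The Poisson check above can be illustrated numerically. The following is a minimal sketch, not what edgeR or DESeq2 actually compute: it uses a plain method-of-moments estimate based on the negative binomial variance, var = mu + phi * mu^2, and it assumes equal library sizes. The function name `moment_dispersion` and the example counts are made up for illustration.

```python
from statistics import mean, variance


def moment_dispersion(groups):
    """Rough method-of-moments NB dispersion for a single gene.

    groups: list of lists of raw counts; each inner list holds the counts
    from one set of technical replicates (hypothetical input layout).
    Assumes equal library sizes, so no normalisation is applied.
    Returns None if no group is informative.
    """
    estimates = []
    for counts in groups:
        m = mean(counts)
        # Need at least two replicates and a nonzero mean to estimate anything.
        if len(counts) < 2 or m == 0:
            continue
        s2 = variance(counts)  # sample variance within the replicate group
        # NB: var = mu + phi * mu^2  =>  phi = (var - mu) / mu^2
        estimates.append((s2 - m) / (m * m))
    if not estimates:
        return None
    # A negative average means less spread than Poisson; clamp to zero.
    return max(0.0, mean(estimates))


# Replicates whose spread is no larger than Poisson noise: estimate is clamped to 0.0.
print(moment_dispersion([[100, 110], [200, 190]]))

# Replicates that disagree far more than Poisson noise allows: clearly nonzero.
print(moment_dispersion([[100, 200], [50, 150]]))
```

With only two replicates per group, a per-gene estimate like this is very noisy, which is one reason edgeR and DESeq2 share information across genes. The sketch is only meant to show what "dispersion near zero" means.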
>
> I have used this method on a pilot dataset with several technical
> replicates for each condition. edgeR said the dispersion was something
> like 10^-3 or less for all genes except for the very low-expressed
> genes.
>
> -Ryan
>
>
> On 8/28/14, 9:23 AM, Nick N wrote:
> Hi,
>
> I have a study where a fraction of the samples have been replicated on 2
> Illumina platforms (HiSeq and MiSeq). These are technical replicates:
> the library preparation is the same, using the same biological replicates;
> only the sequencing differs.
>
> My hunch was that I should introduce the platform as an additional
> (blocking) factor in the analysis. Then I stumbled upon this post:
>
> https://stat.ethz.ch/pipermail/bioconductor/2010-April/033099.html
>
> It recommends pooling the replicates. The post seems to apply to a
> different case ("pure" technical replicates, i.e. no differences in the
> sequencing platform used), so I should probably ignore it. But I still feel a
> bit uncertain about the best way to treat the technical replicates. Can you
> please advise me on this?
>
> many thanks!
> Nick
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>