Hi
I've conducted a two-condition RNA-seq experiment using "disease" versus
"control" cells. I have 16 biological replicates in my disease group and
11 in my control group. I'm using DESeq2 v1.0.9 for the analysis.
From the heatmap and PCA plots (attached), it's clear that there's some
variability among the biological replicates in my groups, which I'd
expect, but 6 of my disease samples also seem to cluster closely with
the controls.
All of the samples in each group were prepared in the same way and
sequenced together, and I can't identify any obvious batch effect that
could be contributing to this.
I don't have much experience analysing this kind of data, and my
statistics knowledge is unfortunately somewhat lacking, but I'm
wondering whether anyone has experience with how well biological
replicates from RNA-seq data usually cluster together. I'm not sure
whether it's more appropriate to drop these 6 samples and continue the
analysis with 10 disease vs. 11 control samples, or to leave them in, as
perhaps this is more representative of the variability of the disease
biology.
I'd appreciate any advice anyone has! Thanks in advance
Emma
Hi Emma,
On Wed, May 8, 2013 at 5:24 PM, Emma Quinn <emmamquinn@googlemail.com> wrote:
> Hi
>
> I've conducted a two-condition RNA-seq experiment using "disease"
> versus "control" cells. I have 16 biological replicates in my disease
> group and 11 in my control group. I'm using DESeq2 v1.0.9 for the
> analysis.
>
> From the heatmap and PCA plots (attached), it's clear that there's
> some variability among the biological replicates in my groups, which
> I'd expect, but 6 of my disease samples also seem to cluster closely
> with the controls.
>
I would try to follow up with more sample preparation information to
help explain these 6 samples. Are the size factors and/or the total
numbers of mapped reads different for these samples? You might also
want to run some quality-assessment tools, such as qa() from the
ShortRead package.
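To make the size-factor check concrete, here is a minimal Python sketch (DESeq2 itself is an R package; this only illustrates the idea). It assumes the median-of-ratios method that DESeq/DESeq2 use by default: each count is divided by that gene's geometric mean across samples, and a sample's size factor is the median of those ratios over genes with no zero counts. The toy count matrix and the function names are hypothetical.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows, one per gene, each holding one raw count per sample.
    Genes with a zero in any sample are skipped, because their geometric
    mean is zero and the ratio is undefined.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median in log space, then back-transform to a multiplicative factor.
    return [math.exp(statistics.median(r)) for r in log_ratios]


def library_sizes(counts):
    """Total counts per sample (column sums)."""
    return [sum(col) for col in zip(*counts)]


# Hypothetical toy matrix: 4 genes x 3 samples; sample 3 was sequenced
# roughly twice as deeply as sample 1.
counts = [
    [10, 12, 20],
    [30, 28, 60],
    [5, 6, 10],
    [100, 90, 200],
]
print(size_factors(counts))
print(library_sizes(counts))
```

On real data you would compare these two numbers for the 6 samples that cluster with the controls against the remaining disease samples; a clear outlier in either points towards a technical rather than biological explanation.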
> All of the samples in each group were prepared in the same way and
> sequenced together, and I can't identify any obvious batch effect that
> could be contributing to this.
>
Were all samples sequenced at the same time, or in different runs? Were
the groups balanced across the runs?
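A quick way to answer the balance question is to cross-tabulate run against condition. The Python sketch below assumes a hypothetical sample sheet with one run label and one condition label per sample; runs that contain only one condition are flagged, and if every run is like that, run and condition are completely confounded.

```python
from collections import Counter, defaultdict


def run_balance(runs, conditions):
    """Return {run: Counter(condition -> number of samples)}."""
    table = defaultdict(Counter)
    for run, cond in zip(runs, conditions):
        table[run][cond] += 1
    return dict(table)


def single_condition_runs(runs, conditions):
    """Runs that do not contain every condition in the experiment.

    With two conditions this means runs holding only disease or only
    control samples.
    """
    all_conditions = set(conditions)
    return sorted(
        run
        for run, counts in run_balance(runs, conditions).items()
        if set(counts) != all_conditions
    )


# Hypothetical sample sheet: four samples split over two runs.
runs = ["run1", "run1", "run2", "run2"]
conditions = ["disease", "control", "disease", "control"]
print(run_balance(runs, conditions))
print(single_condition_runs(runs, conditions))
```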
>
> I don't have much experience analysing this kind of data, and my
> statistics knowledge is unfortunately somewhat lacking, but I'm
> wondering whether anyone has experience with how well biological
> replicates from RNA-seq data usually cluster together. I'm not sure
> whether it's more appropriate to drop these 6 samples and continue the
> analysis with 10 disease vs. 11 control samples, or to leave them in,
> as perhaps this is more representative of the variability of the
> disease biology.
>
It's not appropriate to drop some of the disease samples after seeing
that they cluster with the controls. As you can imagine, if samples
could be removed selectively after looking at the data, any experiment
with enough samples could be made to produce significant differences.
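The danger of dropping samples after looking at the data can be illustrated with a small simulation. In this Python sketch both groups are drawn from the same normal distribution (so there is no true difference), and the 6 disease values lying furthest towards the control side are then removed, mimicking "dropping the samples that look like controls". The single-gene setup, Welch's t statistic, and the cutoff of 2.1 (roughly the two-sided 5% critical value for about 20 degrees of freedom) are illustrative assumptions, not what DESeq2 does internally.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means of a and b."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def simulate(n_disease=16, n_control=11, n_drop=6, n_sims=2000,
             threshold=2.1, seed=1):
    """Fraction of null simulations with |t| > threshold, before and after
    dropping the disease samples that look most like controls."""
    rng = random.Random(seed)
    hits_all = hits_dropped = 0
    for _ in range(n_sims):
        # Null hypothesis: both groups come from the same distribution.
        disease = [rng.gauss(0, 1) for _ in range(n_disease)]
        control = [rng.gauss(0, 1) for _ in range(n_control)]
        hits_all += abs(welch_t(disease, control)) > threshold
        # Remove the n_drop disease values on the control side of the
        # observed difference, i.e. the ones that "cluster with controls".
        direction = 1 if statistics.mean(disease) >= statistics.mean(control) else -1
        kept = sorted(disease, key=lambda x: direction * x)[n_drop:]
        hits_dropped += abs(welch_t(kept, control)) > threshold
    return hits_all / n_sims, hits_dropped / n_sims


print(simulate())
```

The first fraction should sit close to the nominal 5%, while the second is noticeably larger even though no real difference exists, which is exactly the inflation described above.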
But I would try to follow up and see which preparation steps might have
differed for these samples. It might then be possible to deal with batch
effects by including these variables (for example, date of run) as terms
in the model, or by first running a normalization package such as cqn or
EDASeq and then passing its output to DESeq2 as normalization factors,
as described in the appendix of the vignette.
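In DESeq2, adding a batch variable means using a design formula such as `~ run + condition` (here `run` is a hypothetical column of the sample table). The Python sketch below builds the corresponding treatment-coded design matrix by hand and checks its rank, to show why the run term only helps when runs and conditions are not confounded. As an assumption for simplicity, the first level seen is used as the reference, whereas R uses the first factor level (alphabetical by default).

```python
from fractions import Fraction


def design_matrix(samples, factors):
    """Intercept + treatment-coded design matrix, similar in spirit to
    R's model.matrix(~ run + condition).

    samples: list of dicts mapping factor name -> level.
    factors: factor names in model order. The first level seen for each
    factor is the reference and gets no column of its own.
    """
    columns = ["(Intercept)"]
    levels = {}
    for f in factors:
        seen = []
        for s in samples:
            if s[f] not in seen:
                seen.append(s[f])
        levels[f] = seen
        columns += [f"{f}{lvl}" for lvl in seen[1:]]
    rows = []
    for s in samples:
        row = [1]
        for f in factors:
            row += [1 if s[f] == lvl else 0 for lvl in levels[f][1:]]
        rows.append(row)
    return columns, rows


def rank(rows):
    """Matrix rank via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


# Hypothetical layouts: balanced runs vs. run completely confounded with condition.
balanced = [{"run": r, "condition": c} for r in ("r1", "r2") for c in ("control", "disease")]
confounded = [{"run": "r1", "condition": "control"}] * 2 + [{"run": "r2", "condition": "disease"}] * 2
for name, samples in (("balanced", balanced), ("confounded", confounded)):
    cols, rows = design_matrix(samples, ["run", "condition"])
    print(name, cols, "full rank" if rank(rows) == len(cols) else "rank deficient")
```

When the matrix is rank deficient, the run and condition columns are identical, so no model can tell a batch effect apart from the disease effect.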
Mike