Question

Fwd: Outliers in RNA seq analysis using DESeq2

Emma Quinn ▴ 20

@emma-quinn-5516

Last seen 9.6 years ago

Hi,

I've conducted a 2-condition RNA-seq experiment using "disease" versus "control" cells. I have 16 biological replicates in my disease group and 11 in my control group. I'm using DESeq2 v1.0.9 for the analysis.

From the heatmap and PCA plots (attached) it's clear that there's some variability amongst the biological replicates in my groups, which I'd expect, but 6 of my disease samples also seem to cluster closely with the controls.

All of the samples in each group were prepared in the same way and sequenced together, and I can't identify any obvious batch effect that could be contributing to this.

I don't have much experience analysing this kind of data, and my statistics knowledge is unfortunately somewhat lacking, but I'm wondering if anyone has experience of how well biological replicates from RNA-seq data usually cluster together. I'm not sure whether it's more appropriate to drop these 6 samples and continue the analysis with 10 vs 11 samples, or to leave them in, as perhaps this is more representative of the variability of the disease biology.

I'd appreciate any advice anyone has!

Thanks in advance,
Emma

DESeq2 • 964 views

updated 11.0 years ago by Michael Love 41k • written 11.0 years ago by Emma Quinn ▴ 20

score 0 · Answer 1 · 2013-05-08

Hi Emma,

On Wed, May 8, 2013 at 5:24 PM, Emma Quinn wrote:

> Hi
>
> I've conducted a 2 condition RNA seq experiment using "disease" versus
> "control" cells. I have 16 biological replicates in my disease group and 11
> in my control. I'm using DESeq2 v 1.0.9 for the analysis.
>
> From the heatmap and PCA plots (attached) it's clear that there's some
> variability amongst the biological replicates in my groups, which I'd expect,
> but also 6 of my disease samples seem to cluster closely with the controls.

I would try to follow up with more sample preparation information to help explain these 6 samples. Are the size factors and/or total number of mapped reads different for these? You might also want to run some QA tools, such as qa() from the ShortRead package.

> All of the samples in each group were prepared in the same way and
> sequenced together and I can't identify any obvious batch effect that could
> be contributing to this.

Were all samples sequenced at the same time, or in different runs? Were the groups balanced across the runs?

> I don't have much experience analysing this kind of data and my statistics
> knowledge is also unfortunately somewhat lacking but I'm wondering if
> anyone has any experience with regards how well biological replicates from
> RNA seq data usually cluster together? I'm not sure if it's more
> appropriate to drop these 6 samples and continue the analysis with 10 vs 11
> in each group or leave them in as perhaps this is more
> representative of variability of the disease biology.

It's not appropriate to drop some of the disease samples after seeing that they cluster with the controls. As you can imagine, if samples are removed on that basis, almost any experiment with enough samples could be made to show significant differences. But I would try to follow up and see which preparation steps might have been different for these samples.

It might then be possible to deal with batch effects by including these variables (for example, date of run) as terms in the model, or by first running a normalization package such as cqn or EDASeq and then passing this information as a normalization factor, as described in the Appendix of the vignette.

Mike
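
The follow-up checks suggested in this answer can be sketched roughly as below. This is only an illustration in plain Python (standard library only), not DESeq2 code: in DESeq2 the size factors come from estimateSizeFactors() in R, and the median-of-ratios calculation here is a simplified stand-in for it. The function names (`library_sizes`, `size_factors`, `cross_tab`), the toy counts, and the sample labels are all hypothetical.

```python
import math
from collections import Counter
from statistics import median


def library_sizes(counts):
    """Total count per sample. `counts` is a list of gene rows, one column per sample."""
    return [sum(column) for column in zip(*counts)]


def size_factors(counts):
    """Median-of-ratios size factors, skipping genes with a zero count in any sample."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # geometric mean would be zero, so the gene is uninformative
        log_geo_mean = sum(math.log(k) for k in row) / n_samples
        for j, k in enumerate(row):
            log_ratios[j].append(math.log(k) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def cross_tab(conditions, runs):
    """Number of samples for each (condition, run) pair."""
    return Counter(zip(conditions, runs))


# Toy data (made up): 4 genes x 4 samples.
toy_counts = [
    [10, 20, 10, 20],
    [100, 200, 100, 200],
    [50, 100, 50, 100],
    [0, 3, 1, 2],
]
toy_conditions = ["disease", "disease", "control", "control"]
toy_runs = ["run1", "run2", "run1", "run2"]

toy_library_sizes = library_sizes(toy_counts)
toy_size_factors = size_factors(toy_counts)
toy_table = cross_tab(toy_conditions, toy_runs)

print("library sizes:", toy_library_sizes)
print("size factors:", [round(s, 3) for s in toy_size_factors])
for (condition, run), n in sorted(toy_table.items()):
    print(condition, run, n)
```

If the 6 unusual disease samples stand out in library size or size factor, or if they all fall in one run, that points towards a technical explanation worth following up. The cross-tabulation also matters for the modelling suggestion: a run or date term can only be separated from the condition effect if both conditions are represented across the runs.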