Question: Justification for collapsing technical replicates in DESeq2
2.9 years ago, kieranrcampbell wrote:

Hi all,

Can someone enlighten me as to the justification for summing counts across technical replicates in DESeq2, especially with respect to the collapseReplicates() function?

I would have thought that the statistically correct thing to do would be to add a column to the design matrix to account for technical replicates and include all samples. This effectively doubles (or multiplies by N, for N technical replicates) the number of samples you have, which obviously "increases" the power and so affects the all-important p-values. On the other hand you are assuming biologically-identical replicates constitute independent samples, but I can't see how you would adjust for large batch effects any other way.

Thanks,

Kieran

modified 15 months ago by jovel_juan • written 2.9 years ago by kieranrcampbell

I moved my comment below

modified 15 months ago • written 15 months ago by jovel_juan

The second argument to collapseReplicates is the factor that you want to collapse on. Here you gave it mock/zika and it collapsed to two samples, one per group. You want to instead collapse based on donor.
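
To make the role of the grouping factor concrete, here is a minimal sketch in plain Python (not DESeq2 itself). The helper `collapse_replicates`, the sample names and the counts are all hypothetical, and the layout assumes each donor belongs to a single condition. It only illustrates that the grouping labels decide how many columns survive the summing.

```python
def collapse_replicates(counts, groupby):
    """Sum per-sample count vectors that share the same grouping label.

    counts  : dict mapping sample name -> list of per-gene counts
    groupby : dict mapping sample name -> grouping label (e.g. donor)
    """
    collapsed = {}
    for sample, gene_counts in counts.items():
        label = groupby[sample]
        if label not in collapsed:
            collapsed[label] = [0] * len(gene_counts)
        # Element-wise sum of this run's counts into the group's total.
        collapsed[label] = [a + b for a, b in zip(collapsed[label], gene_counts)]
    return collapsed

# Hypothetical data: 4 donors, 2 sequencing runs each, 3 genes.
counts = {
    "d1_mock_run1": [10, 0, 5], "d1_mock_run2": [12, 1, 4],
    "d2_mock_run1": [8, 2, 6],  "d2_mock_run2": [9, 1, 7],
    "d3_zika_run1": [20, 0, 5], "d3_zika_run2": [22, 0, 6],
    "d4_zika_run1": [18, 1, 4], "d4_zika_run2": [19, 2, 5],
}
donor = {s: s.split("_")[0] for s in counts}
condition = {s: s.split("_")[1] for s in counts}

by_donor = collapse_replicates(counts, donor)          # 4 columns, one per donor
by_condition = collapse_replicates(counts, condition)  # 2 columns, one per group

print(len(by_donor), len(by_condition))
```

Grouping by condition leaves one column per group and so no replicates to estimate variability from, whereas grouping by donor keeps one column per biological replicate.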

written 15 months ago by Michael Love

Comment deleted. Sorry, I could not find a way to delete my comment; it was not a response to the question.

modified 15 months ago • written 15 months ago by jovel_juan

This isn't really an "answer" to the first question. On the support forum, the form at the bottom is "Add your answer" which is really supposed to be used by people who are answering the post at the top.

Either way, see above for my reply.

written 15 months ago by Michael Love

And more generally, what would be the consequences of NOT collapsing technical replicates? 

modified 15 months ago • written 15 months ago by jovel_juan

Not collapsing technical replicates is not appropriate. Put simply, failing to collapse technical replicates and providing them to a DE method is "pretending" you have more independent samples than you really do. You can think of a technical replicate as just more reads from the same cDNA library. You could take a library and split it in 2, again and again, and make many technical replicates. None of these would contain any biological variability, because they are from a single, static library of molecules.

So you can think of an idealized experiment where you have, say, 2 vs 2 biological replicates, which is very under-powered to find any significant differences in expression. But if you make many technical replicates from these, by splitting the reads, and pretend these are independent samples, the DE methods will think you have very low within-group biological variability, and will tend to report many genes as DE. This will greatly increase your false positive rate (FPR) for the "truly null" genes.
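
The inflation described above can be sketched with a toy calculation. This is an assumption-laden illustration, not what DESeq2 computes: it uses a Welch t-statistic instead of a negative binomial model, the expression values are hypothetical, and the technical replicates are idealised as exact copies (real ones would add roughly Poisson noise).

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch t-statistic for the difference in means between two groups."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se

# Hypothetical expression values for one gene, 2 vs 2 biological replicates.
group_a = [100.0, 140.0]
group_b = [130.0, 170.0]

# Idealised technical replicates: each library repeated 8 times with no
# added noise, then (wrongly) treated as 16 independent samples per group.
k = 8
pseudo_a = [x for x in group_a for _ in range(k)]
pseudo_b = [x for x in group_b for _ in range(k)]

t_true = welch_t(group_a, group_b)      # based on 2 vs 2 real replicates
t_pseudo = welch_t(pseudo_a, pseudo_b)  # based on 16 vs 16 pseudo-replicates

print(round(t_true, 2), round(t_pseudo, 2))
```

The group means are identical in both cases; only the apparent sample size changes, yet the test statistic grows several-fold, which is how "truly null" genes end up being called significant.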

written 15 months ago by Michael Love
Answer: Justification for collapsing technical replicates in DESeq2
2.9 years ago, Michael Love (United States) wrote:

When you do differential expression across samples, the kind of variability you need to estimate is the variability across biological replicates. So you don't get a gain in power, because it's not helping you to estimate the variability that would go into a test of differential expression across conditions.

Technical replicate variability is small compared to biological replicate variability, and the former is well approximated by a Poisson distribution for the large majority of genes (I've looked into SEQC technical replicates and confirmed this to myself recently). Since the technical replicates of a biological replicate aren't helping you to estimate variability across biological replicates at all, it's best to simply add them together, increasing the sequencing depth of the individual biological replicate. Increasing sequencing depth increases power for differential expression, as does increasing the number of biological replicates.
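
One way to see why summing loses nothing under the Poisson approximation is the standard additivity property of independent Poisson counts. The symbols are assumptions introduced here for illustration: $K_j$ is the count for one gene in technical replicate $j$ of a library, $s_j$ is the depth of that run, and $q$ is the gene's abundance in the library.

```latex
K_j \sim \mathrm{Poisson}(s_j\, q), \quad j = 1, \dots, m
\quad\Longrightarrow\quad
\sum_{j=1}^{m} K_j \sim \mathrm{Poisson}\!\Big(q \sum_{j=1}^{m} s_j\Big)
```

Under this model the summed count behaves like a single run of the same library at the combined depth $\sum_j s_j$.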

"On the other hand you are assuming biologically-identical replicates constitute independent samples, but I can't see how you would adjust for large batch effects any other way."

I don't follow this last part. Can you add a comment to my post which explains this question more?

modified 2.9 years ago • written 2.9 years ago by Michael Love

Hi Mike,

Thanks for your answer. The last part relates to an RNA-seq dataset I'm currently working on where batch effect dominates (i.e. technical replicate variability is large and, sadly, explains ~80% of the variance in the data). So the follow-up questions would be (1) what's the best practice for dealing with dominating technical effects and (2) what's the point in doing technical replicates if we throw away that information by summing over counts?

Thanks,

Kieran

written 2.9 years ago by kieranrcampbell

In my answer above, a technical replicate means producing more sequences from the same library, and I wouldn't expect much variation beyond Poisson.

The point of summing is that you increase the sequencing depth for that sample, which improves power by allowing more precise measurement of gene expression, and increases the number of genes that reach a minimal read count.

If you prepare a new library, I wouldn't refer to this as a technical replicate.

Regarding what to do about batches, the recommended approach is to add a term which accounts for this sample dependence into the design, e.g. ~ batch + condition. This typically improves power if there are batches.
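
As a rough illustration of what a design such as ~ batch + condition expands to, here is a small plain-Python sketch. In practice R builds this matrix from the formula; the helper `design_matrix`, the level names and the two-level restriction are assumptions made only for this example.

```python
def design_matrix(batch, condition, ref_batch, ref_condition):
    """Rows of [intercept, batch indicator, condition indicator].

    Covers only the simple case of a two-level batch and a two-level
    condition, each coded as 0 for the reference level and 1 otherwise.
    """
    return [
        [1, int(b != ref_batch), int(c != ref_condition)]
        for b, c in zip(batch, condition)
    ]

# Hypothetical layout: two batches, each containing both conditions.
batch = ["A", "A", "B", "B"]
condition = ["ctrl", "trt", "ctrl", "trt"]

X = design_matrix(batch, condition, ref_batch="A", ref_condition="ctrl")
for row in X:
    print(row)
```

Because the batch column absorbs the shift between batches, the condition column is estimated from within-batch differences, which is why this kind of design typically helps when batches are present.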

modified 2.9 years ago • written 2.9 years ago by Michael Love

Okay, I think I was just confused by terms here and thought technical replicate and batch (i.e. independent library prep) were equivalent. It makes complete sense that if you do different sequencing runs of the same library, you just sum the counts.

written 2.9 years ago by kieranrcampbell