Question: Variation between technical replicates (DESeq2, batch effects)
Nik dAK wrote (11 weeks ago):

Hi all,

I have read a lot about testing datasets with batch effects, but in all cases the effect is between biological replicates, not technical replicates.

Just some quick terms for a better understanding:

technical replicates: same sample, same RNA isolation and same library prep, just loaded 2x on the machine (for read depth)

biological replicates: different samples (e.g. cell lines) of the same group (e.g. genotype) with different RNA isolation, but same library prep and loading onto the machine

I noticed two different types of variation between technical replicates:

A) systematic/plane shift in PCA --> batch effect due to different sequencing runs (see image A below)

B) dispersed --> small random technical variability within one sequencing run (see image B below)

Normally one would expect the variation between technical replicates to be small and non-systematic in the PCA (B).

Now I have a case of a batch effect between technical replicates (A). The experiment was designed with 6 biological replicates (6 different samples for each group of interest, colored dots in PCA) and 2 technical replicates for each biological replicate (samples connected via a line in PCA). The two technical replicates of each sample were sequenced on two different sequencing runs.

The general approach for (B) is to simply add the technical replicates together and do the test between groups, as discussed in https://support.bioconductor.org/p/85536/.
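As a minimal sketch of this summing step (not code from the linked thread), the base R snippet below adds up the raw counts of technical replicates that belong to the same biological sample. The count matrix, the column names and the `sample_id` vector are made up for illustration. In DESeq2 itself, `collapseReplicates()` performs this kind of summing on a DESeqDataSet.

```r
# Toy raw count matrix: 3 genes x 4 sequencing runs (hypothetical values).
counts <- matrix(
  c( 10, 12,  5,  7,
      0,  1,  3,  2,
    100, 90, 40, 50),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene1", "gene2", "gene3"),
                  c("s1_run1", "s1_run2", "s2_run1", "s2_run2"))
)

# Biological sample that each column (run) comes from (hypothetical labels).
sample_id <- c("s1", "s1", "s2", "s2")

# rowsum() sums rows by group, so transpose, sum per sample, transpose back.
collapsed <- t(rowsum(t(counts), group = sample_id))

print(collapsed)
```

The result has one column per biological sample, so the downstream test sees 6 samples per group rather than 12.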

Now for the case of a batch effect between technical replicates (A), it gets a bit ambiguous for me.

X) Ignore the batch effect and simply sum up and test (I feel bad about this)

Y) Do not merge the technical replicates, but test using a design that includes the batch as a covariate (~genotype+batch)

Z) Test the two runs of technical replicates individually and keep the intersection of significant genes (probably a loss of power due to smaller library sizes and fewer samples)

Findings: Y results in a major increase in identified significant genes compared to X.

Which way would be the best to handle this situation? Will the fact that I am not summing up the technical replicates (Y) be a problem?

And how much technical variation (without a batch effect, case B) can be "ignored" before again proceeding with one of the approaches X,Y,Z?

Additional note: I also tried SVA, but noticed that the first surrogate variable corresponds exactly to the batch (sequencing run) covariate.

Thank you very much for your help!

EDIT (corrected links):
A-batcheffectPCA: https://ibb.co/VYpxnxh
B-techvariancePCA: https://ibb.co/SQnWnjr

Answer: Variation between technical replicates (DESeq2, batch effects)
Michael Love wrote (10 weeks ago):

There is a big problem with your approach "Y": you are doubling the sample size, but you don't have double the samples. Because the technical replicates are not independent observations, the across-sample differences appear more significant to the model than they truly are.

An extreme example:

> a <- rnorm(3)
> b <- rnorm(3)
> t.test(a, b)$p.value
[1] 0.7956494
> t.test(rep(a,each=50), rep(b,each=50))$p.value
[1] 0.01723657

You could use random effect models to account for multiple technical measurements on the same unit (we only support fixed effects in DESeq2), and then test across units. Summing the technical replicates is a simpler approach that's available to you in DESeq2, so I would use X over Y and Z. I believe you could also use limma-voom's duplicateCorrelation() function in their RNA-seq pipeline to model the multiple technical samples.
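To put a rough number on this, here is a toy simulation in base R. It is only a sketch on normally distributed values, not a DESeq2 analysis, and all names and parameters (`n_units`, `n_tech`, the technical noise `sd = 0.1`, the seed) are assumptions chosen for illustration. Both groups are drawn from the same distribution, so every significant result is a false positive. The "naive" test treats each technical replicate as an independent sample; the "collapsed" test first reduces the technical replicates to one value per biological unit (a mean here, playing the role that summing plays for counts).

```r
set.seed(1)

n_units <- 6      # biological replicates per group, as in the question
n_tech  <- 2      # technical replicates per biological replicate
n_sim   <- 2000   # number of simulated null datasets

simulate_once <- function() {
  # True per-unit values: both groups share the same mean (no real effect).
  unit_a <- rnorm(n_units)
  unit_b <- rnorm(n_units)
  # Technical replicates: the unit value plus small measurement noise.
  tech_a <- rep(unit_a, each = n_tech) + rnorm(n_units * n_tech, sd = 0.1)
  tech_b <- rep(unit_b, each = n_tech) + rnorm(n_units * n_tech, sd = 0.1)
  # Collapse technical replicates to one value per biological unit.
  unit_index <- rep(seq_len(n_units), each = n_tech)
  mean_a <- tapply(tech_a, unit_index, mean)
  mean_b <- tapply(tech_b, unit_index, mean)
  c(naive     = t.test(tech_a, tech_b)$p.value,
    collapsed = t.test(mean_a, mean_b)$p.value)
}

# 2 x n_sim matrix of p-values, one row per approach.
pvals <- replicate(n_sim, simulate_once())

# Fraction of null datasets called significant at 0.05 by each approach.
false_positive_rate <- rowMeans(pvals < 0.05)
print(false_positive_rate)
```

Under these assumptions the naive rate should come out clearly above the nominal 5%, while the collapsed rate stays near or below it.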


Thank you very much, this is very helpful!

(reply written 10 weeks ago by Nik dAK)