Analyzing RNA-Seq data without replicates using DESeq

Hello all,

I am working on an RNA-Seq project in which the samples were sequenced without any biological replicates.

I know that it is bad practice to use this kind of data, but the researcher didn't seem to have a choice, and this is the only data that I have to work with.

I have 4 samples, each one from a different condition, and I am interested in 4 comparisons between different combinations of conditions.

I understood from the DESeq vignette that, in order to estimate the dispersion, DESeq assumes that all the samples entered into the analysis are replicates. Given that assumption, is it more accurate to build a separate data set for each comparison and perform the normalization, estimation and testing on each one independently? Or should I still enter a raw counts table containing all 4 samples, estimate size factors and dispersions once, and then test each comparison separately?

With regular data that has replicates, we usually normalize all samples together. But when samples from different conditions are treated as replicates, would it be more accurate to completely separate the comparisons?

Thank you very much,


deseq DESeq2 • 7.5k views

It seems you've already found the material in ?DESeq that starts with "Experiments without replicates..." and explains how "one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation", so I won't go over that again.

I can give an example of the expected outcomes in the no-replicate situation when doing the comparisons separately vs with all samples together. Suppose we have samples A, B and C, where A and B have very few DE genes between them (so most points in the MA plot fall near the horizontal line at zero log fold change), and C has more DE genes compared to A or B. An analysis without replicates of only A and B will give larger Wald statistics and smaller p-values than an analysis with all 3 samples followed by a comparison of A and B using results(). That is because the inclusion of C will increase the dispersion estimates for all genes: in an experiment without replicates, true differential expression ends up inflating the dispersion estimates. Note that the log2 fold changes will be similar in either analysis.
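
To make the effect concrete, here is a toy numerical sketch in Python for a single gene. It is not what DESeq or DESeq2 actually computes: it assumes a plain method-of-moments dispersion per gene (no trend fitting or shrinkage across genes) and a crude delta-method standard error for the log fold change. The counts and the helper names `moment_dispersion` and `rough_wald` are made up for illustration.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments dispersion from Var = mu + alpha * mu^2, floored at 0.

    All samples in `counts` are treated as if they were replicates.
    """
    mu = mean(counts)
    var = variance(counts)  # sample variance, n - 1 denominator
    return max((var - mu) / mu ** 2, 0.0)


def rough_wald(count_a, count_b, alpha):
    """Crude Wald-like statistic for the natural-log fold change of B vs A.

    Assumption: Var(log count) is approximated by 1/mu + alpha for each sample.
    """
    lfc = math.log(count_b / count_a)
    se = math.sqrt(1.0 / count_a + 1.0 / count_b + 2.0 * alpha)
    return lfc / se


# Hypothetical normalized counts for one gene: A and B are close, C is far off.
a, b, c = 100.0, 130.0, 400.0

alpha_ab = moment_dispersion([a, b])       # only A and B in the data set
alpha_abc = moment_dispersion([a, b, c])   # C included as a "replicate"

# The fold change of B vs A is identical in both cases; only alpha differs.
wald_ab = rough_wald(a, b, alpha_ab)
wald_abc = rough_wald(a, b, alpha_abc)

print(f"dispersion  A,B only: {alpha_ab:.3f}   A,B,C: {alpha_abc:.3f}")
print(f"Wald-like   A,B only: {wald_ab:.3f}   A,B,C: {wald_abc:.3f}")
```

With these made-up counts, adding C raises the dispersion estimate and shrinks the Wald-like statistic for A vs B, while the fold change itself is unchanged.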


I will take this into consideration.

Thank you very much!
