Question

Analyzing RNA-Seq data without replicates using DESeq

solgakar@bi.technion.ac.il ▴ 90

Hello all,

I am working on an RNA-Seq project that was sequenced without any biological replicates.

I know that it is bad practice to use this kind of data, but the researcher didn't seem to have a choice, and this is the only data that I have to work with.

I have 4 samples, each one from a different condition, and I am interested in 4 comparisons between different combinations of conditions.

I understood from the DESeq vignette that DESeq assumes that all the samples entered into the analysis are replicates, in order to estimate the dispersion. Given that assumption, is it more accurate to build a separate data set for each comparison and perform the normalization, dispersion estimation, and testing on it alone, or should I still enter a raw counts table containing all 4 samples, estimate size factors and dispersions once, and then test each comparison separately?

With regular data that has replicates, we usually normalize all samples together, but when samples from different conditions are treated as replicates, would it be more accurate to separate the comparisons completely?

Thank you very much,

Olga.

deseq DESeq2 • 7.6k views

updated 9.9 years ago by Michael Love 42k • written 9.9 years ago by solgakar@bi.technion.ac.il ▴ 90

score 3 · Answer 1 · 2014-09-18

It seems you've already found the material in ?DESeq that starts with "Experiments without replicates..." and explains how "one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation", so I won't go over that again.

I can give an example of the expected outcomes in the no-replicate situation when doing the comparisons separately versus with all samples together. Suppose we have samples A, B and C, where A and B have very few DE genes between them (so most points in the MA plot fall on the x-axis), and C has more DE genes compared to A or B. An analysis without replicates of only A and B will give larger Wald statistics and smaller p-values than an analysis with all 3 samples followed by a comparison of A and B using results(). That is because including C increases the dispersion estimates for all genes: in an experiment without replicates, true differential expression ends up inflating the dispersion estimates. Note that the log2 fold changes will be similar in either analysis.
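
To make the A/B/C argument concrete, here is a rough single-gene toy calculation in Python. It is only a sketch under loud assumptions and is not what DESeq or DESeq2 actually computes: the counts are made up, size factors are taken to be 1, the dispersion is a plain method-of-moments estimate that treats all supplied samples as replicates (with no fitted trend or shrinkage across genes), and the "Wald-like" statistic uses a delta-method standard error. The function names `moment_dispersion` and `wald_like` are hypothetical helpers for this illustration.

```python
import math
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments negative binomial dispersion for one gene,
    treating every supplied sample as a replicate (toy assumption)."""
    m = mean(counts)
    v = variance(counts)  # sample variance, n - 1 denominator
    # NB variance is mu + alpha * mu^2, so alpha = (var - mu) / mu^2
    return max((v - m) / (m * m), 0.0)


def wald_like(count_a, count_b, alpha):
    """Natural-log fold change of B over A and a rough z statistic.

    Delta method: Var(log K) is approximately 1/mu + alpha for an NB count,
    with the observed count standing in for mu (toy assumption)."""
    lfc = math.log(count_b / count_a)
    se = math.sqrt(1.0 / count_a + 1.0 / count_b + 2.0 * alpha)
    return lfc, lfc / se


# Made-up counts for one gene: A and B are close, C is strongly different.
a, b, c = 100.0, 150.0, 400.0

alpha_pair = moment_dispersion([a, b])    # only A and B in the data set
alpha_all = moment_dispersion([a, b, c])  # C included as a "replicate"

lfc_pair, z_pair = wald_like(a, b, alpha_pair)
lfc_all, z_all = wald_like(a, b, alpha_all)

print(f"dispersion  A,B only: {alpha_pair:.3f}   A,B,C: {alpha_all:.3f}")
print(f"log FC      A,B only: {lfc_pair:.3f}   A,B,C: {lfc_all:.3f}")
print(f"z statistic A,B only: {z_pair:.3f}   A,B,C: {z_all:.3f}")
```

In this toy setup the fold change for A vs B is identical in both runs, while the dispersion is larger and the test statistic smaller once C is included, which is the same direction of effect described above.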