Question: Analyzing RNA-Seq data without replicates using DESeq
Asked 3.2 years ago by solgakar@bi.technion.ac.il (European Union):

Hello all,

I am working on an RNA-Seq project in which the samples were sequenced without any biological replicates.

I know that it is bad practice to use this kind of data, but the researcher didn't seem to have a choice, and this is the only data that I have to work with.

I have 4 samples, each from a different condition, and I am interested in 4 comparisons between different combinations of conditions.

I understood from the DESeq vignette that, in order to estimate the dispersion, DESeq assumes that all the samples entered into the analysis are replicates. Given that assumption, is it more accurate to build a separate data set for each comparison and perform the normalization, estimation and testing on each one independently? Or should I still enter a raw counts table containing all 4 samples, estimate the size factors and dispersions once, and then test each comparison separately?

With ordinary data that has replicates, we usually normalize all samples together, but when different conditions are treated as replicates, would it be more accurate to separate the comparisons completely?

Thank you very much,

Olga.

Answer by Michael Love (United States), 3.2 years ago:

It seems you've already found the material in ?DESeq which starts with "Experiments without replicates..." and explains that although "one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation", so I won't go over that again.

I can give an example of the expected outcomes in the no-replicate situation when doing the comparisons separately vs. with all samples together. Suppose we have samples A, B and C, where A and B have very few DE genes between them (so most points in the MA plot fall near the horizontal line at zero log fold change), and C has more DE genes compared to A or B. An analysis without replicates of only A and B will give larger Wald statistics and smaller p-values than an analysis with all 3 samples followed by a comparison of A and B using results(). That is because the inclusion of C will increase the dispersion estimates for all genes (in an experiment without replicates, true differential expression ends up increasing the dispersion estimates). Note that the log2 fold changes will be similar in either analysis.
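
The effect described above can be illustrated with a small toy simulation. This is only a sketch in plain Python, not DESeq or DESeq2 code, and it does not use their actual dispersion estimator: the log-scale noise model, the single pooled variance shared by all genes (a crude stand-in for a fitted dispersion trend), the omission of size-factor normalization, and all function names and parameter values are assumptions made for illustration. It treats the chosen samples as replicates, pools the per-gene variance, and computes a Wald-like statistic for A vs B with and without sample C.

```python
import math
import random
import statistics


def simulate_log_expr(n_genes=2000, frac_de_in_c=0.3, seed=1):
    """Return per-gene log2 expression for toy samples A, B and C."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_genes):
        base = rng.uniform(5.0, 10.0)           # baseline log2 expression
        a = base + rng.gauss(0.0, 0.1)          # A and B differ only by noise
        b = base + rng.gauss(0.0, 0.1)
        # C is truly differentially expressed for a fraction of the genes
        shift = rng.gauss(0.0, 1.5) if rng.random() < frac_de_in_c else 0.0
        c = base + shift + rng.gauss(0.0, 0.1)
        rows.append((a, b, c))
    return rows


def shared_variance(rows, columns):
    """Treat the chosen samples as replicates and pool per-gene variances."""
    per_gene = [statistics.variance([row[j] for j in columns]) for row in rows]
    return statistics.mean(per_gene)


def a_vs_b(rows, var):
    """log2 fold changes (B - A) and Wald-like statistics for a given variance."""
    lfc = [b - a for a, b, _ in rows]
    se = math.sqrt(2.0 * var)                   # SE of a difference of two samples
    return lfc, [x / se for x in lfc]


rows = simulate_log_expr()
var_ab = shared_variance(rows, (0, 1))          # analysis of A and B only
var_abc = shared_variance(rows, (0, 1, 2))      # analysis of all three samples
lfc_ab, z_ab = a_vs_b(rows, var_ab)
lfc_abc, z_abc = a_vs_b(rows, var_abc)

print(f"pooled variance, A+B only: {var_ab:.4f}")
print(f"pooled variance, A+B+C:    {var_abc:.4f}")
print(f"largest |z|, A+B only: {max(abs(z) for z in z_ab):.2f}")
print(f"largest |z|, A+B+C:    {max(abs(z) for z in z_abc):.2f}")
```

In this toy setup the fold changes between A and B are exactly the same in both analyses because no normalization is performed; in a real analysis the size factors would differ slightly, so the fold changes would only be similar. The pooled variance grows once C is included, which shrinks every A vs B statistic.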


I will take this into consideration.

Thank you very much!

(Reply written 3.2 years ago by solgakar@bi.technion.ac.il)