We use the DESeq2 package for differential expression quite a bit, and I had a quick question regarding analyses without replicates, as this is something we run into when analyzing public datasets from tumors.
After reading, I noticed two potential methods with DESeq to perform a comparative analysis of expression between input samples without replicates...
1) Run the DESeq analysis normally with sample name as the condition, so that the algorithm treats the input samples as replicates and uses all of them to estimate dispersion. Then extract normalized counts with counts(dds, normalized = TRUE) and log2-transform them.
2) Use the rlog transformation on the input samples.
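For concreteness, the two approaches can be sketched roughly as follows. This is only a minimal sketch under assumptions, not the exact code used here: simulated counts from makeExampleDESeqDataSet() stand in for real data, the design is set to intercept-only (~ 1) because there are no replicates, and the pseudocount of 1 before taking log2 is an arbitrary choice. Normalized counts depend only on the size factors, so estimateSizeFactors() is enough for that step.

```r
library(DESeq2)

set.seed(1)
# Simulated counts stand in for real data (assumption): 1000 genes, 4 samples.
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)
# No replicates, so use an intercept-only design (assumption).
design(dds) <- ~ 1

# Approach 1: size-factor normalization, then log2 with a pseudocount.
dds <- estimateSizeFactors(dds)
norm_counts <- counts(dds, normalized = TRUE)
log_norm <- log2(norm_counts + 1)  # pseudocount of 1 is an arbitrary choice

# Approach 2: regularized log transformation (values are already on a log2 scale).
rld <- rlog(dds, blind = TRUE)
rlog_mat <- assay(rld)

# Compare the per-gene spread (max minus min across samples) of the two matrices.
range_log_norm <- apply(log_norm, 1, function(x) diff(range(x)))
range_rlog <- apply(rlog_mat, 1, function(x) diff(range(x)))
summary(range_log_norm)
summary(range_rlog)
```

Both log_norm and rlog_mat are genes-by-samples matrices, so the values for any gene can be pulled out by row name and compared across samples.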
I tried both methods and noticed that when looking at a particular gene, if you rank the samples by expression values, they are in the same order. However, the actual log2 values and the range of these values are markedly different. The range is generally tighter with rlog, with smaller log2 expression differences.
My question is - is there any insight on which method might be better to use for quantitative interpretation? In other words, I would like to be able to answer the question “what is the fold-change gene expression between sample X and sample Y for gene Z?” and am not sure which of the two methods is recommended for data without replicates.
My hesitation to use DESeq()'s LFC method was that I wanted a complete normalized data frame of counts from which the values for all samples could be easily extracted in R and compared to one another. When I use DESeq(), it appears to calculate fold changes between only two samples at a time, and I'm not sure if there is a way to get all samples' normalized counts into a single matrix from which I can calculate LFCs myself with more flexibility, as I can do with the rlog output.
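To illustrate what computing LFCs from a single matrix could look like, here is a minimal base R sketch. The toy matrix stands in for the output of counts(dds, normalized = TRUE), and its values are made up. The helper name pairwise_lfc is hypothetical, and the pseudocount of 1 is an assumption. These are plain, unshrunken differences of log2 values, not the LFC estimates that DESeq() reports.

```r
# Toy matrix standing in for counts(dds, normalized = TRUE); values are made up.
norm_counts <- matrix(
  c(100, 200, 50,
     10,  10, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneZ", "geneW"), c("X", "Y", "W"))
)

pseudocount <- 1  # assumption: avoids log2(0) for zero counts
log_mat <- log2(norm_counts + pseudocount)

# Hypothetical helper: all pairwise log2 fold changes for one gene.
# Entry [i, j] is log2(sample i) - log2(sample j).
pairwise_lfc <- function(log_mat, gene) {
  v <- log_mat[gene, ]
  outer(v, v, "-")
}

lfc_geneZ <- pairwise_lfc(log_mat, "geneZ")
lfc_geneZ["Y", "X"]  # log2 fold change of Y relative to X for geneZ
```

The same helper should work on an rlog matrix such as assay(rld); in that case the log2 step is skipped, because rlog values are already on a log2 scale.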
Maybe starting with rlog() to generate a global dataset, then using DESeq() for sample pair comparisons, is the way to go, since you mention that more parameters are used to calculate LFC in the DESeq() method. Thank you for the very informative response.