Question: DESeq - treating input samples as replicates vs. rlog transformation
np (United States), 2.9 years ago, wrote:

We use the DESeq2 package for differential expression quite a bit, and I have a quick question about analyses without replicates, which is something we run into when analyzing public datasets from tumors.

After some reading, I noticed two potential methods in DESeq for comparing expression between input samples without replicates:

1) Run the DESeq analysis normally with sample name as condition; the algorithm treats your input samples as replicates and all input samples are used to estimate dispersion. Then generate log2 normalized counts using counts(dds, normalized = TRUE).
2) Use the rlog transformation on the input samples.

I tried both methods and noticed that, for a particular gene, ranking the samples by expression value gives the same order with either method. However, the actual log2 values and their range are markedly different. The range is generally tighter with rlog, with smaller log2 expression differences.

My question is - is there any insight on which method might be better to use for quantitative interpretation?  In other words, I would like to be able to answer the question “what is the fold-change gene expression between sample X and sample Y for gene Z?” and am not sure which of the two methods is recommended for data without replicates.
 

Michael Love (United States), 2.9 years ago, wrote:

hi np,

"run the DESeq analysis normally with sample name as condition, the algorithm treats your input samples as replicates and all input samples are used to estimate dispersion.  then generate log2 normalized counts using counts(dds, normalized = TRUE)"

This last part doesn't make sense. With this approach you are just producing (count / size factor) / (count / size factor), so you're using only one (the size factor) of the many parameters estimated by DESeq(). You could skip DESeq() altogether and just run estimateSizeFactors() for this approach, but I would instead recommend:

DESeq() produces robust log fold changes (LFCs) in the results() table; why don't you use these?

We haven't done a comparison of these fold changes vs rlog() for experiments without replicates. They will not necessarily be identical because the exact implementation is not identical, although the approach is very similar.
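
To make the point about counts(dds, normalized = TRUE) concrete, here is a minimal sketch of the arithmetic, written in plain Python rather than R so that it runs without DESeq2. The counts are made-up toy values and the function names are hypothetical. The size factors follow the median-of-ratios idea behind estimateSizeFactors(), but this is an illustration, not the package's code.

```python
import math
from statistics import median

# Toy raw counts (made up): gene -> [count in sample X, count in sample Y].
counts = {
    "geneA": [100, 200],
    "geneB": [50, 100],
    "geneC": [30, 60],
    "geneZ": [10, 80],
}

def size_factors(counts):
    """Median-of-ratios size factors; genes with any zero count are skipped."""
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        # Log of the gene's geometric mean across samples.
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # One size factor per sample: the median ratio to the geometric means.
    return [math.exp(median(r)) for r in log_ratios]

def naive_lfc(counts, gene, i, j):
    """log2 of (count_i / sf_i) / (count_j / sf_j); the only estimated
    quantity that enters is the size factor of each sample."""
    sf = size_factors(counts)
    row = counts[gene]
    return math.log2((row[i] / sf[i]) / (row[j] / sf[j]))

print(size_factors(counts))
print(naive_lfc(counts, "geneZ", 1, 0))
```

In this toy data sample Y is sequenced twice as deeply as X, so the raw 8-fold difference for geneZ becomes a 4-fold difference (log2 fold change of 2) after normalization. No dispersion estimate or shrinkage is involved, which is why this value can differ from the LFC in the results() table.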


My hesitation to use DESeq()'s LFC method was that I wanted a complete normalized data frame of counts from which the values for all samples could be easily extracted in R and compared to one another. When I use DESeq(), it appears to calculate fold changes between only two samples at a time, and I'm not sure if there is a way to get all samples' normalized counts into a single matrix from which I can calculate LFCs myself more flexibly, as I can with the rlog output.

Maybe starting with rlog() to generate a global dataset and then using DESeq() for sample-pair comparisons is the way to go, since you mention that more parameters are used to calculate LFCs in the DESeq() method. Thank you for the very informative response.
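
Once all samples sit in one log2-scale matrix (for example the rlog output), any pairwise comparison is a subtraction. Below is a minimal sketch in plain Python with made-up values. The matrix, the sample names, and the helper function are hypothetical stand-ins for what the rlog matrix would hold in R.

```python
from itertools import combinations

# Hypothetical log2-scale expression values: gene -> one value per sample.
samples = ["X", "Y", "W"]
log2_expr = {
    "geneZ": [5.0, 7.5, 6.0],
    "geneQ": [10.0, 9.0, 10.5],
}

def pairwise_lfc(log2_expr, samples, gene):
    """Return {(a, b): log2 value in b minus log2 value in a} for every pair."""
    row = dict(zip(samples, log2_expr[gene]))
    return {(a, b): row[b] - row[a] for a, b in combinations(samples, 2)}

for pair, lfc in pairwise_lfc(log2_expr, samples, "geneZ").items():
    print(pair, lfc)
```

Because the values are already on the log2 scale, the difference is the log2 fold change directly; no further log transform is needed.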

 

Reply written 2.9 years ago by np
Steve Lianoglou (Denali), 2.9 years ago, wrote:

I'd use the data coming out of the rlog transformation for the purposes you describe. 

I'd imagine the differences among the data between the two methods are more pronounced when working with genes from the lower part of the expression spectrum, is that right?


I do notice that the differences between the methods are more pronounced when working with lowly expressed genes.
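
One way to see why the two approaches diverge most at low counts: a log2 ratio of normalized counts reacts strongly to a handful of reads when counts are small, while any moderation pulls those values toward zero. The sketch below uses a fixed pseudocount as a crude stand-in for moderation. That is an assumption made purely for illustration and is not how rlog() shrinks values. All numbers are made up.

```python
import math

def lfc(a, b, pseudocount=0.0):
    """log2 fold change of b over a, optionally moderated by a pseudocount."""
    return math.log2((b + pseudocount) / (a + pseudocount))

# Same 4-fold ratio at two expression levels (hypothetical normalized counts).
low = (2, 8)
high = (2000, 8000)

for name, (a, b) in (("low", low), ("high", high)):
    unmoderated = lfc(a, b)
    moderated = lfc(a, b, pseudocount=8)
    print(name, round(unmoderated, 3), round(moderated, 3))
```

The unmoderated value is 2 at both expression levels, but the moderated value drops well below 1 for the low-count gene and barely moves for the high-count gene, which matches the tighter range seen at the low end of the expression spectrum.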

Reply written 2.9 years ago by np