Question

DESeq2 - treating input samples as replicates vs. rlog transformation

np ▴ 10

We use the DESeq2 package for differential expression quite a bit, and I have a quick question about analyses without replicates, which is something we run into when analyzing public datasets from tumors.

After some reading, I found two potential ways to use DESeq2 to compare expression between input samples without replicates:

1) run the DESeq analysis normally with sample name as condition, the algorithm treats your input samples as replicates and all input samples are used to estimate dispersion. then generate log2 normalized counts using counts(dds, normalized = TRUE)
2) use rlog transformation on input samples

I tried both methods and noticed that, for a given gene, ranking the samples by expression value gives the same order with either method. However, the actual log2 values and the range of these values are markedly different. The range is generally tighter with rlog, with smaller log2 expression differences.

My question is - is there any insight on which method might be better to use for quantitative interpretation? In other words, I would like to be able to answer the question “what is the fold-change gene expression between sample X and sample Y for gene Z?” and am not sure which of the two methods is recommended for data without replicates.

deseq2 • 1.7k views

updated 8.8 years ago by Michael Love 41k • written 8.8 years ago by np ▴ 10

score 2 · Answer 1 · 2015-07-18

Michael Love 41k

Hi np,

"run the DESeq analysis normally with sample name as condition, the algorithm treats your input samples as replicates and all input samples are used to estimate dispersion. then generate log2 normalized counts using counts(dds, normalized = TRUE)"

This last part doesn't make sense. With this approach, a fold change between two samples is just (count / size factor) / (count / size factor), so you're only using one (the size factor) of the many parameters estimated by DESeq(). You could skip DESeq() altogether and just run estimateSizeFactors() for this approach, but I would instead recommend:

DESeq() produces robust LFCs in the results() table. Why not use these?

We haven't done a comparison of these fold changes vs rlog() for experiments without replicates. They will not necessarily be identical because the exact implementation is not identical, although the approach is very similar.
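To make the point about size factors concrete, here is a minimal base R sketch. The toy count matrix and the sample names X, Y, and W are hypothetical, and the size factors are computed by hand with a median-of-ratios calculation rather than by calling DESeq2, so treat it as an illustration of the arithmetic only. It shows that a log2 ratio of normalized counts can be rebuilt from the raw counts and the two size factors, with no other model parameter involved.

```r
# Toy count matrix (hypothetical values): 4 genes x 3 unreplicated samples.
counts <- matrix(
  c(  10,   20,   40,
     100,  210,  390,
       5,    9,   22,
    1000, 2100, 4100),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("X", "Y", "W"))
)

# Median-of-ratios size factors, written out by hand.
# All toy counts are positive, so no gene has to be dropped here.
log_geo_means <- rowMeans(log(counts))
size_factors <- apply(counts, 2, function(cnts) {
  exp(median(log(cnts) - log_geo_means))
})

# Normalized counts: each column divided by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# Naive log2 fold change between samples X and Y for every gene.
naive_lfc <- log2(norm_counts[, "X"] / norm_counts[, "Y"])

# The same numbers from the raw counts and the two size factors alone.
by_hand <- log2(counts[, "X"] / counts[, "Y"]) -
  log2(size_factors["X"] / size_factors["Y"])

print(cbind(naive_lfc, by_hand))
```

The two columns agree, which is why running the full DESeq() fit adds nothing to this particular calculation.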

written 8.8 years ago by Michael Love 41k

My hesitation to use the LFCs from DESeq() was that I wanted a complete data frame of normalized counts from which all values from all samples could easily be extracted in R and compared to one another. When I use DESeq(), it appears to calculate fold changes between only two samples at a time, and I'm not sure if there is a way to get all samples' normalized counts into a single matrix from which I can calculate LFCs myself more flexibly, as I can with the rlog output.

Maybe starting with rlog() to generate a global dataset, then using DESeq() for sample pair comparisons, is the way to go, since you mention that more parameters are used to calculate LFCs in the DESeq() method. Thank you for the very informative response.
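The single-matrix use case described here could be sketched as follows. The simulated data from makeExampleDESeqDataSet() are only a stand-in for a real count matrix, and the choice of the first two samples is arbitrary, so this is an illustration under those assumptions rather than a recommended pipeline. Because rlog values are on the log2 scale, the difference between two columns can be read as a moderated log2 fold change between those two samples.

```r
library(DESeq2)

# Simulated counts stand in for a real dataset (assumption for illustration).
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)

# blind = TRUE ignores the experimental design during the transformation.
rld <- rlog(dds, blind = TRUE)

# One genes-by-samples matrix on the log2 scale.
rl <- assay(rld)

# Log2 fold change between any two samples is a column difference;
# columns 1 and 2 are picked arbitrarily here.
lfc_1_vs_2 <- rl[, 1] - rl[, 2]
head(lfc_1_vs_2)
```

Any other pair of samples can be compared the same way by changing the two column indices (or column names).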

written 8.8 years ago by np ▴ 10

score 1 · Answer 2 · 2015-07-17

Steve Lianoglou ★ 13k

I'd use the data coming out of the rlog transformation for the purposes you describe.

I'd imagine the differences among the data between the two methods are more pronounced when working with genes from the lower part of the expression spectrum, is that right?

written 8.8 years ago by Steve Lianoglou ★ 13k

I do notice that the differences between the methods are more pronounced for lowly expressed genes.
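One way to check this observation on a given dataset is sketched below. It uses simulated data from makeExampleDESeqDataSet() as a placeholder, adds a pseudocount of 1 to keep the naive log2 values finite, and splits genes at the median of their mean normalized count; all three choices are assumptions made for illustration. The sketch compares how far the naive log2 ratio sits from the rlog difference in the low and high expression halves.

```r
library(DESeq2)

# Simulated counts stand in for a real dataset (assumption for illustration).
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 4)
dds <- estimateSizeFactors(dds)

norm <- counts(dds, normalized = TRUE)
rl <- assay(rlog(dds, blind = TRUE))

# Naive log2 ratio; the pseudocount of 1 is an assumption to avoid log2(0).
naive <- log2(norm[, 1] + 1) - log2(norm[, 2] + 1)

# Difference of rlog values for the same two samples.
shrunk <- rl[, 1] - rl[, 2]

# Absolute gap between the two estimates, per gene.
gap <- abs(naive - shrunk)

# Split genes into lower and higher expression halves.
mean_expr <- rowMeans(norm)
low <- mean_expr < median(mean_expr)

gap_summary <- c(
  low  = median(gap[low],  na.rm = TRUE),
  high = median(gap[!low], na.rm = TRUE)
)
print(gap_summary)
```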

written 8.8 years ago by np ▴ 10