Question on the fold change estimate
2
0
Entering edit mode
@ea24864c
Last seen 2.6 years ago
United States

Dear Michael,

I have a raw count dataset with three conditions (A,B,C) and each condition has 2-3 biological replicates. I calculated the normalized counts (not log2 transformed) in DESeq2, and for instance the following lists the normalized count of a gene for each replicate.

#        A1        A2        A3        B1        B2        B3       C1       C2  
#108.74237   86.54679  109.39501  562.57695  570.28352  481.73985 1537.51446 1571.32569

I tried to estimate the fold change between C and A simply by calculating the ratio. That is, the mean of normalized counts in C divided by the mean of normalized counts in A is 15.31.

I also tried transforming the raw counts through the rlog method (set blind = FALSE), and the following gives the rlog values for the same gene in each replicate:

#        A1        A2        A3        B1        B2        B3       C1       C2 
# 7.956540  7.925518  7.983866  9.034676  9.043481  8.896545 10.052179 10.091817

I estimated the log2 fold change (C vs A) based on the rlog values, that, the mean of rlog values in C minus that in A. The resulting fold change estimate will be 4.34, much less than 15.31 above.

I understand that DESeq2 shrinks the log fold change to take care of genes of low counts, so it’s not simply the ratio of normalized counts. I also calculated the shrunken fold change (that is, 15.24) in DESeq2 and compared that to the one (15.31) I calculated previously, and they were very close.

After looking at the manual and paper, I thought log fold change shrinkage and the rlog method were conceptually consistent since they both tried to minimize the differences among samples of genes with low counts. But I am wondering why the fold change estimate using the rlog values will turn out to be so different. I would appreciate it very much that you would give any comments or correct my thought.

Also, the goal of my analysis to identify expression patterns of genes among the three conditions. I think transformed data from either rlog or vst would be a better choice to be used for the hierarchical clustering of the genes as the dependence of variance on the mean is removed and this might lead to more accurate clustering than using the normalized counts (without variance stabilization). Wondering if my understanding is correct.

Thanks so much for your help in advance!

DESeq2 • 2.7k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 10 hours ago
United States

I estimated the log2 fold change (C vs A) based on the rlog values, that, the mean of rlog values in C divided by that in A. The resulting fold change estimate will be 4.34, much less than 15.31 above.

rlog is on the log2 scale, so you should subtract if you wanted to compare.

See the vignette for details on the arguments, you can use e.g. ashr with contrast to compare pairs of conditions.

I understand that DESeq2 shrinks the log fold change to take care of genes of low counts, so it’s not simply the ratio of normalized counts. I also calculated the shrunken fold change (that is, 15.24) in DESeq2 and compared that to the one (15.31) I calculated previously, and they were very close.

Yes, we recommend using lfcShrink with either apeglm or ashr. Ashr would allow you to do pairwise comparisons here using contrast.

Also, the goal of my analysis to identify expression patterns of genes among the three conditions. I think transformed data from either rlog or vst would be a better choice to be used for the hierarchical clustering of the genes as the dependence of variance on the mean is removed and this might lead to more accurate clustering than using the normalized counts (without variance stabilization). Wondering if my understanding is correct.

Yes, I'd recommend VST then clustering. We prefer VST a bit over rlog now, but both are options. VST is faster.

ADD COMMENT
0
Entering edit mode

rlog is on the log2 scale, so you should subtract if you wanted to compare

Thanks Michael! Sorry for the typo. Yes, I meant the subtraction and corrected the original text. When using rlog values, the fold change estimate is 4.34, very different from the shrunken fold change (15.24) or the estimate from the ratio of normalized counts (15.31). Wondering if it’s expected that the rlog transformation will largely change the fold change of each gene even with moderate or high counts but the expression patterns (trend) across samples will always be reserved no matter what transformation (rlog or vst) is applied. I thought that the estimate of the fold change from rlog values or normalized counts should be similar, but I was wrong. I will surely use lfcshrink to calculate the log2 fold change for each gene, but just curious about the discrepancy in the example I provided.

ADD REPLY
0
Entering edit mode

Right, so the LFC is ~3.9 and the difference in rlog values is ~2.

Right, we don't recommend the rlog for LFC calculation, this and the VST (VST is now the recommended transformation generally) are for computing sample distances.

What do you get with VST by the way?

ADD REPLY
0
Entering edit mode
swbarnes2 ★ 1.3k
@swbarnes2-14086
Last seen 6 hours ago
San Diego

If you want to kind of replicate the fold change on your own, take the normalized counts, (not rlogged) get the ratio of the averages of your groups, and compare that to the unshrunken fold change. Unless you've got a whopper of another element in your design, the two values should be quite close.

rlogged and vst values are not used in log fold calculations.

ADD COMMENT

Login before adding your answer.

Traffic: 652 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6