Search
Question: Actually meaning of log2FoldChange, p-value & padj in DESeq2 results
0
4 weeks ago by
hffqyd0
hffqyd0 wrote:

Hi, I'm new to RNA-seq and differentially expressed gene (DEG) analysis using DESeq2, the R package. But I encountered a problem need for help.

The experiment has  samples of 3 group (control, treat1, treat2) with 3 replications each. Having RNA-sequenced, the reads counts were inputed to DESeq2 for DEG analysis, according to DESeq2's manual.

    dds=DESeqDataSetFromMatrix(countData = counts,
colData = samples,
design = ~ group)
dds = DESeq(dds)
result=results(dds, contrast=c("group", "treat1", "control"))


In the result, there was log2FoldChange, p value, padj, etc for each gene. The question is when I compared the log2FoldChange with the raw reads of some genes it seemed that the log2FoldChange compared "control VS treat1" (here treat1 was the base level) or "treat1 VS treat2", instead of "treat1 VS control" (here control was the base level, which was my expectation). So it would be the case that a gene up expressed in treat1 (by looking at the raw reads) was down expressed in the results. Below is a subset of the result.

<style type="text/css"> .tg {border-collapse:collapse;border-spacing:0;border-color:#ccc;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} </style>

gene1 104 113 155 518 514 471 0 1 0 121.8857286 99.70389301 129.1534889 489.6363946 454.1863516 426.6076245 0 1.150867377 0 191.3693721 10.14006488 1.185315798 8.554736972 1.18E-17 6.17E-17
gene2 44 36 47 903 1020 1033 2 1 1 51.56703902 31.76407211 39.16267082 853.5553366 901.303655 935.6383781 2.048006023 1.150867377 1.223606775 313.0459591 9.221995767 0.729090182 12.64863524 1.14E-36 1.02E-35
gene3 28 30 38 133 132 119 0 0 0 32.81538847 26.47006009 31.66343599 125.7174527 116.6392965 107.784092 0 0 0 49.00996952 9.133180831 1.200947709 7.604977935 2.85E-14 1.30E-13
gene4 457 573 721 34150 35421 35562 65 65 62 535.5940189 505.5781477 600.7720354 32280.08277 31299.09487 32210.23427 66.56019573 74.80637953 75.86362004 10849.84292 8.790220784 0.109694018 80.13400281 0 0
gene5 2 1 1 14 9 15 473 395 372 2.343956319 0.882335336 0.833248315 13.23341607 7.952679309 13.58623008 484.3534243 454.592614 455.1817202 159.217736 -5.33149575 0.272127405 -19.59191047 1.81E-85 3.54E-84
gene6 34 55 44 158 162 187 3498 3292 2806 39.84725743 48.5284435 36.66292588 149.3485528 143.1482276 169.3750017 3581.962533 3788.655406 3433.44061 1265.663218 -4.548224914 0.088341971 -51.48430413 0 0
gene7 1170 1703 1749 337 339 424 8399 7277 7116 1371.214447 1502.617078 1457.351304 318.5472297 299.5509206 384.0374369 8600.601292 8374.861905 8707.185809 3446.218602 -4.679805864 0.064073627 -73.03794184 0 0
gene8 25 31 29 1194 1279 1196 22834 21585 19438 29.29945399 27.35239543 24.16420115 1128.621342 1130.164093 1083.275412 23382.08476 24841.47234 23784.46849 8381.211387 -4.429334677 0.041503905 -106.7209146 0 0

Could you please tell me why? Was there I'm doing something wrong? Thank you very much!

modified 29 days ago • written 4 weeks ago by hffqyd0

What do you get from colData(dds)?

​I still think you have mixed up the labeling of your samples, as those Log2Fold changes look more like treat1/treat2.

> colData(dds)
DataFrame with 9 rows and 4 columns
sampleID    type    group sizeFactor
<factor> <factor>  <factor>  <numeric>
D3-1     D3-1      T0    control  0.9765596
D3-2     D3-2      T1    control  0.8689098
D3-3     D3-3      T2    control  0.8172560
D2-1     D2-1      T3 trea1  0.8532582
D2-2     D2-2      T4 trea1  1.1333559
D2-3     D2-3      T5 trea1  1.2001224
P2-1     P2-1      T6   treat2  1.0579279
P2-2     P2-2      T7   treat2  1.1316941
P2-3     P2-3      T8   treat2  1.1040590

> samples
sampleID type    group
1     D3-1   T0    control
2     D3-2   T1    control
3     D3-3   T2    control
4     D2-1   T3 trea1
5     D2-2   T4 trea1
6     D2-3   T5 trea1
7     P2-1   T6   treat2
8     P2-2   T7   treat2
9     P2-3   T8   treat2


If I mixed up, how did I? The samples looked right.

It can be mixed up when you attach the column data to the count table. We mention that this has to be done very carefully in the documentation. I’m not saying it’s mixed up but it could be. And trust that the LFC calculation is correct. It’s been the same for 6 years.

The samples in your counts excerpt that you labeled "control" have the size factors associated with treat1 from colData.  Let me guess.  You made the group column by copying and modifying a command line from a tutorial, without understanding what you were doing?

1

Let's try to gently lead the thread to finding the source of the problem.

0
4 weeks ago by
Michael Love20k
United States
Michael Love20k wrote:

I can't really parse that table very well, but you can trust that when you specify contrast it is using the first element to pick which variable, then putting the second element in the numerator and the third element in the denominator of the fold change.

If there are no other variables in the design, then the MLE will be log2 of mean of norm. counts / mean of norm counts.