Search
Question: Actually meaning of log2FoldChange, p-value & padj in DESeq2 results
0
gravatar for hffqyd
4 weeks ago by
hffqyd0
hffqyd0 wrote:

Hi, I'm new to RNA-seq and differentially expressed gene (DEG) analysis using DESeq2, the R package. But I encountered a problem need for help.

The experiment has  samples of 3 group (control, treat1, treat2) with 3 replications each. Having RNA-sequenced, the reads counts were inputed to DESeq2 for DEG analysis, according to DESeq2's manual.

    dds=DESeqDataSetFromMatrix(countData = counts,
                                 colData = samples,
                               design = ~ group)
    dds = DESeq(dds)
    result=results(dds, contrast=c("group", "treat1", "control"))

In the result, there was log2FoldChange, p value, padj, etc for each gene. The question is when I compared the log2FoldChange with the raw reads of some genes it seemed that the log2FoldChange compared "control VS treat1" (here treat1 was the base level) or "treat1 VS treat2", instead of "treat1 VS control" (here control was the base level, which was my expectation). So it would be the case that a gene up expressed in treat1 (by looking at the raw reads) was down expressed in the results. Below is a subset of the result.
 

<style type="text/css"> .tg {border-collapse:collapse;border-spacing:0;border-color:#ccc;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#fff;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:0px;overflow:hidden;word-break:normal;border-color:#ccc;color:#333;background-color:#f0f0f0;} .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top} </style>

symbol control-1_reads control-2_reads control-3_reads treat1-1_reads treat1-2_reads treat1-3_reads treat2-1_reads treat2-2_reads treat2-3_reads control-1_norm control-2_norm control-3_norm treat1-1_norm treat1-2_norm treat1-3_norm treat2-1_norm treat2-2_norm treat2-3_norm baseMean log2FoldChange lfcSE stat pvalue padj
gene1 104 113 155 518 514 471 0 1 0 121.8857286 99.70389301 129.1534889 489.6363946 454.1863516 426.6076245 0 1.150867377 0 191.3693721 10.14006488 1.185315798 8.554736972 1.18E-17 6.17E-17
gene2 44 36 47 903 1020 1033 2 1 1 51.56703902 31.76407211 39.16267082 853.5553366 901.303655 935.6383781 2.048006023 1.150867377 1.223606775 313.0459591 9.221995767 0.729090182 12.64863524 1.14E-36 1.02E-35
gene3 28 30 38 133 132 119 0 0 0 32.81538847 26.47006009 31.66343599 125.7174527 116.6392965 107.784092 0 0 0 49.00996952 9.133180831 1.200947709 7.604977935 2.85E-14 1.30E-13
gene4 457 573 721 34150 35421 35562 65 65 62 535.5940189 505.5781477 600.7720354 32280.08277 31299.09487 32210.23427 66.56019573 74.80637953 75.86362004 10849.84292 8.790220784 0.109694018 80.13400281 0 0
gene5 2 1 1 14 9 15 473 395 372 2.343956319 0.882335336 0.833248315 13.23341607 7.952679309 13.58623008 484.3534243 454.592614 455.1817202 159.217736 -5.33149575 0.272127405 -19.59191047 1.81E-85 3.54E-84
gene6 34 55 44 158 162 187 3498 3292 2806 39.84725743 48.5284435 36.66292588 149.3485528 143.1482276 169.3750017 3581.962533 3788.655406 3433.44061 1265.663218 -4.548224914 0.088341971 -51.48430413 0 0
gene7 1170 1703 1749 337 339 424 8399 7277 7116 1371.214447 1502.617078 1457.351304 318.5472297 299.5509206 384.0374369 8600.601292 8374.861905 8707.185809 3446.218602 -4.679805864 0.064073627 -73.03794184 0 0
gene8 25 31 29 1194 1279 1196 22834 21585 19438 29.29945399 27.35239543 24.16420115 1128.621342 1130.164093 1083.275412 23382.08476 24841.47234 23784.46849 8381.211387 -4.429334677 0.041503905 -106.7209146 0 0

_reads: raw reads counts; _norm: DESeq2 normalized conts;

Could you please tell me why? Was there I'm doing something wrong? Thank you very much!

 

 

ADD COMMENTlink modified 29 days ago • written 4 weeks ago by hffqyd0

What do you get from colData(dds)? 

​I still think you have mixed up the labeling of your samples, as those Log2Fold changes look more like treat1/treat2.

ADD REPLYlink written 4 weeks ago by swbarnes250
> colData(dds)
DataFrame with 9 rows and 4 columns
     sampleID    type    group sizeFactor
     <factor> <factor>  <factor>  <numeric>
D3-1     D3-1      T0    control  0.9765596
D3-2     D3-2      T1    control  0.8689098
D3-3     D3-3      T2    control  0.8172560
D2-1     D2-1      T3 trea1  0.8532582
D2-2     D2-2      T4 trea1  1.1333559
D2-3     D2-3      T5 trea1  1.2001224
P2-1     P2-1      T6   treat2  1.0579279
P2-2     P2-2      T7   treat2  1.1316941
P2-3     P2-3      T8   treat2  1.1040590


> samples
  sampleID type    group
1     D3-1   T0    control
2     D3-2   T1    control
3     D3-3   T2    control
4     D2-1   T3 trea1
5     D2-2   T4 trea1
6     D2-3   T5 trea1
7     P2-1   T6   treat2
8     P2-2   T7   treat2
9     P2-3   T8   treat2

 

If I mixed up, how did I? The samples looked right.

ADD REPLYlink modified 29 days ago • written 29 days ago by hffqyd0

It can be mixed up when you attach the column data to the count table. We mention that this has to be done very carefully in the documentation. I’m not saying it’s mixed up but it could be. And trust that the LFC calculation is correct. It’s been the same for 6 years.

ADD REPLYlink written 29 days ago by Michael Love20k

The samples in your counts excerpt that you labeled "control" have the size factors associated with treat1 from colData.  Let me guess.  You made the group column by copying and modifying a command line from a tutorial, without understanding what you were doing?

ADD REPLYlink modified 29 days ago • written 29 days ago by swbarnes250
1

Let's try to gently lead the thread to finding the source of the problem.

ADD REPLYlink written 29 days ago by Michael Love20k
0
gravatar for Michael Love
4 weeks ago by
Michael Love20k
United States
Michael Love20k wrote:

I can't really parse that table very well, but you can trust that when you specify contrast it is using the first element to pick which variable, then putting the second element in the numerator and the third element in the denominator of the fold change.

If there are no other variables in the design, then the MLE will be log2 of mean of norm. counts / mean of norm counts.

ADD COMMENTlink written 4 weeks ago by Michael Love20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 400 users visited in the last hour