Question

Odd DESeq2 results - high significance in genes with too many zeros

1

Entering edit mode

mbk0asis • 0

@mbk0asis-7155

Last seen 22 months ago

Korea, Republic Of

Hi~

I have 36 samples of RNA-seq data described as follow,

sample  genotype    age
HD_old_1    HD  Old
HD_old_2    HD  Old
HD_old_3    HD  Old
HD_old_4    HD  Old
HD_old_5    HD  Old
HD_old_6    HD  Old
HD_old_7    HD  Old
HD_old_8    HD  Old
HD_old_9    HD  Old
HD_old_10   HD  Old
HD_young_1  HD  Young
HD_young_2  HD  Young
HD_young_3  HD  Young
HD_young_4  HD  Young
HD_young_5  HD  Young
HD_young_6  HD  Young
HD_young_7  HD  Young
HD_young_8  HD  Young
HD_young_9  HD  Young
HD_young_10 HD  Young
WT_old1_1   WT  Old1
WT_old1_2   WT  Old1
WT_old1_3   WT  Old1
WT_old1_4   WT  Old1
WT_old1_5   WT  Old1
WT_old1_6   WT  Old1
WT_old2_1   WT  Old2
WT_old2_2   WT  Old2
WT_old2_3   WT  Old2
WT_old2_4   WT  Old2
WT_old2_5   WT  Old2
WT_young_1  WT  Young
WT_young_2  WT  Young
WT_young_3  WT  Young
WT_young_4  WT  Young
WT_young_5  WT  Young

While comparing WT_old1 and WT_young, I found some genes with too many zeros were detected as significant DEGs. So I extracted WT samples only and ran DESeq2 again, then the genes became non-significant (also normalized counts).

Below is the expression levels of a gene detected as significant.

    WT_old1_1   WT_old1_2   WT_old1_3   WT_old1_4   WT_old1_5   WT_old1_6   WT_young_1  WT_young_2  WT_young_3  WT_young_4  WT_young_5  mean_old    mean_young  log2FoldChange  pvalue  padj
HD+WT   0.0 0.0 0.0 0.0 0.0 536.7   0.0 0.0 0.0 0.0 0.0 89.5    0.0 25.3    5.02E-31    2.52E-27
WT_only 0.0 0.0 0.0 0.0 0.0 1128.0  0.0 0.0 0.0 0.0 0.0 188.0   0.0 25.4    NA  NA

In summary, when all samples were input this gene became significant, while the same gene was not significant when only WT samples used.

Why do I get different significance values depending on the input samples?

Thank you!

deseq2 rna-seq • 1.1k views

ADD COMMENT • link updated 3.8 years ago by Michael Love 42k • written 3.8 years ago by mbk0asis • 0

score 4 · Accepted Answer · 2020-07-06

4

Entering edit mode

Michael Love 42k

@mikelove

Last seen 13 hours ago

United States

This question is a FAQ in the vignette.

If you don’t want genes with too many genes you should prefilter such that there are x many samples with a count of 10 or higher.

ADD COMMENT • link 3.8 years ago Michael Love 42k

0

Entering edit mode

Thank you for the advice!

I think you recommended to read a following question.

"If I have multiple groups, should I run all together or split into pairs of groups?"

Basically, I should split samples to run DESeq in my condition in which pretty much all samples have high in-group variabilities if I'm understanding right.

Then, however, another problem emerges. When groups are split for DESeq, normalized count values of each pair of groups became different. For example, normalized count values for gene X are different between group A vs group B and group A vs group C.

Following is a dataframe I just made up (not actual DESeq2 normalized counts),

    A_1 A_2 A_3 B_1 B_2 B_3 C_1 C_2 C_3
raw 1   2   3   5   6   7   8   9   10
Norm (A vs B)   2   4   6   10  12  14  -   -   -
Norm (A vs C)   3   5   7   -   -   -   10  13  16

But I observed the normalized counts for group A change depending on the paring group.

Is there a way to preserve the normalized count values across all contrasts?

Thank you!

ADD REPLY • link 3.8 years ago mbk0asis • 0

1

Entering edit mode

You can do:

dds <- estimateSizeFactors(dds)
dds1 <- dds[,idx]
dds1 <- DESeq(dds1)
...

ADD REPLY • link 3.8 years ago Michael Love 42k

0

Entering edit mode

Thank you for relply!

So I can just subset samples after estimateSizeFactors.

But I got an error saying Error in designAndArgChecker(object, betaPrior) : full model matrix is less than full rank, when I ran next.

> dds <- DESeqDataSetFromMatrix(cnts, colData = coldat, design = ~conds)
> dds <- estimateSizeFactors(dds)
> dds
class: DESeqDataSet 
dim: 23418 37 
metadata(1): version
assays(1): counts
rownames(23418): 0610005C13Rik 0610007N19Rik ... Zzef1 Zzz3
rowData names(0):
colnames(37): HD_Muscle_14w_F_1 HD_Muscle_14w_F_2 ... WT_Muscle_2m_4
  WT_Muscle_2m_5
colData names(2): conds sizeFactor
> dds1 <- dds[,1:21]
> dds1 <- DESeq(dds1)

Error in designAndArgChecker(object, betaPrior) : 
  full model matrix is less than full rank

> dds1@colData
DataFrame with 21 rows and 2 columns
                              conds        sizeFactor
                           <factor>         <numeric>
HD_Muscle_14w_F_1   HD_Muscle_Old_F 0.547472117372437
HD_Muscle_14w_F_2   HD_Muscle_Old_F  1.09062663329413
HD_Muscle_14w_F_3   HD_Muscle_Old_F 0.388627829768327
HD_Muscle_14w_F_4   HD_Muscle_Old_F 0.516901128754207
HD_Muscle_14w_F_5   HD_Muscle_Old_F 0.410221719625506
...                             ...               ...
HD_Muscle_5w_M_1  HD_Muscle_Young_M 0.273877249985157
HD_Muscle_5w_M_2  HD_Muscle_Young_M   1.0525913993678
HD_Muscle_5w_M_3  HD_Muscle_Young_M 0.439962063692821
HD_Muscle_5w_M_4  HD_Muscle_Young_M 0.514829797859048
HD_Muscle_5w_M_5  HD_Muscle_Young_M  1.14383604673933

Thank you!

ADD REPLY • link 3.8 years ago mbk0asis • 0

1

Entering edit mode

You need to run droplevels() on the design variables.

ADD REPLY • link 3.8 years ago Michael Love 42k

0

Entering edit mode

After several days of struggling, I finally managed to achieve what I wanted.

Thank you, Michael!

ADD REPLY • link 3.8 years ago mbk0asis • 0