Hi~
I have 36 RNA-seq samples, described as follows:
sample genotype age
HD_old_1 HD Old
HD_old_2 HD Old
HD_old_3 HD Old
HD_old_4 HD Old
HD_old_5 HD Old
HD_old_6 HD Old
HD_old_7 HD Old
HD_old_8 HD Old
HD_old_9 HD Old
HD_old_10 HD Old
HD_young_1 HD Young
HD_young_2 HD Young
HD_young_3 HD Young
HD_young_4 HD Young
HD_young_5 HD Young
HD_young_6 HD Young
HD_young_7 HD Young
HD_young_8 HD Young
HD_young_9 HD Young
HD_young_10 HD Young
WT_old1_1 WT Old1
WT_old1_2 WT Old1
WT_old1_3 WT Old1
WT_old1_4 WT Old1
WT_old1_5 WT Old1
WT_old1_6 WT Old1
WT_old2_1 WT Old2
WT_old2_2 WT Old2
WT_old2_3 WT Old2
WT_old2_4 WT Old2
WT_old2_5 WT Old2
WT_young_1 WT Young
WT_young_2 WT Young
WT_young_3 WT Young
WT_young_4 WT Young
WT_young_5 WT Young
While comparing WT_old1 and WT_young, I found that some genes with too many zeros were detected as significant DEGs. So I extracted the WT samples only and ran DESeq2 again; the genes then became non-significant (and the normalized counts changed as well).
Below are the expression levels of one gene that was detected as significant.
input WT_old1_1 WT_old1_2 WT_old1_3 WT_old1_4 WT_old1_5 WT_old1_6 WT_young_1 WT_young_2 WT_young_3 WT_young_4 WT_young_5 mean_old mean_young log2FoldChange pvalue padj
HD+WT 0.0 0.0 0.0 0.0 0.0 536.7 0.0 0.0 0.0 0.0 0.0 89.5 0.0 25.3 5.02E-31 2.52E-27
WT_only 0.0 0.0 0.0 0.0 0.0 1128.0 0.0 0.0 0.0 0.0 0.0 188.0 0.0 25.4 NA NA
In summary, this gene was significant when all samples were used as input, but the same gene was not significant when only the WT samples were used.
Why do I get different significance values depending on the input samples?
Thank you!
Thank you for the advice!
I think you recommended reading the following question:
"If I have multiple groups, should I run all together or split into pairs of groups?"
So, if I understand correctly, I should split the samples before running DESeq in my case, where pretty much all samples have high within-group variability.
Then, however, another problem emerges. When the groups are split for DESeq, the normalized count values differ for each pair of groups. For example, the normalized count values for gene X are different between group A vs group B and group A vs group C.
The following is a dataframe I just made up (not actual DESeq2 normalized counts):
But I observed that the normalized counts for group A change depending on the paired group.
Is there a way to preserve the normalized count values across all contrasts?
Thank you!
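For context on why the values move: DESeq2's default size factors are computed by the median-of-ratios method, where each sample is compared against a per-gene geometric mean taken over whichever samples are in the object. Changing the set of samples changes that reference, and therefore the size factors and normalized counts. Below is a rough base-R sketch of the idea on made-up data; the sample names, depths, and the `size_factors` helper are all hypothetical and only mimic the method, they are not a replacement for `estimateSizeFactors`.

```r
set.seed(42)

# Hypothetical toy data: 500 genes, 2 samples per group, differing depths.
n_genes <- 500
base  <- rgamma(n_genes, shape = 2, scale = 50) + 1
depth <- c(A1 = 1.0, A2 = 1.2, B1 = 0.8, B2 = 1.1, C1 = 1.5, C2 = 0.7)
counts <- sapply(depth, function(d) rpois(n_genes, base * d))

# Median-of-ratios size factors (illustrative helper, not the DESeq2 API).
size_factors <- function(counts) {
  # Per-gene log geometric mean over the samples that were passed in.
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

sf_all <- size_factors(counts)
sf_ab  <- size_factors(counts[, c("A1", "A2", "B1", "B2")])
sf_ac  <- size_factors(counts[, c("A1", "A2", "C1", "C2")])

# Sample A1 gets a different size factor in each pairing, so its
# normalized counts (counts / size factor) differ as well.
print(c(all = sf_all[["A1"]], A_vs_B = sf_ab[["A1"]], A_vs_C = sf_ac[["A1"]]))
```

Computing the size factors once on the full matrix and reusing them for every pairing keeps the normalized counts for group A fixed.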
You can do:
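A minimal sketch of this approach, going by the follow-up in this thread (estimate size factors on all samples, then subset). The toy count matrix, the object name `dds`, and the `group` column are hypothetical stand-ins for the real data.

```r
library(DESeq2)

set.seed(1)
# Hypothetical toy data: 200 genes x 9 samples in three groups (A, B, C).
cts <- matrix(rnbinom(200 * 9, mu = 100, size = 5), nrow = 200,
              dimnames = list(paste0("gene", 1:200), paste0("s", 1:9)))
coldata <- data.frame(group = factor(rep(c("A", "B", "C"), each = 3)),
                      row.names = colnames(cts))
dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata,
                              design = ~ group)

# Estimate size factors once, using all samples.
dds <- estimateSizeFactors(dds)

# Subset afterwards; the size factors stay attached to the samples.
dds_ab <- dds[, dds$group %in% c("A", "B")]
dds_ac <- dds[, dds$group %in% c("A", "C")]

# Normalized counts for group A are now the same in both subsets.
a_samples <- colnames(dds)[dds$group == "A"]
norm_ab <- counts(dds_ab, normalized = TRUE)[, a_samples]
norm_ac <- counts(dds_ac, normalized = TRUE)[, a_samples]
all.equal(norm_ab, norm_ac)
```

When `DESeq()` is later run on a subset, it uses size factors that are already present rather than re-estimating them.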
Thank you for the reply!
So I can just subset the samples after estimateSizeFactors.
But when I ran the next step, I got an error saying:
Error in designAndArgChecker(object, betaPrior) : full model matrix is less than full rank
Thank you!
You need to run droplevels() on the design variables.
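To illustrate why this helps: after subsetting, a factor still carries the levels of the groups that were removed, so the design matrix gets an all-zero column and is no longer full rank. The base-R sketch below shows this on a made-up sample table (the `group` column and its levels are hypothetical).

```r
# Hypothetical sample table with three groups.
coldata <- data.frame(group = factor(rep(c("A", "B", "C"), each = 3)))

# Keep only groups A and B; the factor still remembers level "C".
sub <- coldata[coldata$group %in% c("A", "B"), , drop = FALSE]
levels(sub$group)                      # "A" "B" "C"

# The groupC column is all zeros, so the matrix is rank deficient.
mm_bad <- model.matrix(~ group, data = sub)
qr(mm_bad)$rank < ncol(mm_bad)         # TRUE

# Drop the unused level and rebuild the design matrix.
sub$group <- droplevels(sub$group)
mm_ok <- model.matrix(~ group, data = sub)
qr(mm_ok)$rank == ncol(mm_ok)          # TRUE

# For a subsetted DESeqDataSet (hypothetical name dds_sub), the analogous
# step would be: dds_sub$group <- droplevels(dds_sub$group)
```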
After several days of struggling, I finally managed to achieve what I wanted.
Thank you, Michael!