Question

DESeq2- Mysterious interaction suppresses DEGs

0

Entering edit mode

shaunpeterson • 0

@e4d569d8

Last seen 10 days ago

United States

I have puzzling results from an RNA-Seq experiment. The experimental setup is 4 conditions: two tissues (cortex and striatum), and two treatments (drug and DMSO), each is in triplicate (12 samples total). I have set up my design model at the "group" level with no intercept, so that I could directly specify the desired contrast without releveling. Looking at the effect of the drug in the cortex I see several DEGs, but none in the striatum. This would be fine, but the strange thing happens when I remove one of the cortex replicates. PCA of all the samples shows one of the cortex samples which clusters with the striatum replicates. This could have been a mistake by the researchers dissecting or mis-labeling a tube or something, so I excluded it. I expected that this would affect the DEGs coming from the cortex, but what I saw was that it affected the results from the striatum contrast! I now have DEGs in the striatum, even though I did not alter any of the striatum samples. Is there a hidden interaction occurring between the samples? Have I set up my design matrix incorrectly? Any insight into this strange phenomenon would be greatly appreciated!

> cond <- as.factor(paste0(gse$Tissue,"_",gse$Treatment))
> cond<- relevel(cond, "Cortex_DMSO")
> gse$group <- cond
> gse$group

 [1] Cortex_DMSO      Cortex_DMSO      Cortex_DMSO      Cortex_Rumitin   Cortex_Rumitin   Cortex_Rumitin   Striatum_DMSO    Striatum_DMSO   
 [9] Striatum_DMSO    Striatum_Rumitin Striatum_Rumitin Striatum_Rumitin
Levels: Cortex_DMSO Cortex_Rumitin Striatum_DMSO Striatum_Rumitin

> dds <- DESeqDataSet(gse, design = ~ 0 + group)
> dds <- DESeq(dds)

> res.cort<- lfcShrink(dds, contrast=c("group","Cortex_Rumitin","Cortex_DMSO"), type="ashr")
> res.cort
log2 fold change (MMSE): group Cortex_Rumitin vs Cortex_DMSO 
Wald test p-value: group Cortex_Rumitin vs Cortex_DMSO 
DataFrame with 19004 rows and 5 columns
            baseMean log2FoldChange      lfcSE     pvalue      padj
           <numeric>      <numeric>  <numeric>  <numeric> <numeric>
Gnai3     1038.88144   -0.000482174 0.00969773  0.1743862  0.971595
Cdc45       65.64955   -0.003120309 0.04345455  0.0793545  0.893919
H19          2.73165   -0.000422471 0.05318124  0.8731500  1.000000
Scml2       44.48048   -0.000260775 0.02449895  0.8204180  1.000000
Apoh         4.78234   -0.007466221 0.10939134  0.1343830  0.950330
...              ...            ...        ...        ...       ...
Hmga2-ps1   31.16452    0.000383322  0.0206632 0.67033426  1.000000
Gm35162      1.82351    0.000515772  0.0653898 0.86600591  1.000000
Gm54112     14.84016   -0.003030118  0.0720300 0.36330174  1.000000
Klhl17    1483.64391    0.000523043  0.0112509 0.18522629  0.972951
Rn7sk       74.60347   -0.058693122  0.2358592 0.00488479  0.357740

> summary(res.cort)

out of 19004 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 22, 0.12%
LFC < 0 (down)     : 41, 0.22%
outliers [1]       : 36, 0.19%
low counts [2]     : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results        

----------
> res.stri<- lfcShrink(dds, contrast=c("group","Striatum_Rumitin","Striatum_DMSO"),type="ashr")
> res.stri

log2 fold change (MMSE): group Striatum_Rumitin vs Striatum_DMSO 
Wald test p-value: group Striatum_Rumitin vs Striatum_DMSO 
DataFrame with 19004 rows and 5 columns
            baseMean log2FoldChange     lfcSE    pvalue      padj
           <numeric>      <numeric> <numeric> <numeric> <numeric>
Gnai3     1038.88144    1.38462e-04 0.0114670  0.893608  0.996519
Cdc45       65.64955    7.10672e-05 0.0223572  0.967872  0.998015
H19          2.73165    1.82210e-04 0.0336336  0.845834  0.996519
Scml2       44.48048    9.96938e-04 0.0305389  0.567218  0.996519
Apoh         4.78234   -9.52567e-04 0.0351848  0.274633  0.976907
...              ...            ...       ...       ...       ...
Hmga2-ps1   31.16452   -0.000393574 0.0272692  0.826926  0.996519
Gm35162      1.82351    0.000249077 0.0338485  0.777089  0.996519
Gm54112     14.84016    0.000579603 0.0336583  0.584960  0.996519
Klhl17    1483.64391   -0.000304130 0.0134503  0.798946  0.996519
Rn7sk       74.60347    0.003086375 0.0352105  0.204464  0.957125

> summary(res.stri)

out of 19004 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 0, 0%
LFC < 0 (down)     : 0, 0%
outliers [1]       : 36, 0.19%
low counts [2]     : 0, 0%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
----------

![]PCA of all samples with outlier Cortex sample highlighted[1]

![PCA without Cortex outlier. Note Striatum samples are unchanged.][1]

----------
- -RE-RUN IDENTICALLY AS BEFORE, JUST WITHOUT THE CORTEX OUTLIER - -
res.stri<- lfcShrink(dds, contrast=c("Group","Striatum_Rumitin","Striatum_DMSO"),type="ashr")

> res.stri
log2 fold change (MMSE): Group Striatum_Rumitin vs Striatum_DMSO 
Wald test p-value: Group Striatum_Rumitin vs Striatum_DMSO 
DataFrame with 18995 rows and 5 columns
            baseMean log2FoldChange     lfcSE    pvalue      padj
           <numeric>      <numeric> <numeric> <numeric> <numeric>
Gnai3     1037.96827    0.000650442 0.0248222  0.894532  0.990115
Cdc45       66.30511    0.000279675 0.0501738  0.974007  0.995281
H19          2.62274    0.001234327 0.0884534  0.840425        NA
Scml2       43.06311    0.005715537 0.0687505  0.535760  0.943023
Apoh         4.88169   -0.006701980 0.0988151  0.280998        NA
...              ...            ...       ...       ...       ...
Hmga2-ps1   31.17408    -0.00189327 0.0617784  0.830524  0.984211
Gm35162      1.89299     0.00158597 0.0896914  0.788511        NA
Gm54112     14.46826     0.00372433 0.0876773  0.590337  0.956754
Klhl17    1476.69182    -0.00149574 0.0307298  0.804329  0.983584
Rn7sk       78.46717     0.01496945 0.0778847  0.203525  0.833373

> summary(res.stri)

out of 18995 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 51, 0.27%
LFC < 0 (down)     : 37, 0.19%**
outliers [1]       : 32, 0.17%
low counts [2]     : 2210, 12%
(mean count < 5)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


----------

sessionInfo( )
R version 4.3.3 (2024-02-29 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] GenomicFeatures_1.54.4      PhosR_1.13.1                vsn_3.70.0                  gridExtra_2.3               clusterProfiler_4.10.1     
 [6] eulerr_7.0.2                ggvenn_0.1.10               scales_1.3.0                stringr_1.5.1               pathview_1.42.0            
[11] biomaRt_2.58.2              enrichplot_1.22.0           tibble_3.2.1                tidyr_1.3.1                 ggrepel_0.9.5              
[16] ReportingTools_2.42.3       RUVSeq_1.36.0               edgeR_4.0.16                limma_3.58.1                EDASeq_2.36.0              
[21] ShortRead_1.60.0            GenomicAlignments_1.38.2    Rsamtools_2.18.0            Biostrings_2.70.3           XVector_0.42.0             
[26] sva_3.50.0                  BiocParallel_1.36.0         mgcv_1.9-1                  nlme_3.1-164                Gviz_1.46.1                
[31] org.Mm.eg.db_3.18.0         AnnotationDbi_1.64.1        genefilter_1.84.0           ashr_2.2-63                 apeglm_1.24.0              
[36] ggbeeswarm_0.7.2            PoiClaClu_1.0.2.1           glmpca_0.2.0                RColorBrewer_1.1-3          hexbin_1.28.3              
[41] pheatmap_1.0.12             ggplot2_3.5.1               dplyr_1.1.4                 DESeq2_1.42.1               SummarizedExperiment_1.32.0
[46] Biobase_2.62.0              MatrixGenerics_1.14.0       matrixStats_1.3.0           GenomicRanges_1.54.1        GenomeInfoDb_1.38.8        
[51] IRanges_2.36.0              S4Vectors_0.40.2            magrittr_2.0.3              tximeta_1.20.3              BiocFileCache_2.10.2       
[56] dbplyr_2.5.0                rmarkdown_2.26              knitr_1.46                  readr_2.1.5                 BiocGenerics_0.48.1        
[61] BiocStyle_2.30.0            BiocManager_1.30.23

DESeq2 PCAplot RNA-Seq DesignModel • 262 views

ADD COMMENT • link updated 12 days ago by Michael Love 42k • written 15 days ago by shaunpeterson • 0

0

Entering edit mode

The images I tried to attach didn't attach. Here is the PCA (MDS) of all the samples, with the Cortex outlier highlighed.

enter image description here

And this is the Volcano plot of the Striatum samples when nothing is removed:

enter image description here

After the outlier cortex sample is removed, the 6 striatum samples are unchanged in the PCA/MDS plot:

enter image description here

But now the striatum is showing DEGs even though it was a cortex sample which was removed.

enter image description here

ADD REPLY • link 15 days ago shaunpeterson • 0

score 1 · Answer 1 · 2024-05-18

1

Entering edit mode

Michael Love 42k

@mikelove

Last seen 1 day ago

United States

Removing samples from one group nevertheless changes the variance estimation (dispersion) for all comparisons. So you could have pushed one comparison into significance this way.

ADD COMMENT • link 14 days ago Michael Love 42k

0

Entering edit mode

Thanks so much for your comment. Yes, this makes sense to me, especially since the effect of the drug is subtle. But it raises a larger concern about analysis design. For an experiment like this where the goal is to see the effect of a drug on two independent tissues (or cell lines or whatever), is it valid, or even preferred, to restrict the input to just these two groups from the start and use a single-factor design matrix (~Treatment)? And then repeat the same process with the other tissue/cell line? It seems like this might not be best practice. But when the variance between the tissues/cell lines is greater (and not of interest) than the variance caused by the treatment, maybe this approach is warranted?

ADD REPLY • link 12 days ago shaunpeterson • 0

1

Entering edit mode

We have some text about this particular question in the Frequently Asked Questions (FAQ) part of the software vignette.

ADD REPLY • link 12 days ago Michael Love 42k