Multi-group design when performing DE analysis
1
0
Entering edit mode
@nnaharfancy-20022
Last seen 3.4 years ago
Imperial College London

Hi All,

I am comparing bulkRNA-seq data for differential expression between three groups. When I keep all the samples together and run a contrast between two groups I get less numbers of DEGs compared to when I just take two groups of samples at a time and perform the DE analysis? Between them the common genes were less than 50%. Is this difference expected? What is the best way to perform this kind of analysis? Thank you very much.

>SampleData.selected$Disease_Status
 [1] HI_MS HI_MS HI_MS LI_MS LI_MS LI_MS CTRL  HI_MS HI_MS HI_MS HI_MS LI_MS LI_MS LI_MS CTRL  CTRL  CTRL  CTRL 
Levels: CTRL LI_MS HI_MS
#############taking all three groups at a time

dds <- DESeqDataSetFromMatrix(countData = Count.selected,
                              colData = SampleData.selected, 
                              design = ~RNA_Batch+RIN+Sex+Disease_Status)

dds <- estimateSizeFactors(dds)
idx <- rowSums( counts(dds, normalized=TRUE) > 0) >= 5
dds <- dds[idx,]

dds <- DESeq(dds)
res <- results(dds, contrast=c("Disease_Status","HI_MS","CTRL"), independentFiltering = TRUE)
res <- res[order(res$padj), ]

summary(res)

out of 37443 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 48, 0.13%
LFC < 0 (down)     : 12, 0.032%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 0)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

################## taking just two groups at a time
keep <-  which(SampleData$Disease_Status %in% c("CTRL", "HI_MS"))
dds <- DESeqDataSetFromMatrix(countData = Count.selected[, keep],
                              colData = SampleData.selected[keep, ], 
                              design = ~RNA_Batch+RIN+Sex+Disease_Status)

dds <- estimateSizeFactors(dds)
idx <- rowSums( counts(dds, normalized=TRUE) > 0) >= 5
dds <- dds[idx,]

dds <- DESeq(dds)
res <- results(dds, contrast=c("Disease_Status","HI_MS","CTRL"), independentFiltering = TRUE)
res <- res[order(res$padj), ]

summary(res)

out of 34811 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 62, 0.18%
LFC < 0 (down)     : 25, 0.072%
outliers [1]       : 0, 0%
low counts [2]     : 9449, 27%
(mean count < 4)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

> intersect(rownames(res.two.groups)[1:87], rownames(res.all.groups.together)[1:60])
 [1] "IGHG1"     "IGKC"      "A2M"       "FCGBP"     "CAPS"      "LINC00551" "COL1A2"    "RASSF9"    "NES"       "OLFML2A"   "COL5A1"    "CPAMD8"   
[13] "ESAM"      "SNX31"     "CLIC6"     "COL4A6"    "C10orf105" "LEF1"      "HPSE2"     "RPS2P5"    "NPIPB2"   


> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8   
 [6] LC_MESSAGES=en_GB.UTF-8    LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] grid      stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] psych_1.8.12                AnnotationHub_2.16.1        BiocFileCache_1.8.0         dbplyr_1.4.2                clusterProfiler_3.12.0     
 [6] pathfindR_1.3.0             pathview_1.24.0             org.Hs.eg.db_3.8.2          AnnotationDbi_1.46.0        knitr_1.22                 
[11] ComplexHeatmap_2.0.0        ggpubr_0.2                  magrittr_1.5                ggrepel_0.8.1               reshape2_1.4.3             
[16] dplyr_0.8.0.1               ggplot2_3.1.1               pheatmap_1.0.12             DESeq2_1.24.0               EDASeq_2.18.0              
[21] ShortRead_1.42.0            GenomicAlignments_1.20.0    SummarizedExperiment_1.14.0 DelayedArray_0.10.0         matrixStats_0.54.0         
[26] Rsamtools_2.0.0             GenomicRanges_1.36.0        GenomeInfoDb_1.20.0         Biostrings_2.52.0           XVector_0.24.0             
[31] IRanges_2.18.0              S4Vectors_0.22.0            BiocParallel_1.18.0         Biobase_2.44.0              BiocGenerics_0.30.0        

loaded via a namespace (and not attached):
  [1] R.utils_2.8.0                 tidyselect_0.2.5              RSQLite_2.1.1                 htmlwidgets_1.3              
  [5] DESeq_1.36.0                  munsell_0.5.0                 codetools_0.2-16              withr_2.1.2                  
  [9] colorspace_1.4-1              GOSemSim_2.10.0               rstudioapi_0.10               DOSE_3.10.2                  
 [13] KEGGgraph_1.44.0              urltools_1.7.3                GenomeInfoDbData_1.2.1        mnormt_1.5-5                 
 [17] hwriter_1.3.2                 polyclip_1.10-0               MCMCpack_1.4-4                bit64_0.9-7                  
 [21] farver_1.1.0                  coda_0.19-3                   xfun_0.6                      R6_2.4.0                     
 [25] doParallel_1.0.14             clue_0.3-57                   graphlayouts_0.5.0            locfit_1.5-9.1               
 [29] bitops_1.0-6                  fgsea_1.10.1                  gridGraphics_0.4-1            assertthat_0.2.1             
 [33] promises_1.0.1                scales_1.0.0                  ggraph_2.0.0                  nnet_7.3-12                  
 [37] enrichplot_1.4.0              gtable_0.3.0                  mcmc_0.9-6                    tidygraph_1.1.2              
 [41] rlang_0.4.0                   MatrixModels_0.4-1            genefilter_1.66.0             GlobalOptions_0.1.1          
 [45] splines_3.6.1                 rtracklayer_1.44.0            lazyeval_0.2.2                acepack_1.4.1                
 [49] europepmc_0.3                 checkmate_1.9.3               BiocManager_1.30.4            yaml_2.2.0                   
 [53] GenomicFeatures_1.36.0        backports_1.1.4               httpuv_1.5.1                  qvalue_2.16.0                
 [57] Hmisc_4.2-0                   tools_3.6.1                   ggplotify_0.0.4               RColorBrewer_1.1-2           
 [61] ggridges_0.5.1                Rcpp_1.0.1                    plyr_1.8.4                    base64enc_0.1-3              
 [65] progress_1.2.1                zlibbioc_1.30.0               purrr_0.3.2                   RCurl_1.95-4.12              
 [69] prettyunits_1.0.2             rpart_4.1-15                  GetoptLong_0.1.7              viridis_0.5.1                
 [73] cowplot_0.9.4                 cluster_2.1.0                 data.table_1.12.2             DO.db_2.9                    
 [77] SparseM_1.77                  circlize_0.4.8                triebeard_0.3.0               aroma.light_3.14.0           
 [81] mime_0.6                      hms_0.4.2                     evaluate_0.13                 xtable_1.8-4                 
 [85] XML_3.98-1.19                 gridExtra_2.3                 shape_1.4.4                   compiler_3.6.1               
 [89] biomaRt_2.40.0                tibble_2.1.1                  crayon_1.3.4                  R.oo_1.22.0                  
 [93] htmltools_0.3.6               later_0.8.0                   Formula_1.2-3                 tidyr_0.8.3                  
 [97] geneplotter_1.62.0            DBI_1.0.0                     tweenr_1.0.1                  MASS_7.3-51.4                
[101] MuSiC_0.1.1                   rappdirs_0.3.1                Matrix_1.2-17                 R.methodsS3_1.7.1            
[105] igraph_1.2.4.1                pkgconfig_2.0.2               rvcheck_0.1.5                 foreign_0.8-72               
[109] xml2_1.2.0                    foreach_1.4.4                 annotate_1.62.0               stringr_1.4.0                
[113] digest_0.6.18                 graph_1.62.0                  rmarkdown_1.14                fastmatch_1.1-0              
[117] htmlTable_1.13.1              curl_3.3                      shiny_1.3.2                   quantreg_5.38                
[121] rjson_0.2.20                  nlme_3.1-141                  jsonlite_1.6                  viridisLite_0.3.0            
[125] pillar_1.4.0                  lattice_0.20-38               KEGGREST_1.24.0               httr_1.4.0                   
[129] survival_2.44-1.1             GO.db_3.8.2                   interactiveDisplayBase_1.22.0 glue_1.3.1                   
[133] UpSetR_1.4.0                  png_0.1-7                     iterators_1.0.10              bit_1.1-14                   
[137] Rgraphviz_2.28.0              ggforce_0.3.1                 stringi_1.4.3                 nnls_1.4                     
[141] blob_1.1.1                    latticeExtra_0.6-28           memoise_1.1.0                



deseq2 • 384 views
ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 3.9k
@kevin
Last seen 1 day ago
Republic of Ireland

In your code, you do not define the following objects (although we can infer what they are):

res.two.groups
res.all.groups.together

By removing samples from your dataset, the size factors that are calculated will differ (please review the DESeq2 published manuscript). Your rowSums calculation may also differ. Based on this, you will obtain slightly different results when you perform a differential expression analysis.

If you have no other reason to exclude samples in the way that you have done, then you should keep all samples in your cohort and normalise them together.

Reasons for exclusion could be, for example, a case where the transcriptional profiles are so vastly different such that normalising them together introduces bias.

Kevin

ADD COMMENT
0
Entering edit mode

Hi Kevin,

Thank you very much for your reply. Yes, my hunch was that, this is the case (removing samples will change the normalisation). Thanks for further explaining the reason for exclusion.

Nurun

ADD REPLY

Login before adding your answer.

Traffic: 930 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6