Batch effect adjust in model matrix
Entering edit mode
Giuseppe • 0
Last seen 2 days ago

Hello everyone, I'm working with an RNAseq (bulk) dataset and I need to correct the batch effect induced by days of experiment (showed also by MDS plot)

I have two genotypes (APOE and C57) and each has been processed in different days, so:

  • APOE_Day1
  • APOE_Day2
  • C57_Day3
  • C57_Day4

I need some suggestion on how to model this batch effect in the design matrix in order to find difference among C57 and APOE group

below my code

x <- read.delim("RNAdataset_APOE4KI.txt",  stringsAsFactors=FALSE, header = TRUE)
DG <- readDGE(x ,header=FALSE)
matrix= DG$counts
DG <- DGEList(matrix, group= as.character(x$Genotype))
DG$samples$Experiment3= x$Experiment3

Design matrix to filter out low expressed genes taking into account the group

design= model.matrix(~0+ DG$samples$group)
colnames(design)= c("APOE","C57")


keep <- filterByExpr(DG, design=design)
DG <- DG[keep,]
DG$samples$lib.size <- colSums(DG$counts) 
DG <- calcNormFactors(DG, method = "TMM") #samples are normalized using TMM method
logCPM <- cpm(DG, log=TRUE)

Design matrix for differential analsysis

design= model.matrix(??????)

R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur ... 10.16

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Users/IOE23854723/miniconda3/envs/rstudioosx64/lib/libopenblasp-r0.3.23.dylib;  LAPACK version 3.11.0

[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: system (macOS)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] orca_1.1-1          plotly_4.10.2       tinytex_0.46        DT_0.28             htmlwidgets_1.6.2   DGEobj.utils_1.0.6  sva_3.48.0          BiocParallel_1.34.2
 [9] genefilter_1.82.1   mgcv_1.9-0          nlme_3.1-163        gplots_3.1.3        RColorBrewer_1.1-3  ggrepel_0.9.3       magrittr_2.0.3      dplyr_1.1.2        
[17] ggplot2_3.4.3       edgeR_3.42.4        limma_3.56.2        knitr_1.43         

loaded via a namespace (and not attached):
 [1] DBI_1.1.3               bitops_1.0-7            rlang_1.1.1             matrixStats_1.0.0       compiler_4.3.1          RSQLite_2.3.1           png_0.1-8              
 [8] vctrs_0.6.3             stringr_1.5.0           pkgconfig_2.0.3         crayon_1.5.2            fastmap_1.1.1           XVector_0.40.0          ellipsis_0.3.2         
[15] labeling_0.4.3          caTools_1.18.2          utf8_1.2.3              purrr_1.0.2             bit_4.0.5               xfun_0.40               zlibbioc_1.46.0        
[22] cachem_1.0.8            GenomeInfoDb_1.36.1     jsonlite_1.8.7          blob_1.2.4              parallel_4.3.1          R6_2.5.1                bslib_0.5.1            
[29] stringi_1.7.12          jquerylib_0.1.4         Rcpp_1.0.11             assertthat_0.2.1        IRanges_2.34.1          Matrix_1.6-1            splines_4.3.1          
[36] tidyselect_1.2.0        rstudioapi_0.15.0       yaml_2.3.7              codetools_0.2-19        lattice_0.21-8          tibble_3.2.1            Biobase_2.60.0         
[43] withr_2.5.0             KEGGREST_1.40.0         survival_3.5-7          Biostrings_2.68.1       pillar_1.9.0            MatrixGenerics_1.12.2   KernSmooth_2.23-22     
[50] DGEobj_1.1.2            stats4_4.3.1            generics_0.1.3          RCurl_1.98-1.12         S4Vectors_0.38.1        munsell_0.5.0           scales_1.2.1           
[57] gtools_3.9.4            xtable_1.8-4            glue_1.6.2              lazyeval_0.2.2          tools_4.3.1             data.table_1.14.8       annotate_1.78.0        
[64] locfit_1.5-9.8          XML_3.99-0.14           grid_4.3.1              tidyr_1.3.0             crosstalk_1.2.0         AnnotationDbi_1.62.2    colorspace_2.1-0       
[71] GenomeInfoDbData_1.2.10 cli_3.6.1               fansi_1.0.4             viridisLite_0.4.2       gtable_0.3.4            sass_0.4.7              digest_0.6.33          
[78] BiocGenerics_0.46.0     farver_2.1.1            memoise_2.0.1           htmltools_0.5.6         lifecycle_1.0.3         httr_1.4.7              bit64_4.0.5

Thanks a lot in advance Giuseppe

BatchEffect edgeR • 275 views
Entering edit mode
ATpoint ★ 3.6k
Last seen 14 minutes ago

Each sample is a different day, that is called "nested", so there is nothing you can do here. Next time do a better experimental design, here that would for example be one ApoE and one C57 per day. If that was the case you could do ~batch+group.

All you can do here is ~group, hoping that the biological differences are much stronger than the batch effect that you cannot correct for. It's a perfect example why you need to think about analysis before doing the experiment.


Login before adding your answer.

Traffic: 281 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6