Hello everyone, I'm working with an RNAseq (bulk) dataset and I need to correct the batch effect induced by days of experiment (showed also by MDS plot)
I have two genotypes (APOE and C57) and each has been processed in different days, so:
- APOE_Day1
- APOE_Day2
- C57_Day3
- C57_Day4
I need some suggestion on how to model this batch effect in the design matrix in order to find difference among C57 and APOE group
below my code
x <- read.delim("RNAdataset_APOE4KI.txt", stringsAsFactors=FALSE, header = TRUE)
DG <- readDGE(x ,header=FALSE)
matrix= DG$counts
DG <- DGEList(matrix, group= as.character(x$Genotype))
DG$samples$Experiment3= x$Experiment3
Design matrix to filter out low expressed genes taking into account the group
design= model.matrix(~0+ DG$samples$group)
colnames(design)= c("APOE","C57")
Filtering
keep <- filterByExpr(DG, design=design)
DG <- DG[keep,]
DG$samples$lib.size <- colSums(DG$counts)
DG <- calcNormFactors(DG, method = "TMM") #samples are normalized using TMM method
logCPM <- cpm(DG, log=TRUE)
Design matrix for differential analsysis
design= model.matrix(??????)
sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Big Sur ... 10.16
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Users/IOE23854723/miniconda3/envs/rstudioosx64/lib/libopenblasp-r0.3.23.dylib; LAPACK version 3.11.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Rome
tzcode source: system (macOS)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] orca_1.1-1 plotly_4.10.2 tinytex_0.46 DT_0.28 htmlwidgets_1.6.2 DGEobj.utils_1.0.6 sva_3.48.0 BiocParallel_1.34.2
[9] genefilter_1.82.1 mgcv_1.9-0 nlme_3.1-163 gplots_3.1.3 RColorBrewer_1.1-3 ggrepel_0.9.3 magrittr_2.0.3 dplyr_1.1.2
[17] ggplot2_3.4.3 edgeR_3.42.4 limma_3.56.2 knitr_1.43
loaded via a namespace (and not attached):
[1] DBI_1.1.3 bitops_1.0-7 rlang_1.1.1 matrixStats_1.0.0 compiler_4.3.1 RSQLite_2.3.1 png_0.1-8
[8] vctrs_0.6.3 stringr_1.5.0 pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.1.1 XVector_0.40.0 ellipsis_0.3.2
[15] labeling_0.4.3 caTools_1.18.2 utf8_1.2.3 purrr_1.0.2 bit_4.0.5 xfun_0.40 zlibbioc_1.46.0
[22] cachem_1.0.8 GenomeInfoDb_1.36.1 jsonlite_1.8.7 blob_1.2.4 parallel_4.3.1 R6_2.5.1 bslib_0.5.1
[29] stringi_1.7.12 jquerylib_0.1.4 Rcpp_1.0.11 assertthat_0.2.1 IRanges_2.34.1 Matrix_1.6-1 splines_4.3.1
[36] tidyselect_1.2.0 rstudioapi_0.15.0 yaml_2.3.7 codetools_0.2-19 lattice_0.21-8 tibble_3.2.1 Biobase_2.60.0
[43] withr_2.5.0 KEGGREST_1.40.0 survival_3.5-7 Biostrings_2.68.1 pillar_1.9.0 MatrixGenerics_1.12.2 KernSmooth_2.23-22
[50] DGEobj_1.1.2 stats4_4.3.1 generics_0.1.3 RCurl_1.98-1.12 S4Vectors_0.38.1 munsell_0.5.0 scales_1.2.1
[57] gtools_3.9.4 xtable_1.8-4 glue_1.6.2 lazyeval_0.2.2 tools_4.3.1 data.table_1.14.8 annotate_1.78.0
[64] locfit_1.5-9.8 XML_3.99-0.14 grid_4.3.1 tidyr_1.3.0 crosstalk_1.2.0 AnnotationDbi_1.62.2 colorspace_2.1-0
[71] GenomeInfoDbData_1.2.10 cli_3.6.1 fansi_1.0.4 viridisLite_0.4.2 gtable_0.3.4 sass_0.4.7 digest_0.6.33
[78] BiocGenerics_0.46.0 farver_2.1.1 memoise_2.0.1 htmltools_0.5.6 lifecycle_1.0.3 httr_1.4.7 bit64_4.0.5
Thanks a lot in advance Giuseppe