Hi all, Im wondering if its possible to meaningfully combine data from EPIC and EPICV2 arrays. I have a MSet from each array, however, when I use combineArrays I end up with 0 probes.
When I compared the two arrays for shared probes by removing the last part of the EPICV2 probe IDs ( ex. cg25324105_BC11 -> cg25324105), I figure that I should have about 726,890 shared probes. Is this issue arising due to the different probe names between the two arrays? Or maybe because of the different annotations between the two, with my EPICV1 annotated with ilm10b4.hg19, and my EPICV2 with 20a1.hg38?
I have ~50 samples that have been run on both arrays, and if these pairs cluster closely post-join and norm I hope to combine data from these arrays for an EWAS
Any help or insight in moving forward would be greatly appreciated!
My combine arrays output:
> MSet3 <- combineArrays(MSet_V2, MSet_V1,
+ outType = "IlluminaHumanMethylationEPIC",
+ verbose = TRUE)
[convertArray] Casting as IlluminaHumanMethylationEPIC
> MSet3
class: MethylSet
dim: 0 337
metadata(0):
assays(2): Meth Unmeth
rownames(0):
rowData names(0):
colnames(337): 207925050049_R07C01 207882990112_R05C01 ...
200526590027_R07C01 200526590047_R07C01
colData names(120): Sample_Name Sample_Plate ... gPC5 ArrayTypes
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.44.0
Manifest version: 0.99.1
My MSets :
MSet_V1
class: MethylSet
dim: 866238 113
metadata(0):
assays(2): Meth Unmeth
rownames(866238): cg18478105 cg09835024 ... cg10633746 cg12623625
rowData names(0):
colnames(113): 201364900064_R04C01 201364910121_R08C01 ...
200526590027_R07C01 200526590047_R07C01
colData names(59): Sample_Name barcode ... Basename filenames
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.44.0
Manifest version: 0.3.0
MSet_V2
class: MethylSet
dim: 936990 224
metadata(0):
assays(2): Meth Unmeth
rownames(936990): cg25324105_BC11 cg25383568_TC11 ...
ch.12.78471492F_BC21 ch.21.43742285F_BC21
rowData names(0):
colnames(224): 207925050049_R07C01 207882990112_R05C01 ...
207882990011_R08C01 207865030077_R04C01
colData names(78): Sample_Name Sample_Plate ... Basename filenames
Annotation
array: IlluminaHumanMethylationEPICv2
annotation: 20a1.hg38
Preprocessing
Method: Raw (no normalization or bg correction)
minfi version: 1.44.0
Manifest version: 0.99.1
Obligatory Session Info:
> sessionInfo( )
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /u/local/compilers/intel/oneapi/2022.1.1/mkl/2022.0.1/lib/intel64/libmkl_gf_lp64.so.2; LAPACK version 3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/Los_Angeles
tzcode source: system (glibc)
attached base packages:
[1] grid parallel stats4 stats graphics grDevices utils
[8] datasets methods base
other attached packages:
[1] IlluminaHumanMethylationEPICv2manifest_0.99.2
[2] IlluminaHumanMethylationEPICmanifest_0.3.0
[3] IlluminaHumanMethylationEPICv2anno.20a1.hg38_0.99.1
[4] methylCC_1.16.0
[5] FlowSorted.Blood.450k_1.40.0
[6] remotes_2.5.0
[7] RColorBrewer_1.1-3
[8] DMRcatedata_2.20.3
[9] ExperimentHub_2.10.0
[10] AnnotationHub_3.10.0
[11] BiocFileCache_2.10.2
[12] dbplyr_2.5.0
[13] mCSEA_1.22.0
[14] Homo.sapiens_1.3.1
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[16] org.Hs.eg.db_3.18.0
[17] GO.db_3.18.0
[18] OrganismDbi_1.44.0
[19] GenomicFeatures_1.54.4
[20] AnnotationDbi_1.64.1
[21] mCSEAdata_1.22.0
[22] DMRcate_2.16.1
[23] Gviz_1.46.1
[24] minfiData_0.48.0
[25] IlluminaHumanMethylation450kmanifest_0.4.0
[26] missMethyl_1.36.0
[27] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[28] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1
[29] minfi_1.48.0
[30] bumphunter_1.44.0
[31] locfit_1.5-9.9
[32] iterators_1.0.14
[33] foreach_1.5.2
[34] Biostrings_2.70.3
[35] XVector_0.42.0
[36] SummarizedExperiment_1.32.0
[37] Biobase_2.62.0
[38] MatrixGenerics_1.14.0
[39] matrixStats_1.2.0
[40] GenomicRanges_1.54.1
[41] GenomeInfoDb_1.38.8
[42] IRanges_2.36.0
[43] S4Vectors_0.40.2
[44] BiocGenerics_0.48.1
[45] limma_3.58.1
[46] lubridate_1.9.3
[47] forcats_1.0.0
[48] stringr_1.5.1
[49] dplyr_1.1.4
[50] purrr_1.0.2
[51] readr_2.1.5
[52] tidyr_1.3.1
[53] tibble_3.2.1
[54] ggplot2_3.5.0
[55] tidyverse_2.0.0
[56] BiocManager_1.30.22
loaded via a namespace (and not attached):
[1] ProtGenerics_1.34.0 bitops_1.0-7
[3] httr_1.4.7 tools_4.3.0
[5] doRNG_1.8.6 backports_1.4.1
[7] utf8_1.2.4 R6_2.5.1
[9] HDF5Array_1.30.1 lazyeval_0.2.2
[11] rhdf5filters_1.14.1 permute_0.9-7
[13] withr_3.0.0 prettyunits_1.2.0
[15] gridExtra_2.3 base64_2.0.1
[17] preprocessCore_1.64.0 cli_3.6.2
[19] genefilter_1.84.0 askpass_1.2.0
[21] Rsamtools_2.18.0 foreign_0.8-84
[23] siggenes_1.76.0 illuminaio_0.44.0
[25] R.utils_2.12.3 dichromat_2.0-0.1
[27] scrime_1.3.5 BSgenome_1.70.2
[29] readxl_1.4.3 rstudioapi_0.16.0
[31] RSQLite_2.3.6 generics_0.1.3
[33] BiocIO_1.12.0 gtools_3.9.5
[35] Matrix_1.6-5 interp_1.1-6
[37] fansi_1.0.6 abind_1.4-5
[39] R.methodsS3_1.8.2 lifecycle_1.0.4
[41] yaml_2.3.8 edgeR_4.0.16
[43] rhdf5_2.46.1 SparseArray_1.2.4
[45] blob_1.2.4 promises_1.2.1
[47] crayon_1.5.2 lattice_0.21-8
[49] annotate_1.80.0 KEGGREST_1.42.0
[51] pillar_1.9.0 knitr_1.45
[53] beanplot_1.3.1 rjson_0.2.21
[55] codetools_0.2-19 glue_1.7.0
[57] data.table_1.15.4 vctrs_0.6.5
[59] png_0.1-8 cellranger_1.1.0
[61] gtable_0.3.4 cachem_1.0.8
[63] xfun_0.43 S4Arrays_1.2.1
[65] mime_0.12 survival_3.5-5
[67] statmod_1.5.0 interactiveDisplayBase_1.40.0
[69] ellipsis_0.3.2 nlme_3.1-162
[71] bit64_4.0.5 bsseq_1.38.0
[73] progress_1.2.3 filelock_1.0.3
[75] nor1mix_1.3-2 rpart_4.1.19
[77] colorspace_2.1-0 DBI_1.2.2
[79] Hmisc_5.1-2 nnet_7.3-18
[81] tidyselect_1.2.1 bit_4.0.5
[83] compiler_4.3.0 curl_5.2.1
[85] graph_1.80.0 htmlTable_2.4.2
[87] xml2_1.3.6 DelayedArray_0.28.0
[89] rtracklayer_1.62.0 checkmate_2.3.1
[91] scales_1.3.0 quadprog_1.5-8
[93] RBGL_1.78.0 rappdirs_0.3.3
[95] digest_0.6.35 rmarkdown_2.26
[97] GEOquery_2.70.0 htmltools_0.5.7
[99] pkgconfig_2.0.3 jpeg_0.1-10
[101] base64enc_0.1-3 sparseMatrixStats_1.14.0
[103] fastmap_1.1.1 ensembldb_2.26.0
[105] rlang_1.1.3 htmlwidgets_1.6.4
[107] shiny_1.8.0 DelayedMatrixStats_1.24.0
[109] BiocParallel_1.36.0 mclust_6.1
[111] R.oo_1.26.0 VariantAnnotation_1.48.1
[113] RCurl_1.98-1.14 magrittr_2.0.3
[115] Formula_1.2-5 GenomeInfoDbData_1.2.11
[117] Rhdf5lib_1.24.2 munsell_0.5.0
[119] Rcpp_1.0.12 stringi_1.8.3
[121] zlibbioc_1.48.2 MASS_7.3-58.4
[123] plyr_1.8.9 deldir_2.0-2
[125] splines_4.3.0 multtest_2.58.0
[127] hms_1.1.3 rngtools_1.5.2
[129] biomaRt_2.58.2 BiocVersion_3.18.1
[131] XML_3.99-0.16.1 evaluate_0.23
[133] latticeExtra_0.6-30 biovizBase_1.50.0
[135] tzdb_0.4.0 httpuv_1.6.14
[137] openssl_2.1.1 reshape_0.8.9
[139] xtable_1.8-4 restfulr_0.0.15
[141] AnnotationFilter_1.26.0 later_1.3.2
[143] plyranges_1.22.0 memoise_2.0.1
[145] GenomicAlignments_1.38.2 cluster_2.1.4
[147] timechange_0.3.0
Thank you Tim, this is a great help in getting me started!
Hi, I am stuck in the same point, would you advise, how did you manage?