Hi,
I'm trying to combine methylation RGsets using minfi's combineArrays. For this specific command, I have one 450k and one EPIC RGset. This command worked on the same datasets a few months ago, but now I get the error message shown below. What am I doing wrong? Could it be due to the updated minfi version (1.38 vs 1.36)?
Thanks!
Sophie.
> rgSetProstate
class: RGChannelSet
dim: 574981 1735
metadata(0):
assays(2): Green Red
rownames(574981): 10600322 10600328 ... 74810485 74810492
rowData names(0):
colnames(1735): 6057833166_R02C02 6057833166_R04C02 ...
GSM4199991_202226400157_R07C01 GSM4199992_202226400157_R08C01
colData names(1): ArrayTypes
Annotation
array: IlluminaHumanMethylation450k
annotation: ilmn12.hg19
> rgSetEPIC
class: RGChannelSet
dim: 1051539 240
metadata(0):
assays(2): Green Red
rownames(1051539): 1600101 1600111 ... 99810990 99810992
rowData names(0):
colnames(240): GSM2998021_201868500150_R01C01
GSM2998022_201868500150_R03C01 ... GSM4199991_202226400157_R07C01
GSM4199992_202226400157_R08C01
colData names(0):
Annotation
array: IlluminaHumanMethylationEPIC
annotation: ilm10b4.hg19
> rgSetCombined <- combineArrays(rgSetProstate,rgSetEPIC, outType = "IlluminaHumanMethylation450k",verbose = TRUE)
[convertArray] Casting as IlluminaHumanMethylation450k
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function 'colData<-' for signature '"RGChannelSet", "character"'
> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.5 (Nitrogen)
Matrix products: default
BLAS: /gpfs/igmmfs01/software/pkg/el7/apps/R/4.0.3/lib64/R/lib/libRblas.so
LAPACK: /gpfs/igmmfs01/software/pkg/el7/apps/R/4.0.3/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.0
[2] IlluminaHumanMethylationEPICmanifest_0.3.0
[3] IlluminaHumanMethylation450kmanifest_0.4.0
[4] minfi_1.38.0
[5] bumphunter_1.34.0
[6] locfit_1.5-9.4
[7] iterators_1.0.13
[8] foreach_1.5.1
[9] Biostrings_2.60.0
[10] XVector_0.32.0
[11] SummarizedExperiment_1.22.0
[12] Biobase_2.52.0
[13] MatrixGenerics_1.4.0
[14] matrixStats_0.59.0
[15] GenomicRanges_1.44.0
[16] GenomeInfoDb_1.28.0
[17] IRanges_2.26.0
[18] S4Vectors_0.30.0
[19] BiocGenerics_0.38.0
loaded via a namespace (and not attached):
[1] rjson_0.2.20 ellipsis_0.3.2
[3] siggenes_1.66.0 mclust_5.4.7
[5] base64_2.0 rstudioapi_0.13
[7] bit64_4.0.5 AnnotationDbi_1.54.0
[9] fansi_0.5.0 xml2_1.3.2
[11] codetools_0.2-18 splines_4.1.0
[13] sparseMatrixStats_1.4.0 cachem_1.0.5
[15] scrime_1.3.5 Rsamtools_2.8.0
[17] annotate_1.70.0 dbplyr_2.1.1
[19] png_0.1-7 HDF5Array_1.20.0
[21] readr_1.4.0 compiler_4.1.0
[23] httr_1.4.2 assertthat_0.2.1
[25] Matrix_1.3-4 fastmap_1.1.0
[27] limma_3.48.0 prettyunits_1.1.1
[29] tools_4.1.0 glue_1.4.2
[31] GenomeInfoDbData_1.2.6 dplyr_1.0.6
[33] rappdirs_0.3.3 doRNG_1.8.2
[35] Rcpp_1.0.6 vctrs_0.3.8
[37] rhdf5filters_1.4.0 multtest_2.48.0
[39] preprocessCore_1.54.0 nlme_3.1-152
[41] rtracklayer_1.52.0 DelayedMatrixStats_1.14.0
[43] stringr_1.4.0 lifecycle_1.0.0
[45] restfulr_0.0.13 rngtools_1.5
[47] XML_3.99-0.6 beanplot_1.2
[49] zlibbioc_1.38.0 MASS_7.3-54
[51] hms_1.1.0 rhdf5_2.36.0
[53] GEOquery_2.60.0 RColorBrewer_1.1-2
[55] yaml_2.2.1 curl_4.3.1
[57] memoise_2.0.0 biomaRt_2.48.0
[59] reshape_0.8.8 stringi_1.6.2
[61] RSQLite_2.2.7 genefilter_1.74.0
[63] BiocIO_1.2.0 GenomicFeatures_1.44.0
[65] filelock_1.0.2 BiocParallel_1.26.0
[67] rlang_0.4.11 pkgconfig_2.0.3
[69] bitops_1.0-7 nor1mix_1.3-0
[71] lattice_0.20-44 purrr_0.3.4
[73] Rhdf5lib_1.14.0 GenomicAlignments_1.28.0
[75] bit_4.0.4 tidyselect_1.1.1
[77] plyr_1.8.6 magrittr_2.0.1
[79] R6_2.5.0 generics_0.1.0
[81] DelayedArray_0.18.0 DBI_1.1.1
[83] pillar_1.6.1 survival_3.2-11
[85] KEGGREST_1.32.0 RCurl_1.98-1.3
[87] tibble_3.1.2 crayon_1.4.1
[89] utf8_1.2.1 BiocFileCache_2.0.0
[91] progress_1.2.2 grid_4.1.0
[93] data.table_1.14.0 blob_1.2.1
[95] digest_0.6.27 xtable_1.8-4
[97] tidyr_1.1.3 illuminaio_0.34.0
[99] openssl_1.4.4 askpass_1.1
[101] quadprog_1.5-8
Thank you very much for looking into this! Yes, the output is different, as shown below. I realise now that I was using my previous combined dataset (rgSetCombined) with the original rgSetEPIC, and these have different colData, because the combined dataset has colData names(1): ArrayTypes but rgSetEPIC has colData names(0):. So my issue is now, how can I combine a previously combined dataset, which has colData names(1): ArrayTypes, with my newly created dataset, which has no colData names? I have tried manually setting the colData for the rgSet450k but combineArrays gives me the same error message. I will copy this below.
[...] continues until [1186]
What you have in your colData doesn't meet the expectations that I would normally have. The minfi:::.pDataFix function is supposed to generate a DataFrame containing only "Slide", "Array", "Sample_Name", "Basename", and "SampleID", so it's weird that your combined RGChannelSet has only 'ArrayTypes', which isn't what I would expect. Did you change that?
I would also not expect a colData for the newly generated EPIC array to be empty. It should look like the example data that I presented above. So the next step is to figure out why you are ending up with RGChannelSets that don't meet expectations.
I didn't realise that wasn't the normal RGset format. I created them by using the idat files in specified folders. I can't remember the reason, but I am using several datasets from GEO and TCGA, and it was difficult to combine it all together. And I created a separate sample sheet, which is also time-consuming. But there may be a better way to do this?
Ah, I get it. Usually you have a csv file that comes from the Illumina software and you use read.metharray.sheet to generate a 'targets' file that has all the stuff in it that I expected. If you are getting the data from a bunch of different places it might be a pain to make a fake csv file, so using the 'base' argument instead is probably the way to go.
After some trial and error, what solved this for me was to remove the colData column from rgSet450k instead of adding the same column to rgSetProstate, so neither rgSet has any column in colData.
Thank you!
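For anyone landing here with the same error, the fix described above (dropping the colData column so that neither object carries one) can be sketched as follows. This is only an illustration under some assumptions: `toy` is a made-up SummarizedExperiment standing in for an RGChannelSet (which extends SummarizedExperiment, so colData is handled the same way), and the sample and row names are invented.

```r
library(SummarizedExperiment)
library(S4Vectors)

# Hypothetical toy data: 3 probe addresses x 2 samples.
samples <- c("s1", "s2")
intensities <- matrix(
  1:6, nrow = 3,
  dimnames = list(paste0("addr", 1:3), samples)
)

# Stand-in for an RGChannelSet whose colData holds only 'ArrayTypes',
# like the previously combined object in this thread.
toy <- SummarizedExperiment(
  assays  = list(Green = intensities, Red = intensities),
  colData = DataFrame(
    ArrayTypes = rep("IlluminaHumanMethylation450k", 2),
    row.names  = samples
  )
)

# Inspect which colData columns are present before changing anything.
names(colData(toy))

# Assigning NULL drops the column; the sample names are kept.
toy$ArrayTypes <- NULL
names(colData(toy))
colnames(toy)

# With the objects from this thread, the equivalent step would be
# (untested here, since it needs the real data):
# rgSetProstate$ArrayTypes <- NULL
# rgSetCombined <- combineArrays(rgSetProstate, rgSetEPIC,
#                                outType = "IlluminaHumanMethylation450k")
```

Checking names(colData(...)) on both objects before calling combineArrays is a quick way to spot this kind of mismatch.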