minfi error read.metharray.exp
Grace • 0
@609f1b8d
United Kingdom

I am undertaking analysis of four EPIC methylation arrays on version 4.2.2 of R using the minfi package.

However, whenever I run the following line of code I encounter the following error. Please note I have searched my sample sheet for any duplicate values but there are none.

If you could please help in any way or make informed suggestions for progression I could not be more grateful!

Thank you


RGsetEx <- read.metharray.exp(targets = sheet)
# Error in read.metharray(basenames = files, extended = extended, verbose = verbose,  :
#   !anyDuplicated(basenames) is not TRUE

sessionInfo()

R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods  
[9] base     

other attached packages:
 [1] ggplot2_3.4.0                                      
 [2] circlize_0.4.15                                    
 [3] reshape2_1.4.4                                     
 [4] corpcor_1.6.10                                     
 [5] CpGassoc_2.60                                      
 [6] data.table_1.14.6                                  
 [7] qqman_0.1.8                                        
 [8] tidyr_1.2.1                                        
 [9] pvclust_2.2-0                                      
[10] sqldf_0.4-11                                       
[11] RSQLite_2.2.20                                     
[12] gsubfn_0.7                                         
[13] proto_1.0.0                                        
[14] pcaMethods_1.90.0                                  
[15] sva_3.46.0                                         
[16] BiocParallel_1.32.5                                
[17] genefilter_1.80.3                                  
[18] mgcv_1.8-41                                        
[19] nlme_3.1-160                                       
[20] dplyr_1.0.10                                       
[21] limma_3.54.0                                       
[22] WGCNA_1.72-1                                       
[23] fastcluster_1.2.3                                  
[24] dynamicTreeCut_1.63-1                              
[25] GO.db_3.16.0                                       
[26] AnnotationDbi_1.60.0                               
[27] missMethyl_1.32.0                                  
[28] IlluminaHumanMethylationEPICanno.ilm10b4.hg19_0.6.0
[29] MatrixEQTL_2.3                                     
[30] IlluminaHumanMethylation450kanno.ilmn12.hg19_0.6.1 
[31] IlluminaHumanMethylation450kmanifest_0.4.0         
[32] FlowSorted.Blood.EPIC_2.2.0                        
[33] ExperimentHub_2.6.0                                
[34] AnnotationHub_3.6.0                                
[35] BiocFileCache_2.6.0                                
[36] dbplyr_2.3.0                                       
[37] FlowSorted.Blood.450k_1.36.0                       
[38] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
[39] IlluminaHumanMethylationEPICmanifest_0.3.0         
[40] minfi_1.44.0                                       
[41] bumphunter_1.40.0                                  
[42] locfit_1.5-9.7                                     
[43] iterators_1.0.14                                   
[44] foreach_1.5.2                                      
[45] Biostrings_2.66.0                                  
[46] XVector_0.38.0                                     
[47] SummarizedExperiment_1.28.0                        
[48] Biobase_2.58.0                                     
[49] MatrixGenerics_1.10.0                              
[50] matrixStats_0.63.0                                 
[51] GenomicRanges_1.50.2                               
[52] GenomeInfoDb_1.34.6                                
[53] IRanges_2.32.0                                     
[54] S4Vectors_0.36.1                                   
[55] BiocGenerics_0.44.0                                

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                    tidyselect_1.2.0             
  [3] htmlwidgets_1.6.1             grid_4.2.2                   
  [5] munsell_0.5.0                 codetools_0.2-18             
  [7] preprocessCore_1.60.2         chron_2.3-58                 
  [9] interp_1.1-3                  statmod_1.5.0                
 [11] withr_2.5.0                   colorspace_2.0-3             
 [13] filelock_1.0.2                knitr_1.41                   
 [15] rstudioapi_0.14               GenomeInfoDbData_1.2.9       
 [17] bit64_4.0.5                   rhdf5_2.42.0                 
 [19] vctrs_0.5.1                   generics_0.1.3               
 [21] xfun_0.36                     R6_2.5.1                     
 [23] doParallel_1.0.17             illuminaio_0.40.0            
 [25] bitops_1.0-7                  rhdf5filters_1.10.0          
 [27] cachem_1.0.6                  reshape_0.8.9                
 [29] DelayedArray_0.23.2           assertthat_0.2.1             
 [31] promises_1.2.0.1              BiocIO_1.8.0                 
 [33] scales_1.2.1                  nnet_7.3-18                  
 [35] gtable_0.3.1                  rlang_1.0.6                  
 [37] calibrate_1.7.7               GlobalOptions_0.1.2          
 [39] splines_4.2.2                 rtracklayer_1.58.0           
 [41] impute_1.72.3                 GEOquery_2.66.0              
 [43] checkmate_2.1.0               BiocManager_1.30.19          
 [45] yaml_2.3.6                    GenomicFeatures_1.50.3       
 [47] backports_1.4.1               httpuv_1.6.8                 
 [49] Hmisc_4.7-2                   tcltk_4.2.2                  
 [51] tools_4.2.2                   nor1mix_1.3-0                
 [53] ellipsis_0.3.2                RColorBrewer_1.1-3           
 [55] siggenes_1.72.0               Rcpp_1.0.9                   
 [57] plyr_1.8.8                    base64enc_0.1-3              
 [59] sparseMatrixStats_1.10.0      progress_1.2.2               
 [61] zlibbioc_1.44.0               purrr_1.0.1                  
 [63] RCurl_1.98-1.9                prettyunits_1.1.1            
 [65] rpart_4.1.19                  openssl_2.0.5                
 [67] deldir_1.0-6                  cluster_2.1.4                
 [69] magrittr_2.0.3                hms_1.1.2                    
 [71] mime_0.12                     xtable_1.8-4                 
 [73] XML_3.99-0.13                 jpeg_0.1-10                  
 [75] mclust_6.0.0                  shape_1.4.6                  
 [77] gridExtra_2.3                 compiler_4.2.2               
 [79] biomaRt_2.54.0                tibble_3.1.8                 
 [81] crayon_1.5.2                  htmltools_0.5.4              
 [83] later_1.3.0                   tzdb_0.3.0                   
 [85] Formula_1.2-4                 DBI_1.1.3                    
 [87] MASS_7.3-58.1                 rappdirs_0.3.3               
 [89] Matrix_1.5-1                  readr_2.1.3                  
 [91] cli_3.6.0                     quadprog_1.5-8               
 [93] pkgconfig_2.0.3               GenomicAlignments_1.34.0     
 [95] foreign_0.8-83                xml2_1.3.3                   
 [97] annotate_1.76.0               rngtools_1.5.2               
 [99] multtest_2.54.0               beanplot_1.3.1               
[101] doRNG_1.8.6                   scrime_1.3.5                 
[103] stringr_1.5.0                 digest_0.6.31                
[105] base64_2.0.1                  htmlTable_2.4.1              
[107] edgeR_3.40.2                  DelayedMatrixStats_1.20.0    
[109] restfulr_0.0.15               curl_5.0.0                   
[111] shiny_1.7.4                   Rsamtools_2.14.0             
[113] rjson_0.2.21                  lifecycle_1.0.3              
[115] Rhdf5lib_1.20.0               askpass_1.1                  
[117] fansi_1.0.3                   pillar_1.8.1                 
[119] lattice_0.20-45               KEGGREST_1.38.0              
[121] fastmap_1.1.0                 httr_1.4.4                   
[123] survival_3.4-0                interactiveDisplayBase_1.36.0
[125] glue_1.6.2                    png_0.1-8                    
[127] BiocVersion_3.16.0            bit_4.0.5                    
[129] stringi_1.7.12                HDF5Array_1.26.0             
[131] blob_1.2.3                    org.Hs.eg.db_3.16.0          
[133] latticeExtra_0.6-30           memoise_2.0.1
methylationArrayAnalysis minfiData minfi MethylationArrayData MethylationArray • 1.6k views
@james-w-macdonald-5106
United States

R is doing a simple test of duplication.

anyDuplicated(sheet$Basename)

And it is returning a non-zero value, which means it found a duplicate. My prior in this instance is to think R is correct and you are mistaken. What do you get if you do that test yourself?
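To see which entries are responsible, the check can be taken one step further. The sketch below uses a made-up sample sheet (the `Sample_Name` and `Basename` values are hypothetical, not from the poster's data). Note that `anyDuplicated()` returns 0 when nothing is repeated and otherwise the position of the first repeated element, so a non-zero result is an index rather than a count of duplicates.

```r
# Toy sample sheet; all values here are hypothetical, for illustration only.
sheet <- data.frame(
  Sample_Name = c("s1", "s2", "s3", "s4"),
  Basename = c("idat/1001/1001_R01C01",
               "idat/1001/1001_R01C01",  # same as row 1
               "idat/1001/1001_R02C01",
               "idat/1002/1002_R01C01"),
  stringsAsFactors = FALSE
)

# 0 means no duplicates; otherwise the index of the first repeated element.
first_dup <- anyDuplicated(sheet$Basename)

# Flag every row whose Basename occurs more than once,
# including the first occurrence of each repeated value.
is_dup <- duplicated(sheet$Basename) |
  duplicated(sheet$Basename, fromLast = TRUE)
dup_rows <- sheet[is_dup, ]

print(first_dup)
print(dup_rows)
```

Running the same two steps on the real `sheet` object should list the rows that share a Basename, which is usually quicker than scanning the spreadsheet by eye.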


Thank you James

The console actually reports there are 2 duplicates. However, when I use the conditional formatting approach across my Excel sample sheet, no duplicates are detected (I have also checked by eye). Any suggestions?


The 'Basename' isn't something that exists in the SampleSheet.csv that is read in. As an example:

> library(minfi)
> library(minfiData)
> example("read.metharray.sheet")

rd.mt.> if(require(minfiData)) {
rd.mt.+ 
rd.mt.+ baseDir <- system.file("extdata", package = "minfiData")
rd.mt.+ sheet <- read.metharray.sheet(baseDir)
rd.mt.+ 
rd.mt.+ }
Loading required package: minfiData
Loading required package: IlluminaHumanMethylation450kmanifest
Loading required package: IlluminaHumanMethylation450kanno.ilmn12.hg19
[read.metharray.sheet] Found the following CSV files:
[1] "C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/SampleSheet.csv"
> sheet
  Sample_Name Sample_Well Sample_Plate Sample_Group Pool_ID person age sex
1    GroupA_3          H5         <NA>       GroupA    <NA>    id3  83   M
2    GroupA_2          D5         <NA>       GroupA    <NA>    id2  58   F
3    GroupB_3          C6         <NA>       GroupB    <NA>    id3  83   M
4    GroupB_1          F7         <NA>       GroupB    <NA>    id1  75   F
5    GroupA_1          G7         <NA>       GroupA    <NA>    id1  75   F
6    GroupB_2          H7         <NA>       GroupB    <NA>    id2  58   F
  status  Array      Slide
1 normal R02C02 5723646052
2 normal R04C01 5723646052
3 cancer R05C02 5723646052
4 cancer R04C02 5723646053
5 normal R05C02 5723646053
6 cancer R06C02 5723646053
                                                                                         Basename
1 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646052/5723646052_R02C02
2 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646052/5723646052_R04C01
3 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646052/5723646052_R05C02
4 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646053/5723646053_R04C02
5 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646053/5723646053_R05C02
6 C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/5723646053/5723646053_R06C02
> any(duplicated(sheet$Basename))
[1] FALSE

## read that file in by hand
> z <- read.csv( "C:/Users/jmacdon/AppData/Local/R/win-library/4.2/minfiData/extdata/SampleSheet.csv")
> z
           X.Header.               X          X.1          X.2     X.3
1  Investigator Name        MrNoName                                  
2       Project Name DNA Methylation                                  
3    Experiment Name            Test                                  
4               Date       17-Feb-11                                  
5                                                                     
6             [Data]                                                  
7        Sample_Name     Sample_Well Sample_Plate Sample_Group Pool_ID
8             person             age          sex       status        
9           GroupA_3              H5                    GroupA        
10               id3              83            M       normal        
11          GroupA_2              D5                    GroupA        
12               id2              58            F       normal        
13          GroupB_3              C6                    GroupB        
14               id3              83            M       cancer        
15          GroupB_1              F7                    GroupB        
16               id1              75            F       cancer        
17          GroupA_1              G7                    GroupA        
18               id1              75            F       normal        
19          GroupB_2              H7                    GroupB        
20               id2              58            F       cancer        
          X.4              X.5
1                             
2                             
3                             
4                             
5                             
6                             
7  Sentrix_ID Sentrix_Position
8                             
9  5723646052           R02C02
10                            
11 5723646052           R04C01
12                            
13 5723646052           R05C02
14                            
15 5723646053           R04C02
16                            
17 5723646053           R05C02
18                            
19 5723646053           R06C02
20

The 'Basename' is generated from the sample sheet as well as the path for the sample sheet. You could load the sample sheet into Excel and look for duplicates, but it's a combination of the Sentrix_ID and Sentrix_Position that is being duplicated. And for whatever reason you have dups.
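Since the Basename ends in the slide and array position joined by an underscore (for example `5723646052_R02C02` in the output above), one way to track down the clash is to rebuild that key from the `Slide` and `Array` columns returned by `read.metharray.sheet()` and count how often each key occurs. This is a minimal sketch on made-up values; `chip_key`, `repeated_keys` and `offending` are illustrative names, not part of minfi.

```r
# Toy data mimicking the Slide/Array columns shown above; values are made up.
sheet <- data.frame(
  Sample_Name = c("s1", "s2", "s3", "s4"),
  Slide = c("5723646052", "5723646052", "5723646052", "5723646053"),
  Array = c("R02C02", "R04C01", "R02C02", "R02C02"),
  stringsAsFactors = FALSE
)

# Rebuild the slide_position key that forms the end of each Basename.
chip_key <- paste(sheet$Slide, sheet$Array, sep = "_")

# Count occurrences of each key and keep the ones seen more than once.
key_counts <- table(chip_key)
repeated_keys <- names(key_counts)[key_counts > 1]

# Rows of the sheet that share a slide/position combination.
offending <- sheet[chip_key %in% repeated_keys, ]

print(repeated_keys)
print(offending)
```

The same array position on a different slide (row `s4` here) is fine; only an identical slide and position pair produces a duplicated Basename.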


Thank you, you were correct there was a problem with the Sentrix_ID. The issue has now been resolved :)
