Question

Subscript out of bounds while normalising using affy::rma()

geoffrey ▴ 10

Hello all,

I was constructing an AffyBatch object and then normalising it. However, whenever I use my custom hugene10stv1cdf, rma() fails with a "subscript out of bounds" error. When I use the default hthgu133a CDF, rma() runs without any problem. I'm using affy 1.64.0.

library(GEOquery)
library(limma)
library(splines)
library(affy)

# Download and unpack the raw CEL archive for GSE19392
getGEOSuppFiles("GSE19392")
untar("./GSE19392/GSE19392_RAW.tar", exdir = "~/")

# Decompress the .CEL.gz files, then list the uncompressed CEL files
cels <- list.files("~", pattern = "CEL", full.names = TRUE)
sapply(cels, gunzip)
cels <- list.files("~", pattern = "CEL", full.names = TRUE)

# Read the CEL files with the custom CDF (this variant leads to the error)
library(hugene10stv1cdf)
rawdata <- ReadAffy(filenames = cels, cdfname = "hugene10stv1cdf")
# rawdata <- ReadAffy(filenames = cels)
normdata <- affy::rma(rawdata, destructive = TRUE)

Error in exprs(object)[index, , drop = FALSE] : subscript out of bounds

sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
 [1] splines   parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.54.1        hthgu133acdf_2.18.0    affy_1.64.0            Biobase_2.46.0         BiocGenerics_0.32.0   
[6] limma_3.42.2           hugene10stv1cdf_2.18.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6                  lattice_0.20-38             tidyr_1.1.3                 assertthat_0.2.1           
 [5] utf8_1.2.1                  R6_2.5.0                    GenomeInfoDb_1.22.1         stats4_3.6.2               
 [9] RSQLite_2.2.6               pillar_1.6.0                zlibbioc_1.32.0             rlang_0.4.10               
[13] curl_4.3.1                  blob_1.2.1                  S4Vectors_0.24.4            Matrix_1.2-18              
[17] preprocessCore_1.48.0       BiocParallel_1.20.1         readr_1.4.0                 RCurl_1.98-1.3             
[21] bit_4.0.4                   DelayedArray_0.12.3         compiler_3.6.2              pkgconfig_2.0.3            
[25] tidyselect_1.1.1            SummarizedExperiment_1.16.1 tibble_3.1.0                GenomeInfoDbData_1.2.2     
[29] ff_4.0.4                    IRanges_2.20.2              matrixStats_0.58.0          fansi_0.4.2                
[33] crayon_1.4.1                dplyr_1.0.5                 withr_2.4.2                 bitops_1.0-6               
[37] lifecycle_1.0.0             DBI_1.1.1                   magrittr_2.0.1              cli_2.5.0                  
[41] cachem_1.0.4                XVector_0.26.0              affyio_1.56.0               xml2_1.3.2                 
[45] ellipsis_0.3.1              generics_0.1.0              vctrs_0.3.7                 tools_3.6.2                
[49] bit64_4.0.5                 glue_1.4.2                  purrr_0.3.4                 hms_1.0.0                  
[53] fastmap_1.1.0               AnnotationDbi_1.48.0        BiocManager_1.30.15         GenomicRanges_1.38.0       
[57] sessioninfo_1.1.1           memoise_2.0.0

To reproduce the working normalisation with the default CDF, comment out the ReadAffy() call that sets cdfname = "hugene10stv1cdf" and uncomment the ReadAffy(filenames = cels) line directly below it.

Thanks a lot.

Normalization affy AffymetrixChip • 976 views

score 1 · Answer 1 · 2021-05-13

Hi,

GSE19392 used the U133A array, so should you not be using the CDF for that array? The affy package is fine for this array type, and it should automatically detect the array and then download and install the matching CDF if it is not already there.
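To confirm which array the CEL files come from, you can read the chip type from the file headers before choosing a CDF. This is only a minimal sketch: it assumes affy is installed and that the uncompressed CEL files sit in cel_dir (a placeholder). guess_cdf_pkg() is a hypothetical helper that roughly mimics how a CDF package name is derived from the chip type, so treat its output as a hint rather than the definitive name.

```r
# Hypothetical helper (not part of affy): rough approximation of the
# CDF package naming scheme: lower-case, strip punctuation, append "cdf".
guess_cdf_pkg <- function(chip_type) {
  paste0(gsub("[^a-z0-9]", "", tolower(chip_type)), "cdf")
}

cel_dir <- "~"  # assumption: directory holding the uncompressed CEL files
cels <- list.files(cel_dir, pattern = "\\.CEL$", ignore.case = TRUE,
                   full.names = TRUE)

# Only run the affy part when CEL files and the package are available
if (length(cels) > 0 && requireNamespace("affy", quietly = TRUE)) {
  # whatcdf() reports the chip type stored in each CEL file header
  chip <- unique(vapply(cels, affy::whatcdf, character(1)))
  print(chip)
  print(guess_cdf_pkg(chip))

  # No cdfname argument: affy picks the CDF that matches the CEL header
  rawdata  <- affy::ReadAffy(filenames = cels)
  normdata <- affy::rma(rawdata)
}
```

If the chip type printed here does not correspond to the CDF you pass via cdfname, that mismatch is the first thing to fix.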

You imply that you then created a custom CDF for an Affymetrix ST array. affy [the package] cannot be used for ST arrays, and one should instead use oligo for these arrays.
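As for the error message itself, it is most likely just a symptom of that mismatch: a CDF describes probe positions for one specific chip layout, and if those positions do not exist in the intensity matrix read from the CEL files, the row lookup fails. The toy example below reproduces the same kind of failure in base R. The matrix and the index vector are made up for illustration and are not real array data.

```r
# Toy stand-in for exprs(object): 6 probe positions (rows) x 2 samples
intensities <- matrix(seq_len(12), nrow = 6)

# Probe indices as a CDF for a different chip layout might list them;
# position 9 does not exist on this 6-row toy "chip"
index <- c(2, 5, 9)

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    intensities[index, , drop = FALSE]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```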

Affymetrix arrays come in 3 main groups:

U133
HuGene
HuEx

*others exist but these are the main groups

HuGene and HuEx are ‘ST’ arrays, which have fundamental differences from the other arrays. The main differences with ST arrays:

mismatch (MM) probes are mostly absent
feature size is reduced from 121 µm^2 to 25 µm^2 to accommodate more targets
probes target the entire length of the gene, whereas U133 mainly targets the 3' end
manufacturer-supplied annotations differ

The absence of MM probes affects downstream processing:

the affy Bioconductor package cannot process ST arrays; use oligo instead
some normalisation (e.g. mas5) and QC methods that rely on both perfect match (PM) and MM probes cannot be used
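For data that really do come from an ST array, a workflow with oligo could look roughly like the sketch below. It assumes oligo and affyio are installed, that the matching platform design (pd.*) annotation package is available for the array, and that the CEL files are in the home directory. pick_package() is a hypothetical helper that only recognises the HuGene and HuEx names mentioned above.

```r
# Hypothetical helper: choose a package from the chip type in the CEL header.
# Only covers the HuGene / HuEx (ST) families named in this thread.
pick_package <- function(chip_type) {
  if (grepl("hugene|huex", tolower(chip_type))) "oligo" else "affy"
}

cels <- list.files("~", pattern = "\\.CEL$", ignore.case = TRUE,
                   full.names = TRUE)

# Only run when CEL files and both packages are available
if (length(cels) > 0 &&
    requireNamespace("affyio", quietly = TRUE) &&
    requireNamespace("oligo", quietly = TRUE)) {
  # Chip type as recorded in the header of the first CEL file
  chip <- affyio::read.celfile.header(cels[1])$cdfName

  if (pick_package(chip) == "oligo") {
    # oligo reads ST arrays and uses the pd.* annotation package
    raw  <- oligo::read.celfiles(filenames = cels)
    eset <- oligo::rma(raw)  # RMA background correction, normalisation, summarisation
  }
}
```

For GSE19392 the helper would point back to affy, which is consistent with the advice above to drop the custom ST CDF.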

Kevin