Question

Normalizing multiple microarrays with different design files in oligo


benjamin-gansemer • 0

@benjamin-gansemer-22748


I'm attempting to re-analyze some old microarray data. The microarrays are in NimbleGen format, originally .pair files, but I have (successfully) converted them to .xys files. I would like to normalize them together, as one data set, using oligo. The issue is that one of the microarrays uses the 090901RatHX12expr.ndf design file and the other two use 100718RatHX12expr.ndf.

My question is: is it possible to create one dataset of all 3 arrays, using oligo, despite having two different design files?

I can generate an ExpressionFeatureSet for the one array that uses the older design file, and a separate set for the other two arrays. When I try to create one set using all three arrays, I get the error below:

allData <- read.xysfiles(allXYS, phenoData = allPD, checkType = F)
Loading required package: pd.090901.rat.hx12.expr
Platform design info loaded.
Checking designs for each XYS file... Error in smartReadXYS(filenames, sampleNames) : 
  './raw-data/xysfiles/BR1/P32_control_apex_A01_532.xys' and './raw-data/xysfiles/BR2/531207_A01_EB-P32-SGN-CA_2012-03-16_532.xys' use different designs.

Here is the code used to generate the list of .xys files and the other information needed to generate the ExpressionFeatureSet:

allXYS <- c(BR1xys, BR2xys, BR3xys)

#metadata
allConditions <- data.frame(Key=rep(c("P32HA", "P32HB", "P32DA", "P32DB", "P60HA", "P60HB", "P60DA", "P60DB", "P32HA", "P32HB", "P32DA", "P32DB", "P60HA", "P60HB", "P60DA", "P60DB", "P32DA", "P32DB", "P32HA", "P32HB", "P60DA", "P60DB", "P60HA", "P60HB"), each=3))
rownames(allConditions) <- basename(allXYS)
allLVLs <- c("exprs", "_ALL_")
allMtData <- data.frame(channel=factor("exprs", levels=allLVLs), labelDescription="Sample type")
allPD <- new("AnnotatedDataFrame", data=allConditions, varMetadata=allMtData)

#ExpressionFeatureSet
allData <- read.xysfiles(allXYS, phenoData = allPD, checkType = FALSE)

> sessionInfo()
R version 3.5.3 (2019-03-11)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 30 (MATE-Compiz)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pd.090901.rat.hx12.expr_0.0.1 pd.100718.rat.hx12.expr_0.0.1 DBI_1.1.0                     genefilter_1.64.0            
 [5] limma_3.38.3                  pdInfoBuilder_1.46.0          oligo_1.46.0                  Biostrings_2.50.2            
 [9] XVector_0.22.0                IRanges_2.16.0                S4Vectors_0.20.1              affxparser_1.54.0            
[13] RSQLite_2.1.5                 Biobase_2.42.0                BiocGenerics_0.28.0           oligoClasses_1.44.0          

loaded via a namespace (and not attached):
 [1] SummarizedExperiment_1.12.0 xfun_0.11                   splines_3.5.3               lattice_0.20-38            
 [5] vctrs_0.2.1                 yaml_2.2.0                  blob_1.2.0                  XML_3.98-1.20              
 [9] survival_3.1-8              rlang_0.4.1                 pillar_1.4.3                BiocParallel_1.16.6        
[13] bit64_0.9-7                 affyio_1.52.0               matrixStats_0.55.0          GenomeInfoDbData_1.2.0     
[17] foreach_1.4.7               zlibbioc_1.28.0             codetools_0.2-16            memoise_1.1.0              
[21] knitr_1.25                  ff_2.2-14                   GenomeInfoDb_1.18.2         AnnotationDbi_1.44.0       
[25] preprocessCore_1.44.0       Rcpp_1.0.3                  xtable_1.8-4                backports_1.1.5            
[29] BiocManager_1.30.10         DelayedArray_0.8.0          annotate_1.60.1             bit_1.1-14                 
[33] digest_0.6.23               GenomicRanges_1.34.0        grid_3.5.3                  tools_3.5.3                
[37] bitops_1.0-6                RCurl_1.95-4.12             tibble_2.1.3                crayon_1.3.4               
[41] pkgconfig_2.0.3             zeallot_0.1.0               Matrix_1.2-15               rstudioapi_0.10            
[45] iterators_1.0.12            compiler_3.5.3

microarray oligo • 579 views

updated 4.3 years ago by James W. MacDonald 65k • written 4.3 years ago by benjamin-gansemer • 0

score 0 · Answer 1 · 2020-01-22

If the two arrays use different designs, that may (probably does) imply that at least some of the probes are different. This could include different placement on the array, or different numbers of probes per probeset. Without knowing if that is true or not, it is pretty dangerous to think that you should just pile them all together and hope for the best.
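One way to find out whether the designs really differ is to compare the two design tables directly. The following is only a minimal sketch in base R, using toy data frames in place of the real .ndf files. It assumes the design files are tab-delimited tables with X, Y, SEQ_ID and PROBE_SEQUENCE columns, so that `read.delim()` could load them; check that against your own files. `compare_designs()` is a hypothetical helper, not part of oligo.

```r
# Toy stand-ins for two design tables. With real data these might come from
# read.delim() on the two .ndf files (assumption: tab-delimited, with the
# X, Y, SEQ_ID and PROBE_SEQUENCE columns used below).
ndf_a <- data.frame(
  X = c(1, 2, 3, 4),
  Y = c(1, 1, 2, 2),
  SEQ_ID = c("gene1", "gene1", "gene2", "gene2"),
  PROBE_SEQUENCE = c("ACGT", "CGTA", "GTAC", "TACG"),
  stringsAsFactors = FALSE
)

# Second design: same probes, but the last two swap positions on the array.
ndf_b <- ndf_a
ndf_b$X <- c(1, 2, 4, 3)

# Hypothetical helper: compares probe content, physical layout and
# probes-per-probeset between two design tables.
compare_designs <- function(a, b) {
  # Build one sorted key per row from the chosen columns.
  key <- function(d, cols) sort(do.call(paste, c(d[cols], sep = "|")))
  list(
    same_probe_content = identical(
      key(a, c("SEQ_ID", "PROBE_SEQUENCE")),
      key(b, c("SEQ_ID", "PROBE_SEQUENCE"))
    ),
    same_layout = identical(
      key(a, c("X", "Y", "PROBE_SEQUENCE")),
      key(b, c("X", "Y", "PROBE_SEQUENCE"))
    ),
    same_probes_per_set = identical(table(a$SEQ_ID), table(b$SEQ_ID))
  )
}

result <- compare_designs(ndf_a, ndf_b)
print(result)
```

In this toy case the probe content and the probeset sizes match but the layout does not, which is exactly the situation where intensities read by position would be assigned to the wrong probes.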

It may be that there are no differences between the arrays, and NimbleGen just generated different design files, in which case you might hypothetically get around it by reading in separately and combining. That might require some work, and I'm not about to suggest you do so, nor will I help you to figure out how to do that. If you want to go off-reservation like that, have at it, but you're going to be on your own.

The fact that there are two design files also implies that they were run in different batches, and with only three total arrays you probably don't have any replication, and hence no way to adjust for batch-specific differences. So any changes are likely to be fold changes alone, and completely confounded with batch, which is to say most likely not informative anyway. So maybe it's not worth bothering with this analysis?
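To make the confounding concrete, here is a small base R sketch. The sample layout is purely hypothetical (one array on the older design, two on the newer one, with the condition split the same way) and is not taken from the actual experiment. It shows that when condition and batch coincide, the design matrix is rank deficient, so the two effects cannot be estimated separately.

```r
# Hypothetical layout (an assumption for illustration only): the condition
# split coincides exactly with the design-file batch.
targets <- data.frame(
  array     = c("A1", "A2", "A3"),
  batch     = factor(c("design_090901", "design_100718", "design_100718")),
  condition = factor(c("control", "treated", "treated"))
)

# Model with both a batch term and a condition term.
design <- model.matrix(~ batch + condition, data = targets)

# If the rank is lower than the number of columns, at least one coefficient
# is not estimable: batch and condition carry the same information.
design_rank <- qr(design)$rank
confounded <- design_rank < ncol(design)

print(design)
print(confounded)
```

Running the same check on the real sample sheet would show whether any contrast of interest can be separated from the batch effect.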

You also have an old version of R/BioC, so you should upgrade to R-3.6.X and the current release version of Bioconductor.