I am updating my analysis pipeline for Affymetrix arrays, and would like to use the library oligo
for the analysis of all types/generations of arrays. Until now I used the functionality of the library affy
to normalize the 1st generation of arrays (those arrays including MM probes), and usually I do this by the GC-RMA procedure. However, upon checking the documentation, I could not find whether GC-RMA normalization is possible with oligo
. Any hints would be appreciated!
Thanks, Guido
Example code using some 430a arrays from GEO (because this one of the few 'old' arrays for which a PdInfo library is availaible at BioC):
Source: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE20282
# Install required PdInfo package
> BiocManager::install("pd.moe430a")
# preferred way using library Oligo:
> library(oligo)
# Read in data
> affy.data <- read.celfiles(filenames = list.celfiles(), pkgname = "pd.moe430a")
Loading required package: pd.moe430a
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : GSM508528.CEL
Reading in : GSM508529.CEL
Reading in : GSM508530.CEL
Reading in : GSM508531.CEL
Reading in : GSM508532.CEL
Reading in : GSM508533.CEL
>
# RMA normalization works fine...:
> norm.data <- oligo::rma(affy.data)
Background correcting
Normalizing
Calculating Expression
>
# ... also through an alternative route
> norm.data2 <- oligo::fitProbeLevelModel(affy.data)
Background correcting... OK
Normalizing... OK
Summarizing... OK
Extracting...
Estimates... OK
StdErrors... OK
Weights..... OK
Residuals... OK
Scale....... OK
>
## convert oligoPLM object to ExpressionSet!
> norm.data2 <- opset2eset(norm.data2)
>
#check
> norm.data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22690 features, 6 samples
element names: exprs
protocolData
rowNames: GSM508528.CEL GSM508529.CEL ... GSM508533.CEL (6 total)
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
rowNames: GSM508528.CEL GSM508529.CEL ... GSM508533.CEL (6 total)
varLabels: index
varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.moe430a
>
>
> norm.data2
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22690 features, 6 samples
element names: exprs, se.exprs
protocolData
sampleNames: GSM508528.CEL GSM508529.CEL ... GSM508533.CEL (6 total)
varLabels: exprs dates
varMetadata: labelDescription channel
phenoData
sampleNames: GSM508528.CEL GSM508529.CEL ... GSM508533.CEL (6 total)
varLabels: index
varMetadata: labelDescription channel
featureData: none
experimentData: use 'experimentData(object)'
Annotation: pd.moe430a
>
>
# ... but how to perform GC-RMA normalization...??
sessionInfo()
R version 3.5.1 Patched (2018-11-24 r75665)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] pd.moe430a_3.12.0 DBI_1.0.0 RSQLite_2.1.1
[4] oligo_1.46.0 Biostrings_2.50.2 XVector_0.22.0
[7] IRanges_2.16.0 S4Vectors_0.20.1 Biobase_2.42.0
[10] oligoClasses_1.44.0 BiocGenerics_0.28.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.0 compiler_3.5.1
[3] BiocManager_1.30.4 GenomeInfoDb_1.18.1
[5] bitops_1.0-6 iterators_1.0.10
[7] tools_3.5.1 zlibbioc_1.28.0
[9] digest_0.6.18 bit_1.1-14
[11] memoise_1.1.0 preprocessCore_1.44.0
[13] lattice_0.20-38 ff_2.2-14
[15] pkgconfig_2.0.2 Matrix_1.2-15
[17] foreach_1.4.4 DelayedArray_0.8.0
[19] GenomeInfoDbData_1.2.0 affxparser_1.54.0
[21] bit64_0.9-7 grid_3.5.1
[23] BiocParallel_1.16.5 blob_1.1.1
[25] codetools_0.2-16 matrixStats_0.54.0
[27] GenomicRanges_1.34.0 splines_3.5.1
[29] SummarizedExperiment_1.12.0 RCurl_1.95-4.11
[31] affyio_1.52.0
>