Question: Microarray Data Analysis CDF Problem
Asked 5 weeks ago by sherajilir:

Hello Everyone,

I am trying to normalize the dataset GSE45220, which is based on the hugene.1.1.st.v1 platform. However, when I try RMA or gcRMA normalization, I get an error about a missing CDF file, which does not seem to exist for this platform anyway.

#GSE45220
BiocManager::install("GEOquery")
library(GEOquery)
library(dplyr)
BiocManager::install("gcrma")
library(gcrma)
BiocManager::install("pd.hugene.1.1.st.v1")
library(pd.hugene.1.1.st.v1)
BiocManager::install("hugene10sttranscriptcluster.db")
library(hugene10sttranscriptcluster.db)

untar("GSE45220_RAW.tar", exdir="data1")
cels = list.files("data1/", pattern = "CEL")
sapply(paste("data1", cels, sep="/"), gunzip)
cels = list.files("data1/", pattern = "CEL")
raw.data=ReadAffy(filenames=cels)

Warning message:

The affy package can process data from the Gene ST 1.x series of arrays, but you should consider using either the oligo or xps packages, which are specifically designed for these arrays.

data.rma.norm=rma(raw.data)

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function ‘rma’ for signature ‘"AffyBatch"’

I then tried the oligo package, but the problem persisted. SCAN did not work either and gave me this error: Error in as.character.default(x) : no method for coercing this S4 class to a vector

Has anyone run into the same problem? I could use the processed data, but I think normalizing it myself would be much better.

I am using R version 3.6.0.

Thank you

Tags: microarray, normalization
Answer: Microarray Data Analysis CDF Problem
Answered 5 weeks ago by Guido Hooiveld (Wageningen University, Wageningen, the Netherlands):

Hi,

First, some remarks:

Use only the oligo package to read and process these files, not affy.

There is indeed no CDF available for the HuGene ST 1.1 arrays. That is why you need the oligo-based framework with the corresponding platform design (PdInfo) package.

gcRMA normalization cannot be applied to these arrays, because they carry only PM probes; the MM probes that gcRMA requires are missing.

After normalization, I strongly recommend adding annotation info with the function annotateEset() from the affycoretools package.
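
A possible reason for the `rma` error in the question is name masking: both affy and oligo export a function called `rma`, and R calls whichever one comes first on the search path. That masking caused the original error is an assumption and is not confirmed in this thread. The sketch below uses only base R (`utils::find`) to show which attached packages provide a given function; the helper name `providers` is made up for illustration.

```r
# List the attached packages that provide a function with the given name.
# The first entry is the one R calls when no namespace prefix is used.
# `providers` is a hypothetical helper name, not part of any package.
providers <- function(name) {
  utils::find(name, mode = "function")
}

# Base-R example: `median` is provided by the stats package.
providers("median")

# With both affy and oligo attached, one would expect two entries here.
# With neither attached, the result is an empty character vector.
providers("rma")
```

If more than one package shows up, writing the call with an explicit prefix, such as `oligo::rma(...)`, removes the ambiguity.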

Some code to get you started:

> library(oligo)
> library(hugene11sttranscriptcluster.db)
> library(affycoretools)
> 
> # read in CEL files
> path <- "./GSE45220_RAW"  # directory with (compressed) CEL files
> raw.data <- read.celfiles(filenames = list.celfiles(path, full.names=TRUE, listGzipped=TRUE))
Loading required package: pd.hugene.1.1.st.v1
Loading required package: RSQLite
Loading required package: DBI
Platform design info loaded.
Reading in : ./GSE45220_RAW/GSM1099310_PS01_uns_A05_2.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099311_PS02_NaB_A07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099312_PS03_NaB_Cip125_A09.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099313_PS04_uns_B05.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099314_PS05_NaB_B07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099315_PS06_NaB_Cip150_B09.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099316_PS07_uns_C05.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099317_PS08_NaB_C07.CEL.gz
Reading in : ./GSE45220_RAW/GSM1099318_PS09_NaB_Cip150_C09.CEL.gz
> 
> # RMA normalization
> norm.data <- oligo::rma(raw.data, target = "core")
Background correcting
Normalizing
Calculating Expression
> 
> # add annotation info (using functionality affycoretools)
> norm.data <- annotateEset(norm.data,  hugene11sttranscriptcluster.db)
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
'select()' returned 1:many mapping between keys and columns
> norm.data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 33297 features, 9 samples 
  element names: exprs 
protocolData
  rowNames: GSM1099310_PS01_uns_A05_2.CEL.gz
    GSM1099311_PS02_NaB_A07.CEL.gz ...
    GSM1099318_PS09_NaB_Cip150_C09.CEL.gz (9 total)
  varLabels: exprs dates
  varMetadata: labelDescription channel
phenoData
  rowNames: GSM1099310_PS01_uns_A05_2.CEL.gz
    GSM1099311_PS02_NaB_A07.CEL.gz ...
    GSM1099318_PS09_NaB_Cip150_C09.CEL.gz (9 total)
  varLabels: index
  varMetadata: labelDescription channel
featureData
  featureNames: 7892501 7892502 ... 8180418 (33297 total)
  fvarLabels: PROBEID ENTREZID SYMBOL GENENAME
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: pd.hugene.1.1.st.v1
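
Two details around the transcript above can be checked with base R alone. The code in the question lists CEL files without full paths, so the reader function may not find them, and the sample names in the resulting ExpressionSet are the full file names. The following is a minimal sketch, assuming the file names follow the `GSM…_….CEL.gz` pattern seen above; `list_cel_files` and `gsm_id` are hypothetical helper names, and the demonstration uses empty placeholder files rather than real data.

```r
# List CEL files (plain or gzipped) in a directory, returning full paths.
# `list_cel_files` and `gsm_id` are hypothetical helper names.
list_cel_files <- function(dir) {
  list.files(dir, pattern = "\\.CEL(\\.gz)?$", full.names = TRUE,
             ignore.case = TRUE)
}

# Extract the GEO sample accession (the text before the first underscore)
# from a CEL file name such as "GSM1099310_PS01_uns_A05_2.CEL.gz".
gsm_id <- function(files) {
  sub("_.*$", "", basename(files))
}

# Self-contained demonstration with empty placeholder files.
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("GSM1099310_PS01_uns_A05_2.CEL.gz",
                                  "GSM1099311_PS02_NaB_A07.CEL.gz",
                                  "notes.txt")))

cel_files <- list_cel_files(demo_dir)  # the .txt file is skipped
gsm_id(cel_files)
```

With the real data, something like `Biobase::sampleNames(norm.data) <- gsm_id(Biobase::sampleNames(norm.data))` should shorten the sample labels to the GSM accessions, which tends to make plots and tables easier to read.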

Reply written 5 weeks ago by sherajilir:

Thanks a lot Guido, it finally worked!