Question: rma summarization error when normalizing using oligo and brainarray cdf
Moosa wrote, 3 months ago:

Hello. I'm trying to read some raw (.cel) files generated from the Affymetrix U133 Plus 2.0 array using Brainarray custom CDFs. The code I'm using is:

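library(oligo)
# Install the Brainarray platform design package from a local source file (file name left blank here)
install.packages("", repos = NULL, type = "source")
path = #.cel files path
raw_data <- read.celfiles(path, pkgname = "pd.hgu133plus2.hs.entrezg")
normalized_data = oligo::rma(raw_data, target = "core")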

read.celfiles runs well:

Platform design info loaded.
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834030_EA1242_06.CEL
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834029_EA1242_05.CEL
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834028_EA1242_04.CEL

When I don't pass the argument target = "core", the normalization seems to run without a problem, but passing target = "core" leads to the following error:

Background correcting... OK
Normalizing... OK
Available tables: featureSet1, mmfeature, mps1mm, mps1pm, pmfeature, table_info
Error in getMPSInfo(get(annotation(object)), substr(target, 4, 4), "fid",  : 
  Table mpsepm does not exist.

Thank you for your time. Regards.

oligo brainarray mbni • 154 views

target = "core" is primarily for Gene or Exon arrays, which typically have 'ST' in their name. For those arrays, core instructs the algorithm to summarise the probesets to gene (transcript cluster) level. The U133 Plus 2.0 array, which is what you are using, is designed in a fundamentally different way, so the usage of core is not valid for it. This is my understanding, at least. Please wait for another person to respond.
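The error message itself hints at why both target values fail here. The call shown in the traceback passes substr(target, 4, 4), so the fourth character of target seems to select which summarization table is looked up. The sketch below mimics that in base R only; the helper mps_table_for is hypothetical and the "mps" + character + "pm" naming is inferred from the error text, not copied from the oligo source. If this reading is right, target = "mps1" would be the matching value for this package (I believe it is also what rma uses by default for a GenericFeatureSet, but please check ?rma to confirm).

```r
# Tables reported as available in the error message above.
available_tables <- c("featureSet1", "mmfeature", "mps1mm", "mps1pm",
                      "pmfeature", "table_info")

# Hypothetical helper: builds the table name the way the error message
# suggests ("mps" + 4th character of target + "pm"). This is an inference.
mps_table_for <- function(target) paste0("mps", substr(target, 4, 4), "pm")

targets <- c("core", "probeset", "mps1")
lookup <- data.frame(
  target = targets,
  table  = mps_table_for(targets),
  stringsAsFactors = FALSE
)
# TRUE only where the derived table name is among the available tables.
lookup$exists <- lookup$table %in% available_tables
print(lookup)
```

With "core" the derived name is mpsepm, which is exactly the table the error says does not exist.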

Reply written 3 months ago by Kevin Blighe

That seems valid. I've also read your explanations of the target parameter on Biostar. target = "probeset" also yields the same error. So, if I just execute rma without any extra arguments, the result should be a normalized dataset summarized at the gene level? Am I correct?

Also, I've run read.celfiles with and without pkgname = "pd.hgu133plus2.hs.entrezg"; the raw and normalized objects from each run are shown below. Does the equal number of assayData features in the two raw objects, together with the different number of assayData features in the two normalized objects, mean that the .CEL files have been read correctly using the Brainarray CDF, and that the difference in the normalized assayData reflects the different CDF design of Brainarray? I'm sorry to bother you with my rudimentary questions. I've done my searching and reading and am just checking to be sure.


Code 1:

raw_data <- read.celfiles(path, pkgname = "pd.hgu133plus2.hs.entrezg")
normalized_data = oligo::rma(raw_data)

The results:

> raw_data
GenericFeatureSet (storageMode: lockedEnvironment)
assayData: 1354896 features, 38 samples 
Annotation: pd.hgu133plus2.hs.entrezg


> normalized_data
ExpressionSet (storageMode: lockedEnvironment)
assayData: 20481 features, 38 samples 
Annotation: pd.hgu133plus2.hs.entrezg

Code 2:

raw_data2 <- read.celfiles(path)
normalized_data2 = oligo::rma(raw_data2)

The results:

> raw_data2
ExpressionFeatureSet (storageMode: lockedEnvironment)
assayData: 1354896 features, 38 samples 


> normalized_data2
ExpressionSet (storageMode: lockedEnvironment)
assayData: 54675 features, 38 samples 
Reply modified 3 months ago • written 3 months ago by Moosa

Yes, you are correct!

raw_data: The assayData of this object reflects the number of probes (not probesets) present on the array. This number is independent of the 'chip definition file' (i.e. the probe-to-probeset mapping) that is used. Hence it is the same for raw_data and raw_data2.

In the case of normalized_data, you used modified probe-to-probeset mapping information based on up-to-date genome annotation (a so-called custom CDF from the MBNI group). Assuming you used the latest version (i.e. version 23), Manhong Dai (MBNI) generated these remapping files in September/October 2018. Since you used an Entrez Gene-based remapping file, each probeset in normalized_data now reflects the expression level of one gene, as annotated by the NCBI Entrez Gene database (status September 2018). The probeset ID corresponds to the Entrez Gene ID with the suffix _at. In line with Affymetrix nomenclature, _at indicates that the probeset detects an antisense target (see e.g. here).
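As a small illustration of the ID convention just described, here is a minimal base R sketch that recovers Entrez Gene IDs from Brainarray-style probeset IDs. The IDs in the vector are made up for the example; with real data you would start from featureNames(normalized_data) instead. I am assuming that control probesets keep their usual AFFX prefix and should be set aside first.

```r
# Illustrative Brainarray-style probeset IDs (not taken from the real object).
probeset_ids <- c("1_at", "10_at", "100_at", "AFFX-BioB-5_at")

# Assumption: control probesets start with "AFFX" and carry no Entrez ID.
is_control <- grepl("^AFFX", probeset_ids)

# Strip the trailing "_at" to obtain the Entrez Gene ID as a character string.
entrez_ids <- sub("_at$", "", probeset_ids[!is_control])
print(entrez_ids)
```

Keeping the IDs as character strings avoids surprises when they are later used as keys in an annotation lookup.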

In code chunk 2 you used the probe-to-probeset mapping as defined by Affymetrix at the time they designed this array, which was in the early 2000s. FYI: U133 refers to Unigene version 133 (released April 20, 2001), the version of the Unigene database Affymetrix used to design their probes and probesets for this array. By definition, the probesets in normalized_data2 are NOT (always) unique for a single gene, and a single gene can be detected by multiple probesets. Whether a probeset is unique can be inferred from its name (e.g. whether _s or _x is present; see here for more info).
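To make the naming point concrete, this is a minimal base R sketch that pulls the suffix out of Affymetrix-style probeset names so that _s and _x probesets can be flagged. The names in the vector are only illustrative; with real data you would use featureNames(normalized_data2).

```r
# Illustrative Affymetrix-style probeset names.
probeset_ids <- c("1053_at", "1007_s_at", "1552256_a_at", "1553185_x_at")

# Names ending in "_<letter>_at" carry a qualifier; everything else is plain "_at".
has_qualifier <- grepl("_[a-z]_at$", probeset_ids)
suffix <- ifelse(has_qualifier,
                 sub("^.*(_[a-z]_at)$", "\\1", probeset_ids),
                 "_at")

# Flag the probesets whose names suggest they are not unique for one gene.
possibly_non_unique <- suffix %in% c("_s_at", "_x_at")
print(data.frame(probeset_ids, suffix, possibly_non_unique))
```

Tabulating the suffixes with table(suffix) on the full set of feature names gives a quick overview of how many probesets fall in each class.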

Hence, your object normalized_data2 comprises more probesets than normalized_data, but the number of uniquely detected genes should be roughly similar.
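A toy example of that counting argument, in base R only. The mapping below is invented for illustration; for the real array you would need a probeset-to-gene annotation, for instance from an annotation package, which I am not showing here.

```r
# Toy probeset-to-gene mapping (illustrative values only).
affy_map <- data.frame(
  probeset = c("p1_at", "p2_s_at", "p3_x_at", "p4_at"),
  gene     = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  stringsAsFactors = FALSE
)

n_probesets <- nrow(affy_map)                   # rows in the original-style mapping
n_genes     <- length(unique(affy_map$gene))    # rows after one-probeset-per-gene remapping
probesets_per_gene <- table(affy_map$gene)      # how many probesets detect each gene

cat("probesets:", n_probesets, " unique genes:", n_genes, "\n")
print(probesets_per_gene)
```

Here four probesets collapse to two genes, which is the same kind of reduction as going from 54675 to 20481 features above.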

Lastly, to complete the story, the content of the (two) hgu133A and hgu133B arrays together is on the (single) hgu133plus2 array. The difference is that the first two arrays were manufactured in a photolithographic process in which the feature size (the physical size of each probe cell) was 18 µm. Not all 'required' probes could then be 'printed' on a single array. However, improved technology allowed the feature size to be reduced to 11 µm, which in turn allowed all probes to be synthesized on a single array. See e.g. here (section Array Manufacturing).

Reply modified 3 months ago • written 3 months ago by Guido Hooiveld

Dear Guido, I appreciate your informative response and great help. It was just amazing and comprehensive. : )

Best regards

Reply written 3 months ago by Moosa