Moosa
Hello. I'm trying to read some raw (.CEL) files generated from the Affymetrix U133 Plus 2.0 Array using Brainarray custom CDFs. The code I'm using is:
install.packages("http://mbni.org/customcdf/23.0.0/entrezg.download/pd.hgu133plus2.hs.entrezg_23.0.0.tar.gz", repos = NULL, type = "source")
library(pd.hgu133plus2.hs.entrezg)
library(oligo)
path <- list.celfiles("path/to/cel/files", full.names = TRUE)  # replace with the directory holding the .CEL files
raw_data <- read.celfiles(path, pkgname = "pd.hgu133plus2.hs.entrezg")
normalized_data = oligo::rma(raw_data, target = "core")
read.celfiles runs well:
Platform design info loaded.
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834030_EA1242_06.CEL
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834029_EA1242_05.CEL
Reading in : C:/Users/moosa/Desktop/Microarray/Projects/array/test/E-GEOD-71423/raw_data/GSM1834028_EA1242_04.CEL
When I don't pass the argument target = "core", the normalization seems to run without a problem, but passing target = "core" leads to the following error:
Background correcting... OK
Normalizing... OK
Available tables: featureSet1, mmfeature, mps1mm, mps1pm, pmfeature, table_info
Error in getMPSInfo(get(annotation(object)), substr(target, 4, 4), "fid", :
Table mpsepm does not exist.
Thank you for your time. Regards.
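The error output above already hints at where the odd table name comes from: the traceback shows `substr(target, 4, 4)` being passed to `getMPSInfo`. The following base-R sketch reproduces that string handling. The helper `mps_table_name` is hypothetical, and the way it glues the pieces together is an assumption inferred from the error message and the list of available tables, not taken from the oligo source.

```r
# Hypothetical helper: mimics how the table name in the error message
# appears to be assembled (assumption inferred from the traceback only).
mps_table_name <- function(target, type = "pm") {
  paste0("mps", substr(target, 4, 4), type)
}

# Tables reported as available in the error output above.
available <- c("featureSet1", "mmfeature", "mps1mm", "mps1pm",
               "pmfeature", "table_info")

# The fourth character of "core" is "e", which gives "mpsepm".
requested <- mps_table_name("core")
print(requested)

# The requested table is not among the available ones, hence the error.
print(requested %in% available)
```

Under this reading, the fourth character of "core" ends up in the table name, and no table with that name exists in this annotation package.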
target = "core" is primarily for Gene or Exon arrays, which typically have 'GeneChip' and/or 'ST' in their name. core will instruct the algorithm to summarise the probesets to gene or exon level. The U133 A and B arrays, which is what you are using, are fundamentally designed differently, so the usage of core is not valid for these. This is my understanding, at least. Please wait for another person to respond.
Seems valid. I've also read your explanations about target parameters in Biostar (for example).
target = "probeset" also yields the same error. So, if I just execute rma without any arguments, the result should be a normalized dataset summarized at gene level? Am I correct?
Also, I've run read.celfiles both with and without pkgname = "pd.hgu133plus2.hs.entrezg"; the objects from each try are shown below. Do the equal number of assayData features in the raw objects and the different number of assayData features in the normalized objects mean that the .CEL files have been read correctly using the Brainarray CDF, and that the difference in assayData (aka probes) reflects the different CDF design of Brainarray? I'm sorry to bother you with my rudimentary questions; I've done my searches and reading and am just checking to be sure.
Code 1:
The results:
and:
Code 2:
The results:
and:
Yes, you are correct!
raw_data: The assayData of this object reflects the number of probes (not probesets) present on the array. This number is independent of the 'chip definition file' (i.e. the probe-to-probeset mapping) that is used. Hence it is the same for raw_data and raw_data2.
In the case of normalized_data, you have used modified probe-to-probeset mapping information based on up-to-date genome annotation (a so-called Custom CDF from the MBNI group). Assuming you used the latest version (i.e. version 23), Manhong Dai (MBNI) generated these remapping files in September/October 2018. Since you used an Entrez Gene-based remapping file, each probeset in normalized_data now reflects the expression level of a gene (as annotated by the NCBI ENTREZ database, status September 2018). The probeset ID as such corresponds to the ENTREZ ID, with the suffix _at. To be in line with Affymetrix nomenclature, _at indicates that the probeset detects an antisense target (see e.g. here).
In code chunk 2 you used the probe-to-probeset mapping as defined by Affymetrix at the time they designed this array, which was in the early 2000s. FYI: U133 refers to Unigene version 133 (released April 20, 2001), the version of the Unigene database Affymetrix used to design their probes and probesets for this array. By definition, the probesets in normalized_data2 are NOT (always) unique for a single gene (genes could be detected by multiple probesets), and this can be inferred from the probeset name (whether e.g. _s or _x is present in the probeset name; see here for more info).
Hence, your object normalized_data2 is composed of more probesets than normalized_data, but the number of uniquely detected genes should be roughly similar.
Lastly, to complete the story, the content of the (two) hgu133A and hgu133B arrays together is on the (single) hgu133plus2 array. The difference is that the first two arrays were manufactured in a photolithographic process in which the minimum (physical) distance between probes was 11 µm. Not all 'required' probes could then be 'printed' on a single array. However, improved technology allowed the distance to be reduced to only 5 µm, which in turn made it possible to synthesize all probes on a single array. See e.g. here (section Array Manufacturing).
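The naming conventions described above can be checked with a few lines of base R. This is only a sketch: the probeset IDs below are illustrative values typed in by hand, not output from the objects in this thread, and the assumption that gene-level custom CDF IDs are purely numeric before the _at suffix (with control-style IDs, if present, looking different) should be verified against featureNames() of your own object.

```r
# Illustrative custom-CDF style IDs (Entrez ID plus "_at"), plus one
# control-style ID that should not be treated as a gene.
custom_ids <- c("1_at", "10_at", "7157_at", "AFFX-BioB-5_at")

# Keep only IDs of the form <digits>_at, then strip the suffix
# to recover the Entrez Gene ID as a character string.
is_gene <- grepl("^[0-9]+_at$", custom_ids)
entrez_ids <- sub("_at$", "", custom_ids[is_gene])
print(entrez_ids)

# Illustrative Affymetrix-style IDs from the original probeset design.
affy_ids <- c("1007_s_at", "1053_at", "1552256_a_at", "1553_x_at")

# Flag probesets whose name carries the _s or _x qualifier,
# i.e. those that are not necessarily unique to a single gene.
multi_target <- grepl("_(s|x)_at$", affy_ids)
print(data.frame(probeset = affy_ids, s_or_x = multi_target))
```

On real data, the same two patterns could be applied to the feature names of normalized_data and normalized_data2 respectively.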
Dear Guido, I appreciate your informative response and great help. It was just amazing and comprehensive. :)
Best regards