Question: Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotation options
svlachavas (Greece/Athens/National Hellenic Research Foundation) wrote, 2.8 years ago:

Dear Community,

I am currently analyzing in R a small number of CEL files (6 samples: 2 conditions with 3 biological replicates each) from Affymetrix Human Gene 2.0 ST arrays; this is my first time working with this type of gene chip array. A relevant subset of my code is the following:




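library(oligo)    # list.celfiles(), read.celfiles(), rma()
library(Biobase)  # AnnotatedDataFrame(), phenoData<-

# phenotype info; its row names must be the CEL file names
pdat <- read.table("pdat.project.txt", header = TRUE, stringsAsFactors = FALSE)

celfiles <- list.celfiles()
affy.cels <- read.celfiles(celfiles)

# sample names must match the phenotype row names before attaching phenoData
stopifnot(identical(colnames(affy.cels), rownames(pdat)))

pd <- AnnotatedDataFrame(data = pdat)
phenoData(affy.cels) <- pd
celfiles.rma <- rma(affy.cels, target = "probeset")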

My main questions are the following:

1) For the rma function, which is the most valid/appropriate choice of the target argument for Gene ST arrays: "probeset" or "core"?

2) To remove the control probesets, can I use the function getMainProbes?

3) To annotate my probesets/transcripts with gene symbols in later limma steps (i.e., after topTable), should I first set:

annotation(eset.rma) <- "hugene20sttranscriptcluster.db"

and then query that database with functions such as select()?
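For reference, here is a minimal sketch of the direct-query route described above, assuming the hugene20sttranscriptcluster.db and AnnotationDbi packages are installed. select() and mapIds() take the ChipDb object itself, so the annotation() slot does not need to be set for this. The first ten keys of the package are only a stand-in for the row names of a real topTable() result.

```r
library(AnnotationDbi)
library(hugene20sttranscriptcluster.db)

# Stand-in for rownames(topTable(...)): a few transcript-cluster IDs from the package
ids <- head(keys(hugene20sttranscriptcluster.db, keytype = "PROBEID"), 10)

# select() returns a data.frame and may give several rows per ID (1:many mappings)
ann <- AnnotationDbi::select(hugene20sttranscriptcluster.db,
                             keys = ids,
                             columns = c("SYMBOL", "ENTREZID", "GENENAME"),
                             keytype = "PROBEID")
any(duplicated(ann$PROBEID))

# mapIds() returns exactly one value per key, named by the key (first match by default)
symbols <- mapIds(hugene20sttranscriptcluster.db, keys = ids,
                  column = "SYMBOL", keytype = "PROBEID", multiVals = "first")
symbols
```

Calling AnnotationDbi::select() with the namespace prefix avoids a clash if another attached package (dplyr, for example) also exports a select().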


Thank you in advance !!



(Question written 2.8 years ago by svlachavas; modified 2.8 years ago by James W. MacDonald.)
Answer: Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotation options
James W. MacDonald (United States) wrote, 2.8 years ago:

1.) For the vast majority of users, 'core' is the target to use. The Gene ST arrays are intended to measure transcript abundances, and the ability to summarize at the probeset level is really just due to the fact that they are based on the Exon ST platform.
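A minimal sketch of the two summarization levels, assuming the CEL files sit in the working directory as in the question's code. target = "core" gives one value per transcript cluster, which is what hugene20sttranscriptcluster.db annotates; target = "probeset" gives the finer probeset-level values inherited from the Exon ST design.

```r
library(oligo)

# Assumes the CEL files are in the working directory, as in the question
affy.cels <- read.celfiles(list.celfiles())

# Transcript-cluster ("core") summarization: the recommended level for Gene ST arrays
eset.core <- rma(affy.cels, target = "core")

# Probeset-level summarization, only for comparison
eset.ps <- rma(affy.cels, target = "probeset")

# Same samples, different feature sets
dim(eset.core)
dim(eset.ps)
head(featureNames(eset.core))
```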

2.) Yes. Why do you have doubts?
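A sketch of the control-probeset removal with affycoretools, again assuming the question's CEL files. As far as I understand the getMainProbes() documentation, its level argument should state the summarization level used in rma(), so it is set to "core" here to match.

```r
library(oligo)
library(affycoretools)

affy.cels <- read.celfiles(list.celfiles())
eset.core <- rma(affy.cels, target = "core")

# Keep only the 'main' transcript clusters; level matches the rma() target above
eset.main <- getMainProbes(eset.core, level = "core")

# Number of features before and after removing the control probesets
c(before = nrow(eset.core), after = nrow(eset.main))
```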

3.) If you are using affycoretools, it's easier to do

eset.rma <- annotateEset(eset.rma, hugene20sttranscriptcluster.db)

Your topTable output will then automatically contain the annotation data, because lmFit() carries the featureData of the ExpressionSet through to the fit.
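To illustrate why no separate annotation step is needed after topTable(), here is a self-contained toy sketch using only Biobase and limma. The expression values are random numbers and the SYMBOL column is hypothetical, standing in for what annotateEset() would add; the point is only that lmFit() copies fData() of an ExpressionSet into fit$genes, which topTable() then prints next to the statistics.

```r
library(Biobase)
library(limma)

set.seed(1)
# Toy stand-in for an annotated eset.rma: 100 features x 6 samples of random values,
# 2 conditions x 3 replicates as in the question (NOT real data)
expr <- matrix(rnorm(600, mean = 8), nrow = 100,
               dimnames = list(sprintf("TC%03d", 1:100), paste0("s", 1:6)))

# Hypothetical annotation columns mimicking what annotateEset() adds to fData()
fdat <- data.frame(PROBEID = rownames(expr),
                   SYMBOL = sprintf("GENE%03d", 1:100),
                   row.names = rownames(expr))
pdat <- data.frame(condition = factor(rep(c("ctrl", "trt"), each = 3)),
                   row.names = colnames(expr))

eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pdat),
                      featureData = AnnotatedDataFrame(fdat))

design <- model.matrix(~ condition, data = pData(eset))
fit <- eBayes(lmFit(eset, design))

# fData(eset) was copied into fit$genes, so its columns appear in the table
tt <- topTable(fit, coef = "conditiontrt", number = 5)
tt
```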


Dear James,

Thank you for the confirmation! Actually, I (accidentally) tried target = "probeset" with rma and then ran getMainProbes, and I ended up with only ~1700 features, which concerned me; perhaps this is what your comment about the probeset level explains (it does not happen when I use the "core" option). So, if I understood correctly, the annotation data returned by annotateEset are gene symbols for the matched transcripts, correct?

(Reply by svlachavas, 2.8 years ago.)

The results are the Entrez Gene ID, symbol, and gene name, based on the Affy annotations for that array. We just take what Affy says each probeset measures, and then convert to a useful format without doing anything to check that what they say is correct in any sense.

Also, do note that there is a hugene20stprobeset.db package that annotates the probeset IDs, and that is what you would use to annotate if you summarize at the probeset level.
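If you do summarize at the probeset level, the corresponding annotation step might look like the following sketch, assuming hugene20stprobeset.db is installed alongside oligo and affycoretools and the CEL files are in the working directory.

```r
library(oligo)
library(affycoretools)
library(hugene20stprobeset.db)

affy.cels <- read.celfiles(list.celfiles())
eset.ps <- rma(affy.cels, target = "probeset")

# Probeset IDs need the probeset-level ChipDb, not the transcript-cluster one
eset.ps <- annotateEset(eset.ps, hugene20stprobeset.db)
head(fData(eset.ps))
```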

(Reply by James W. MacDonald, 2.8 years ago.)

Thank you again for your explanation. I will follow your advice: use the core argument, remove the control "transcripts", and annotate the expression eset. Perhaps the very small number of probesets I got when using "probeset" in rma followed by getMainProbes has to do with the design of the array.

(Reply by svlachavas, 2.8 years ago.)

Hi James,

I get the error: could not find function "annotateEset"

(Reply by GENOMIC_region, 20 months ago.)

Any time you see an error saying 'could not find function', it means you haven't loaded the package that contains that function yet. Or it may mean that you are using an old version of R/Bioconductor in which the function was not yet part of the package. You don't give the output of sessionInfo(), so I can't say for sure; try A) loading affycoretools first, or B) using the current version of R/Bioconductor.
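A short sketch of the two checks suggested above: make sure affycoretools is installed and attached before calling annotateEset(), and capture sessionInfo() to include when asking for help. Installing through BiocManager assumes a reasonably current R/Bioconductor setup and internet access.

```r
# Install affycoretools via BiocManager only if it is missing
if (!requireNamespace("affycoretools", quietly = TRUE)) {
  if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
  BiocManager::install("affycoretools")
}

# Attaching the package is what makes annotateEset() visible
library(affycoretools)
exists("annotateEset")

# Version details to post along with the error message
packageVersion("affycoretools")
sessionInfo()
```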

(Reply by James W. MacDonald, 20 months ago.)