Question: Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotation options
4
2.7 years ago by
svlachavas740
Greece/Athens/National Hellenic Research Foundation
svlachavas740 wrote:

Dear Community,

I currently analyzing in R, a small number (6 samples-2 conditions-3 biological reps of each condition) of CEL files regarding Affymetrix Human Gene 2.0 ST arrays (for the first time this type of gene chip arrays). A relevant subset of my code is the following:

library(oligo) library(affycoretools) library(hugene20sttranscriptcluster.db) 

library(limma)

librarypd.hugene.2.0.st)

setwd(mydir)

pdat <- read.table("pdat.project.txt",header=TRUE,stringsAsFactors = FALSE) # phenotype info

celfiles = list.celfiles()

affy.cels <- read.celfiles(celfiles)

identical(colnames(affy.cels),rownames(pdat)) # need to be identical for incorporate phenotype info

pd <- AnnotatedDataFrame(data= pdat) phenoData(affy.cels) <- pd celfiles.rma <- rma(affy.cels, target="probeset")

Thus, my main questions are the following:

1) For the rma function, which is the most valid/appropriate choise of target argment for gene ST arrays ? "probeset" or "core" ?

2) For removing the control probesets, i can use the function getMainProbes

3) To annotate in later steps of limma (i.e after topTable) my probesets/transcripts into gene symbols, i should first:

annotation(eset.rma) <- "hugene20sttranscriptcluster.db"

& then query the above db with functions select, etc ?

Thank you in advance !!

modified 2.7 years ago by James W. MacDonald51k • written 2.7 years ago by svlachavas740
Answer: Preprocessing of Human Gene 2.0 ST microarrays with oligo R package and annotati
2
2.7 years ago by
United States
James W. MacDonald51k wrote:

1.) For the vast majority of users, the 'core' argument is the way to go. The Gene ST arrays are intended to measure transcript abundances, and the ability to summarize at the probeset level is really just due to the fact that they are based on the Exon ST platform.

2.) Yes. Why do you have doubts?

3.) If you are using affycoretools, it's easier to do

library(hugene20sttranscriptcluster.db)
eset.rma <- annotateEset(eset.rma, hugene20sttranscriptcluster.db)

And then your topTable will automatically contain annotation data.

Dear James,

thank you for your confirmation !! actually, i tried (accidentally) the argument target=probeset with rma and then with getMainProbes, i ended with ~1700 features, which concerned me-and perhaps explains your comment about probeset level (but of course it is not the case when i use the "core" option). Thus, if i understood well, with the annotateEset function the returned annotation data are gene symbols, for matched transcripts, correct ?

1

The results are the Entrez Gene ID, symbol, and gene name, based on the Affy annotations for that array. We just take what Affy says each probeset measures, and then convert to a useful format without doing anything to check that what they say is correct in any sense.

Also, do note that there is a hugene20stprobeset.db package that annotates the probeset IDs, and that is what you would use to annotate if you summarize at the probeset level.

ADD REPLYlink written 2.7 years ago by James W. MacDonald51k

Than you again for your explanation-i will follow your advice and use the core argument-remove the control "transcripts" & annotation of the expression eset--perhaps the very small number of probesets when i use first "probeset" in rma and then getMainProbes, probably has to do with the design of the array.

Hi James:

I get error:  could not find function "annotateEset"

2

Any time you see an error saying 'could not find function', it means you haven't loaded the package that contains that function yet. Or, it may mean that you are using an old version of R/Bioconductor where the function was not yet part of the package. You don't give the results of sessionInfo, so I can't say for sure, so try A) loading affycoretools first or B) using the current version of R/Bioconductor.

ADD REPLYlink written 18 months ago by James W. MacDonald51k