Question

Preprocessing of Human Gene 2.0 ST microarrays with the oligo R package and annotation options

svlachavas

Germany/Heidelberg/German Cancer Resear…

Dear Community,

I am currently analyzing in R a small number of CEL files (6 samples: 2 conditions with 3 biological replicates each) from Affymetrix Human Gene 2.0 ST arrays. This is my first time working with this type of gene chip. A relevant subset of my code is the following:

library(oligo)
library(affycoretools)
library(hugene20sttranscriptcluster.db)
library(limma)
library(pd.hugene.2.0.st)

setwd(mydir)

pdat <- read.table("pdat.project.txt", header = TRUE, stringsAsFactors = FALSE) # phenotype info

celfiles <- list.celfiles()
affy.cels <- read.celfiles(celfiles)

identical(colnames(affy.cels), rownames(pdat)) # must be TRUE before the phenotype info is attached

pd <- AnnotatedDataFrame(data = pdat)
phenoData(affy.cels) <- pd
celfiles.rma <- rma(affy.cels, target = "probeset")

My main questions are the following:

1) For the rma function, which is the most valid/appropriate choice of the target argument for Gene ST arrays: "probeset" or "core"?

2) To remove the control probesets, can I use the function getMainProbes?

3) To annotate my probesets/transcripts with gene symbols in later steps of limma (i.e. after topTable), should I first run:

annotation(eset.rma) <- "hugene20sttranscriptcluster.db"

and then query the above db with functions such as select?

Thank you in advance!

microarray oligo affycoretools pd.hugene.2.0.st hugene20sttranscriptcluster.db

updated 8.2 years ago by James W. MacDonald • written 8.2 years ago by svlachavas

Answer 1 · 2017-02-02

James W. MacDonald

United States

1.) For the vast majority of users, the 'core' argument is the way to go. The Gene ST arrays are intended to measure transcript abundances, and the ability to summarize at the probeset level is really just due to the fact that they are based on the Exon ST platform.

2.) Yes. Why do you have doubts?

3.) If you are using affycoretools, it's easier to do:

library(hugene20sttranscriptcluster.db)
eset.rma <- annotateEset(eset.rma, hugene20sttranscriptcluster.db)

And then your topTable will automatically contain annotation data.
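To illustrate point 3.), here is a minimal sketch on simulated data rather than real CEL files; the expression matrix, the `condition` column and the placeholder `SYMBOL` values are all made up. It imitates what annotateEset does by storing annotation columns in the feature data of an ExpressionSet. When such an object is passed to lmFit, limma keeps those columns in the fit, and topTable reports them next to the statistics.

```r
library(Biobase)
library(limma)

set.seed(1)
# Simulated log2 expression for 100 hypothetical transcript clusters on 6 arrays
# (2 conditions x 3 biological replicates, as in the question)
n <- 100
expr <- matrix(rnorm(n * 6, mean = 7), nrow = n,
               dimnames = list(paste0("TC", seq_len(n)), paste0("S", 1:6)))
expr[1:5, 4:6] <- expr[1:5, 4:6] + 5  # make the first 5 features differ

# Stand-in for the annotation that annotateEset() places in fData();
# the SYMBOL values are placeholders, not real gene symbols
fdat <- data.frame(PROBEID = rownames(expr),
                   SYMBOL = paste0("GENE", seq_len(n)),
                   row.names = rownames(expr), stringsAsFactors = FALSE)
pdat <- data.frame(condition = factor(rep(c("ctrl", "treat"), each = 3)),
                   row.names = colnames(expr))

eset <- ExpressionSet(assayData = expr,
                      phenoData = AnnotatedDataFrame(pdat),
                      featureData = AnnotatedDataFrame(fdat))

# lmFit() copies fData(eset) into fit$genes, so topTable() reports it
design <- model.matrix(~ condition, data = pData(eset))
fit <- eBayes(lmFit(eset, design))
tt <- topTable(fit, coef = "conditiontreat", number = 10)
print(tt[, c("PROBEID", "SYMBOL", "logFC", "adj.P.Val")])
```

With a real annotated eset the same lmFit/eBayes/topTable calls apply; only the design has to reflect the actual phenotype columns in pdat.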

written 8.2 years ago by James W. MacDonald

Dear James,

thank you for your confirmation! Actually, I (accidentally) tried target="probeset" with rma, and after getMainProbes I ended up with ~1700 features. That concerned me, and it perhaps explains your comment about the probeset level (this is of course not the case when I use the "core" option). So, if I understood correctly, the annotation data returned by the annotateEset function are the gene symbols for the matched transcripts, correct?

written 8.2 years ago by svlachavas

The results are the Entrez Gene ID, symbol, and gene name, based on the Affy annotations for that array. We just take what Affy says each probeset measures, and then convert to a useful format without doing anything to check that what they say is correct in any sense.

Also, do note that there is a hugene20stprobeset.db package that annotates the probeset IDs, and that is what you would use to annotate if you summarize at the probeset level.
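The workflow discussed in this thread (rma, then getMainProbes, then annotateEset) can be wrapped into a single function. This is a sketch under assumptions, not an official affycoretools recipe: `preprocess_hugene20()` is a hypothetical helper, the CEL files are assumed to sit in `cel_dir`, and `rownames(pdat)` are assumed to equal the CEL file names in the same order. The `level` argument of getMainProbes is assumed to follow the affycoretools documentation (it should match the `target` used in rma); check `?getMainProbes` for your installed version.

```r
# Hypothetical helper wrapping the steps discussed in this thread.
# Calls are namespace-qualified, so nothing is attached when it is defined.
preprocess_hugene20 <- function(cel_dir, pdat, level = c("core", "probeset")) {
  level <- match.arg(level)
  cels <- oligo::list.celfiles(cel_dir, full.names = TRUE)
  raw <- oligo::read.celfiles(filenames = cels)
  # Phenotype rows must line up with the arrays before they are attached
  if (!identical(Biobase::sampleNames(raw), rownames(pdat)))
    stop("rownames(pdat) must equal the CEL file names, in the same order")
  Biobase::phenoData(raw) <- Biobase::AnnotatedDataFrame(pdat)
  # "core" summarizes to transcript clusters, the usual choice for Gene ST arrays
  eset <- oligo::rma(raw, target = level)
  # Drop control/background probesets; level is assumed to match the rma target
  eset <- affycoretools::getMainProbes(eset, level = level)
  # Transcript-cluster IDs and probeset IDs need different annotation packages
  annodb <- if (level == "core") {
    hugene20sttranscriptcluster.db::hugene20sttranscriptcluster.db
  } else {
    hugene20stprobeset.db::hugene20stprobeset.db
  }
  # By default this adds PROBEID, ENTREZID, SYMBOL and GENENAME to fData()
  affycoretools::annotateEset(eset, annodb)
}
```

With the objects from the question, `eset.rma <- preprocess_hugene20(mydir, pdat)` should return a transcript-level ExpressionSet without control probesets and with annotation columns, ready for lmFit.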

written 8.2 years ago by James W. MacDonald

Thank you again for your explanation. I will follow your advice: use the core argument, remove the control "transcripts", and annotate the expression eset. Perhaps the very small number of probesets I got when I first used "probeset" in rma and then getMainProbes has to do with the design of the array.

written 8.2 years ago by svlachavas

Hi James:

I get the error: could not find function "annotateEset"

written 7.0 years ago by GENOMIC_region

Any time you see an error saying 'could not find function', it means you haven't loaded the package that contains that function yet. Or it may mean that you are using an old version of R/Bioconductor in which the function was not yet part of the package. You don't give the output of sessionInfo(), so I can't say for sure; try A) loading affycoretools first or B) using the current version of R/Bioconductor.
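A minimal sketch for telling the two cases apart, assuming packages are installed with BiocManager. The helper `check_function()` is hypothetical (not part of any package): it reports whether a package is installed, which version it is, and whether that version's namespace defines the function.

```r
# Hypothetical helper: is `pkg` installed, and does its namespace define `fun`?
check_function <- function(fun, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message(pkg, " is not installed; try BiocManager::install(\"", pkg, "\")")
    return(invisible(FALSE))
  }
  found <- exists(fun, envir = asNamespace(pkg), inherits = FALSE)
  message(pkg, " ", format(packageVersion(pkg)),
          if (found) " defines " else " does not define ", fun, "()")
  invisible(found)
}

check_function("annotateEset", "affycoretools")
# If the function is defined, attach the package so it can be called by name:
# library(affycoretools)
sessionInfo()  # include this output when asking for help
```

If the function is defined but still not found, the package simply was not attached (case A); if it is not defined, the installed version predates it (case B).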

written 7.0 years ago by James W. MacDonald