Dear Community,
I currently analyzing in R, a small number (6 samples-2 conditions-3 biological reps of each condition) of CEL files regarding Affymetrix Human Gene 2.0 ST arrays (for the first time this type of gene chip arrays). A relevant subset of my code is the following:
library(oligo)
library(affycoretools)
library(hugene20sttranscriptcluster.db)
library(limma)
librarypd.hugene.2.0.st)
setwd(mydir)
pdat <- read.table("pdat.project.txt",header=TRUE,stringsAsFactors = FALSE) # phenotype info
celfiles = list.celfiles()
affy.cels <- read.celfiles(celfiles)
identical(colnames(affy.cels),rownames(pdat)) # need to be identical for incorporate phenotype info
pd <- AnnotatedDataFrame(data= pdat)
phenoData(affy.cels) <- pd
celfiles.rma <- rma(affy.cels, target="probeset")
Thus, my main questions are the following:
1) For the rma function, which is the most valid/appropriate choise of target argment for gene ST arrays ? "probeset" or "core" ?
2) For removing the control probesets, i can use the function getMainProbes ?
3) To annotate in later steps of limma (i.e after topTable) my probesets/transcripts into gene symbols, i should first:
annotation(eset.rma) <- "hugene20sttranscriptcluster.db"
& then query the above db with functions select, etc ?
Thank you in advance !!
Dear James,
thank you for your confirmation !! actually, i tried (accidentally) the argument target=probeset with rma and then with getMainProbes, i ended with ~1700 features, which concerned me-and perhaps explains your comment about probeset level (but of course it is not the case when i use the "core" option). Thus, if i understood well, with the annotateEset function the returned annotation data are gene symbols, for matched transcripts, correct ?
The results are the Entrez Gene ID, symbol, and gene name, based on the Affy annotations for that array. We just take what Affy says each probeset measures, and then convert to a useful format without doing anything to check that what they say is correct in any sense.
Also, do note that there is a hugene20stprobeset.db package that annotates the probeset IDs, and that is what you would use to annotate if you summarize at the probeset level.
Than you again for your explanation-i will follow your advice and use the core argument-remove the control "transcripts" & annotation of the expression eset--perhaps the very small number of probesets when i use first "probeset" in rma and then getMainProbes, probably has to do with the design of the array.
Hi James:
I get error: could not find function "annotateEset"
Any time you see an error saying 'could not find function', it means you haven't loaded the package that contains that function yet. Or, it may mean that you are using an old version of R/Bioconductor where the function was not yet part of the package. You don't give the results of sessionInfo, so I can't say for sure, so try A) loading affycoretools first or B) using the current version of R/Bioconductor.