Dear Community,
I am currently analyzing in R a small set of CEL files (6 samples: 2 conditions, 3 biological replicates each) from Affymetrix Human Gene 2.0 ST arrays. This is my first time working with this type of array. A relevant subset of my code is the following:
library(oligo)
library(affycoretools)
library(hugene20sttranscriptcluster.db)
library(limma)
library(pd.hugene.2.0.st)
setwd(mydir)
pdat <- read.table("pdat.project.txt",header=TRUE,stringsAsFactors = FALSE) # phenotype info
celfiles = list.celfiles()
affy.cels <- read.celfiles(celfiles)
identical(colnames(affy.cels),rownames(pdat)) # must be TRUE before attaching the phenotype info
pd <- AnnotatedDataFrame(data= pdat)
phenoData(affy.cels) <- pd
celfiles.rma <- rma(affy.cels, target="probeset")
Thus, my main questions are the following:
1) For the rma function, which is the most appropriate choice of the target argument for Gene ST arrays: "probeset" or "core"?
2) To remove the control probesets, can I use the function getMainProbes?
3) To annotate my probesets/transcripts with gene symbols in the later steps of limma (i.e. after topTable), should I first set:
annotation(celfiles.rma) <- "hugene20sttranscriptcluster.db"
and then query the above db with functions such as select?
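To make questions 2 and 3 concrete, this is roughly what I have in mind. It is only a sketch under assumptions I have not verified: the helper name annotate_main_probesets is my own, it assumes the input is the ExpressionSet returned by rma with target="core" (so that, as far as I understand, the IDs match hugene20sttranscriptcluster.db), and it keeps only the first gene symbol when a probeset maps to several.

```r
# Hypothetical helper (the name is made up for this question).
# Assumes `eset` is the ExpressionSet returned by oligo::rma(..., target = "core").
annotate_main_probesets <- function(eset) {
  # Drop control probesets and keep only the "main" category (affycoretools)
  eset.main <- affycoretools::getMainProbes(eset)
  ids <- Biobase::featureNames(eset.main)

  # Look up gene symbols and names for the remaining transcript cluster IDs
  ann <- AnnotationDbi::select(
    hugene20sttranscriptcluster.db::hugene20sttranscriptcluster.db,
    keys = ids,
    columns = c("SYMBOL", "GENENAME"),
    keytype = "PROBEID"
  )

  # select() can return several rows per probeset; keep the first match only
  ann <- ann[match(ids, ann$PROBEID), ]
  rownames(ann) <- ids

  # Store the annotation as featureData so that lmFit/topTable carry it along
  Biobase::fData(eset.main) <- ann
  eset.main
}
```

I would then call eset.main <- annotate_main_probesets(celfiles.rma) before lmFit, instead of setting the annotation slot by hand. Is this a sensible approach?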
Thank you in advance!

