There's probably two ways you could do that. First, you could use one of the MBNI re-mapped library files, which you can read about here.
Installation of the pdInfoPackage
is simple
install.packages("http://mbni.org/customcdf/24.0.0/ensg.download/pd.clariomdhuman.hs.ensg_24.0.0.tar.gz", repos = NULL)
## might need a type = "source" if you are on Windows
dat <- read.celfiles(<celfile names go here>, pkgname = "pd.clariomdhuman.hs.ensg")
The 'ensg' array will summarize your data at the NCBI Gene ID (what used to be called Entrez Gene ID). There are versions for Ensembl and other things as well. Note that you can also get annotation packages from MNBI, or you can simply note that the probesets are now just the NCBI Gene ID appended with an _st, and annotate using say the org.Hs.eg.db
package.
Second, you could just use the regular pdInfoPackage
for that array, summarize at the transcript level, and then average any transcripts for the same gene. An example would be
library(oligo)
library(clariomdhumantranscriptcluster.db)
library(limma)
dat <- read.celfiles(<celfile names go here>)
eset <- rma(dat)
library(affycoretools) #shameless plug, but it makes the rest of this simpler
eset <- annotateEset(eset, clariomdhumantranscriptcluster.db)
## this part is a bit tricky, but will make your life easier
## first remove un-annotated probesets
eset <- eset[!is.na(fData(eset)$ENTREZID),]
## then average
avemat <- avereps(eset, fData(eset)$ENTREZID)
## and put back into an ExpressionSet
## but first we have to ensure we have things ordered correctly
eset2 <- eset[match(row.names(avemat), fData(eset)$ENTREZID),]
row.names(avemat) <- row.names(eset2)
exprs(eset2) <- avemat
And now the expression values in eset2
are the average of any duplicated transcripts, and the annotation for that ExpressionSet
still matches up with the averaged data. So you can go on to analyze using limma
and when you output the topTable
results, the annotation will accurately reflect the underlying expression data.
exprs(eset2) <- avemat
Thank you very much for your kind and exhaustive hints! I’ll try them out! Luca