oligo package - aggregate transcript expression to gene
0
Entering edit mode
@luca.piacentini-9597
Last seen 20 hours ago

Hello, I'm analyzing Clariom D microarray by the oligo package and and I would like to know if I can aggregate the transcript expression values to gene expression values to perform a differential expression analysis also at gene level other than at transcript level. Thanks in advance.

affycoretools oligo • 50 views
0
Entering edit mode
@James-W.-MacDonald-5106
Last seen 15 hours ago
United States

There's probably two ways you could do that. First, you could use one of the MBNI re-mapped library files, which you can read about here.

Installation of the pdInfoPackage is simple

install.packages("http://mbni.org/customcdf/24.0.0/ensg.download/pd.clariomdhuman.hs.ensg_24.0.0.tar.gz", repos = NULL)
## might need a type = "source" if you are on Windows

dat <- read.celfiles(<celfile names go here>, pkgname = "pd.clariomdhuman.hs.ensg")


The 'ensg' array will summarize your data at the NCBI Gene ID (what used to be called Entrez Gene ID). There are versions for Ensembl and other things as well. Note that you can also get annotation packages from MNBI, or you can simply note that the probesets are now just the NCBI Gene ID appended with an _st, and annotate using say the org.Hs.eg.db package.

Second, you could just use the regular pdInfoPackage for that array, summarize at the transcript level, and then average any transcripts for the same gene. An example would be

library(oligo)
library(clariomdhumantranscriptcluster.db)
library(limma)
dat <- read.celfiles(<celfile names go here>)
eset <- rma(dat)
library(affycoretools) #shameless plug, but it makes the rest of this simpler
eset <- annotateEset(eset, clariomdhumantranscriptcluster.db)
## this part is a bit tricky, but will make your life easier
## first remove un-annotated probesets
eset <- eset[!is.na(fData(eset)$ENTREZID),] ## then average avemat <- avereps(eset, fData(eset)$ENTREZID)
## and put back into an ExpressionSet
## but first we have to ensure we have things ordered correctly
eset2 <- eset[match(row.names(avemat), fData(eset)\$ENTREZID),]
row.names(avemat) <- row.names(eset2)
exprs(eset2) <- avemat


And now the expression values in eset2 are the average of any duplicated transcripts, and the annotation for that ExpressionSet still matches up with the averaged data. So you can go on to analyze using limma and when you output the topTable results, the annotation will accurately reflect the underlying expression data. exprs(eset2) <- avemat

0
Entering edit mode

Thank you very much for your kind and exhaustive hints! I’ll try them out! Luca