Question

oligo package - aggregate transcript expression to gene

0

Entering edit mode

Luca • 0

@lucapiacentini-9597

Last seen 6 months ago

Italy

Hello, I'm analyzing Clariom D microarray by the oligo package and and I would like to know if I can aggregate the transcript expression values to gene expression values to perform a differential expression analysis also at gene level other than at transcript level. Thanks in advance.

affycoretools oligo • 1.6k views

ADD COMMENT • link 4.4 years ago Luca • 0

score 0 · Answer 1 · 2020-11-05

There's probably two ways you could do that. First, you could use one of the MBNI re-mapped library files, which you can read about here.

Installation of the pdInfoPackage is simple

install.packages("http://mbni.org/customcdf/24.0.0/ensg.download/pd.clariomdhuman.hs.ensg_24.0.0.tar.gz", repos = NULL)
## might need a type = "source" if you are on Windows

dat <- read.celfiles(<celfile names go here>, pkgname = "pd.clariomdhuman.hs.ensg")

The 'ensg' array will summarize your data at the NCBI Gene ID (what used to be called Entrez Gene ID). There are versions for Ensembl and other things as well. Note that you can also get annotation packages from MNBI, or you can simply note that the probesets are now just the NCBI Gene ID appended with an _st, and annotate using say the org.Hs.eg.db package.

Second, you could just use the regular pdInfoPackage for that array, summarize at the transcript level, and then average any transcripts for the same gene. An example would be

library(oligo)
library(clariomdhumantranscriptcluster.db)
library(limma)
dat <- read.celfiles(<celfile names go here>)
eset <- rma(dat)
library(affycoretools) #shameless plug, but it makes the rest of this simpler
eset <- annotateEset(eset, clariomdhumantranscriptcluster.db)
## this part is a bit tricky, but will make your life easier
## first remove un-annotated probesets
eset <- eset[!is.na(fData(eset)$ENTREZID),]
## then average
avemat <- avereps(eset, fData(eset)$ENTREZID)
## and put back into an ExpressionSet
## but first we have to ensure we have things ordered correctly
eset2 <- eset[match(row.names(avemat), fData(eset)$ENTREZID),]
row.names(avemat) <- row.names(eset2)
exprs(eset2) <- avemat

And now the expression values in eset2 are the average of any duplicated transcripts, and the annotation for that ExpressionSet still matches up with the averaged data. So you can go on to analyze using limma and when you output the topTable results, the annotation will accurately reflect the underlying expression data. exprs(eset2) <- avemat