oligo package - aggregate transcript expression to gene
Entering edit mode
Last seen 20 hours ago

Hello, I'm analyzing Clariom D microarray by the oligo package and and I would like to know if I can aggregate the transcript expression values to gene expression values to perform a differential expression analysis also at gene level other than at transcript level. Thanks in advance.

affycoretools oligo • 50 views
Entering edit mode
Last seen 15 hours ago
United States

There's probably two ways you could do that. First, you could use one of the MBNI re-mapped library files, which you can read about here.

Installation of the pdInfoPackage is simple

install.packages("http://mbni.org/customcdf/24.0.0/ensg.download/pd.clariomdhuman.hs.ensg_24.0.0.tar.gz", repos = NULL)
## might need a type = "source" if you are on Windows

dat <- read.celfiles(<celfile names go here>, pkgname = "pd.clariomdhuman.hs.ensg")

The 'ensg' array will summarize your data at the NCBI Gene ID (what used to be called Entrez Gene ID). There are versions for Ensembl and other things as well. Note that you can also get annotation packages from MNBI, or you can simply note that the probesets are now just the NCBI Gene ID appended with an _st, and annotate using say the org.Hs.eg.db package.

Second, you could just use the regular pdInfoPackage for that array, summarize at the transcript level, and then average any transcripts for the same gene. An example would be

dat <- read.celfiles(<celfile names go here>)
eset <- rma(dat)
library(affycoretools) #shameless plug, but it makes the rest of this simpler
eset <- annotateEset(eset, clariomdhumantranscriptcluster.db)
## this part is a bit tricky, but will make your life easier
## first remove un-annotated probesets
eset <- eset[!is.na(fData(eset)$ENTREZID),]
## then average
avemat <- avereps(eset, fData(eset)$ENTREZID)
## and put back into an ExpressionSet
## but first we have to ensure we have things ordered correctly
eset2 <- eset[match(row.names(avemat), fData(eset)$ENTREZID),]
row.names(avemat) <- row.names(eset2)
exprs(eset2) <- avemat

And now the expression values in eset2 are the average of any duplicated transcripts, and the annotation for that ExpressionSet still matches up with the averaged data. So you can go on to analyze using limma and when you output the topTable results, the annotation will accurately reflect the underlying expression data. exprs(eset2) <- avemat

Entering edit mode

Thank you very much for your kind and exhaustive hints! I’ll try them out! Luca


Login before adding your answer.

Similar Posts
Loading Similar Posts
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.3