Question

issue with mta10.r1.genecdf: eset results don't make sense


juliayuecui2011 • 0


Hi James,

I have some MTA 1.0 arrays, and I compared my R script output with the Affymetrix Expression Console output at the gene level. The results do not match. The Expression Console output appears to be correct, because the positive controls worked. However, none of the results from R make sense.

In RStudio, I ran the following, and everything went through without errors:

library(makecdfenv)
make.cdf.package("MTA-1_0.r1.gene.cdf", species = "Mus_musculus")
install.packages("mta10.r1.genecdf/", repos = NULL, type = "source")

library(affy)
library(mta10.r1.genecdf)
data <- ReadAffy(cdfname = "mta10.r1.genecdf")
annotation(data) <- "mta10.r1.genecdf"
eset <- rma(data)
e <- exprs(eset)

There are 71293 rows in e.

Then I used annaffy and mta10sttranscriptcluster.db to convert the probe IDs (TCxxxxx) to gene symbols etc.

However, even from simply looking at eset and e in R, the results do not make sense at all. Using the TCxxxx probe IDs and Affymetrix's recently released CSV file (MTA-1_0.na35.mm10.transcript), I was able to find the gene symbols for many genes known to be highly or lowly expressed in my particular samples, yet the probe densities all look pretty much the same. In Affymetrix Expression Console, these genes behave the way they should.

My suspicion is that either something is wrong with the CDF file or I should not be using rma. Is that right?

Thanks for your help.

updated 9.8 years ago by James W. MacDonald 68k • written 9.8 years ago by juliayuecui2011 • 0

score 0 · Answer 1 · 2015-06-09

I should probably fix the error message in affy to include the HTA and MTA probes. The affy package should really only be used for the old 3'-biased arrays. For everything else you should use either oligo or xps. To use oligo you would do:

library(oligo)
dat <- read.celfiles(list.celfiles())
eset <- rma(dat)

And depending on what you want to do, you might want to summarize at different levels. The MTA arrays are very complex, and can hypothetically be used to detect differential splicing. But there is nothing in Bioconductor that I know of that is designed for that sort of analysis. If you simply want to measure differential expression, then the code above will summarize the data at the transcript level, and you can then use e.g., limma to make comparisons between different groups.
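As a rough illustration of the limma step mentioned above, here is a minimal sketch of a two-group comparison. It does not read any CEL files: the expression matrix is simulated and only stands in for exprs(eset) from the oligo code, and the sample names, group labels, transcript IDs and the spiked-in shift are all made up for the example.

```r
library(limma)

set.seed(1)

# Simulated log2 expression matrix standing in for exprs(eset);
# with real data you would use: expr <- exprs(eset)
n_transcripts <- 200
group <- factor(rep(c("control", "treated"), each = 3))  # hypothetical groups
expr <- matrix(
  rnorm(n_transcripts * length(group), mean = 7, sd = 0.3),
  nrow = n_transcripts,
  dimnames = list(
    paste0("transcript_", seq_len(n_transcripts)),  # made-up IDs
    paste0("sample_", seq_along(group))             # made-up sample names
  )
)

# Spike a shift into the first 10 rows so there is something to detect
expr[1:10, group == "treated"] <- expr[1:10, group == "treated"] + 2

# Design matrix: intercept (control) plus a treated-vs-control coefficient
design <- model.matrix(~ group)

# Fit a linear model per transcript, then moderate the variances
fit <- eBayes(lmFit(expr, design))

# Top transcripts for the treated-vs-control coefficient
top <- topTable(fit, coef = "grouptreated", number = 10)
print(top)
```

With real data, the group factor would come from your own sample annotation, in the same order as the columns of the expression matrix.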