I'm not sure you want to be using gcrma() on these arrays, but it looks like currently there is no 'clean' way to analyze these arrays using Bioconductor. Let me explain.
First a bit of background. These arrays are Affymetrix's answer to RNA-Seq taking market share from them, and are intended to allow one to measure differential transcript splicing, just like you can (hypothetically) do with RNA-Seq. They do this by adding a bunch of probes that are intended to span known exon-exon junctions, so you can get a set of exon-level measurements, along with measures of exon-exon junctions, from which one could then hypothetically infer what transcripts are being expressed in a given sample type, along with relative frequencies. This information could then be used to infer differences between sample types, both in overall transcript abundance and in the abundances of different splice variants.
Affymetrix has their own Transcript Analysis Console software that is supposed to help you do this sort of thing, which might be the way to go for now.
In Bioconductor, the only package that is capable of dealing with these arrays in a reasonable manner is oligo. However, when I try to build a pdInfoPackage from the various files that Affymetrix supply for this array, it fails because they have 811 probesets in their probeset annotation csv file that are not found in the pgf and clf files. For the uninitiated, that is a bunch of blahblahblah, that just boils down to the fact that there is an inconsistency between the various files that Affymetrix supply that cannot be resolved right now without making possibly unwarranted assumptions. Without a pdInfoPackage, you cannot process the files using oligo, so there you go.
Affymetrix does apparently have two CDF files that could be used to process these data using the old makecdfenv/affy or gcrma pipelines. They have an exon-level cdf and a transcript-level cdf. You could download one or both of these CDF files (found here: http://www.affymetrix.com/estore/catalog/prod870009/AFFY/Mouse+Transcriptome+Assay+1.0#1_3), which will require a free registration with Affy, and use the makecdfenv package to create a cdf package that you could install. We generally don't supply these cdf packages via biocLite(), because they are usually 'unsupported'. If Affy won't support them, then we don't want to either.
I wouldn't normally recommend using the unsupported CDF files and the affy or gcrma pipeline (and I wouldn't recommend gcrma for these anyway), primarily because you have purchased these arrays that are supposed to do all this sweet transcript splice aware analysis business, but then you are treating them like a dumb Gene ST array. Why buy the cool new thing if you aren't going to take advantage of its cool new abilities? But maybe the Affy rep was trying to hit his (or her) numbers, and gave your PI a sweet deal, just to offload some of these things.
If I were you, I would probably get Affy's Expression Console and Transcript Analysis Console software (here: http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=131414#1_1 and here: http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=prod760001#1_1) and analyze using that.
If you really insist on using Bioconductor, then download one of the CDF files, use the makecdfenv package to create a cdf package, and process using affy. To make the cdf package you want to do something like:
library(makecdfenv)
make.cdf.package("MTA-1_0.r1.gene.cdf", species = "Mus_musculus")
# wait for like a really long time
install.packages("mta10.r1.genecdf/", repos = NULL, type = "source")
library(affy)
library(mta10.r1.genecdf)
dat <- ReadAffy()
annotation(dat) <- "mta10.r1.genecdf"
eset <- rma(dat)
# other analysis steps go here
Hi James,
I received a dataset processed on a GeneChip® Mouse Transcriptome Assay 1.0 chip. I ran into the same problem; no annotation file for this platform in Bioconductor.
I followed your instruction above but I received an error message at the end. I would like to make an "eset" object to have all the project related info (expression, annotation, phenotype) in one place.
Would you please advise how to make it work?
Thanks a lot.
Anita
Procedure:
cdf file is downloaded from the Affy website.
setwd("~/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf")
make.cdf.package("MTA-1_0.r1.gene.cdf", species = "Mus_musculus")
Reading CDF file.
Creating CDF environment
Wait for about 713 dots........................................
Creating package in C:/Users/alakatos/Documents/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf/mta10.r1.genecdf
README PLEASE: A source package has now been produced in C:/Users/alakatos/Documents/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf/mta10.r1.genecdf. Before using this package it must be installed via 'R CMD INSTALL' at a terminal prompt (or DOS command shell). If you are using Windows, you will need to get set up to install packages. See the 'R Installation and Administration' manual, specifically Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset' for more information. Alternatively, you could use make.cdf.env(), which will not require you to install a package. However, this environment will only persist for the current R session unless you save() it.
Warning messages:
1: In cbind(pm = pm, mm = mm) :
  number of rows of result is not a multiple of vector length (arg 1)
2: In cbind(pm = pm, mm = mm) :
  number of rows of result is not a multiple of vector length (arg 1)
install.packages("mta10.r1.genecdf/", repos = NULL, type = "source")
Installing package into ‘C:/Users/alakatos/Documents/R/win-library/3.1’
(as ‘lib’ is unspecified)
* installing *source* package 'mta10.r1.genecdf' ...
** R
** data
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (mta10.r1.genecdf)
setwd("~/project/data_processing_analysis/cel_files")
library(affy)
library(mta10.r1.genecdf)
dat <- ReadAffy()
annotation(dat) <- "mta10.r1.genecdf"
eset <- rma(dat)
Error in getCdfInfo(object) :
  Could not obtain CDF environment, problems encountered:
Specified environment does not contain MTA-1_0
Library - package mta10cdf not installed
Bioconductor - mta10cdf not available
My bad. It's not the annotation that matters, but the cdfName. You can specify that as part of the pipeline:
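A minimal sketch of what I mean, assuming the package built by make.cdf.package() above is called mta10.r1.genecdf. By default affy takes the CDF name stored in the CEL files (MTA-1_0), cleans it up into a package name, and goes looking for that package, which is why the error mentions mta10cdf. The clean_name() helper below is only my rough imitation of that clean-up (affy's own function is cleancdfname()), not the real thing. The cdfname argument to ReadAffy() is what overrides the default.

```r
# Rough base-R imitation of how affy derives a package name from a CDF name.
# clean_name() is a hypothetical helper for illustration, not part of affy.
clean_name <- function(x) paste0(gsub("[_ -]", "", tolower(x)), "cdf")

header_pkg <- clean_name("MTA-1_0")          # name affy looks for by default
custom_pkg <- clean_name("MTA-1_0.r1.gene")  # name make.cdf.package() produced

# Only attempt the real thing if both packages are installed
if (requireNamespace("affy", quietly = TRUE) &&
    requireNamespace(custom_pkg, quietly = TRUE)) {
  library(affy)
  # cdfname overrides the CDF name taken from the CEL headers;
  # ReadAffy() reads every CEL file in the working directory
  dat <- ReadAffy(cdfname = custom_pkg)
  eset <- rma(dat)
}
```

Setting annotation(dat) afterwards does not change which CDF environment is used, so the name has to be fixed either here or via the cdfName slot.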
Thank you.
Anita
Hi James,
Sorry, I ran into another error.
>dat <- ReadAffy(cdfname = "mta10.r1.genecdf")
>dat
AffyBatch object
size of arrays=2572x2680 features (25 kb)
cdf=mta10.r1.genecdf (71293 affyids)
number of samples=19
number of genes=71293
annotation=mta10.r1.genecdf
notes=
>eset <- rma(dat)
Error in (function (classes, fdef, mtable) :
  unable to find an inherited method for function ‘rma’ for signature ‘"AffyBatch"’
Would you please advise?
Thank you,
Anita
I solved it. I needed to detach package "oligo".
Thanks,
Anita
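For anyone landing here with the same "unable to find an inherited method for function 'rma'" error: affy and oligo both provide a function called rma(), and if oligo is attached after affy, a bare rma() call reaches oligo's version, which has no method for an AffyBatch. A sketch of two ways around it, assuming dat is the AffyBatch created above: detach oligo, or name the package explicitly.

```r
# Option 1: take oligo off the search path if it is attached
if ("package:oligo" %in% search()) {
  detach("package:oligo")
}

# Option 2: leave oligo attached and call affy's rma() by its full name.
# The exists() guard just keeps this snippet harmless when no AffyBatch
# called dat has been created yet.
if (requireNamespace("affy", quietly = TRUE) && exists("dat")) {
  eset <- affy::rma(dat)
}
```

Option 2 is the less fragile of the two, since it does not depend on the order in which packages were loaded.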
I followed this whole post, but I still get the error:
I would appreciate any hint on this issue.
Thanks a lot in advance
Well, I would say the message is quite clear: using the library affy for the analyses of these new, complex arrays is not recommended; you rather should use oligo or xps.
If using oligo make sure to also have the required MTA platform design installed, which is available here. Please note that at the time this thread was started by the OP (>20 months ago), the PDInfo package for this array wasn't available, but it is now! Also be sure to carefully read the answer of James (2nd post in this thread).
Depending on how you summarize the normalized probe into probe-set data (at the level of exons or transcript clusters), you may want to use this or this annotation library.
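A rough sketch of the oligo route, under a few assumptions of mine: that the platform design package is the one called pd.mta.1.0, that the CEL files sit in the working directory, and that rma() accepts target = "core" or target = "probeset" for this array type as it does for other exon-style arrays. Please check ?rma in oligo before relying on it.

```r
pd_pkg <- "pd.mta.1.0"  # assumed name of the platform design package

# CEL files in the working directory (base R, case-insensitive match)
cel_files <- list.files(pattern = "\\.cel$", ignore.case = TRUE,
                        full.names = TRUE)

# Only run when there is data and both packages are installed
if (length(cel_files) > 0 &&
    requireNamespace("oligo", quietly = TRUE) &&
    requireNamespace(pd_pkg, quietly = TRUE)) {
  raw <- oligo::read.celfiles(cel_files)
  # Background-correct, normalize and summarize at two different levels
  eset_tc <- oligo::rma(raw, target = "core")      # transcript clusters
  eset_ps <- oligo::rma(raw, target = "probeset")  # probesets (exon level)
}
```

Which of the two annotation libraries applies then depends on whether you carry eset_tc or eset_ps forward.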
In the Bioconductor Channel at the Faculty of 1000 (F1000) a very nice, complete workflow has been published on the analysis of Affymetrix arrays, which also addresses the normalization and summarization aspects mentioned above (and much more!). Although not all of the workflow may be of direct relevance to you, it is an excellent read. Find it here!