mta10cdf not installed Bioconductor
duan • 0
@duan-7063
Last seen 6.9 years ago
United States

Hi,

I got the following error:

> eset <- gcrma(ab)
Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain MTA-1_0
Library - package mta10cdf not installed
Bioconductor - mta10cdf not available

Thank you

gcrma • 2.4k views
@james-w-macdonald-5106
Last seen 11 hours ago
United States

I'm not sure you want to be using gcrma() on these arrays, but it looks like currently there is no 'clean' way to analyze these arrays using Bioconductor. Let me explain.

First a bit of background. These arrays are Affymetrix's answer to RNA-Seq taking market share from them, and are intended to allow one to measure differential transcript splicing, just like you can (hypothetically) do with RNA-Seq. They do this by adding a bunch of probes that are intended to span known exon-exon junctions, so you can get a set of exon-level measurements, along with measures of exon-exon junctions, from which one could then hypothetically infer what transcripts are being expressed in a given sample type, along with relative frequencies. This information could then be used to infer differences between sample types in both transcript abundance and the abundances of different splice variants.

Affymetrix has their own Transcript Analysis Console software that is supposed to help you do this sort of thing, which might be the way to go for now.

In Bioconductor, the only package that is capable of dealing with these arrays in a reasonable manner is oligo. However, when I try to build a pdInfoPackage from the various files that Affymetrix supply for this array, it fails because they have 811 probesets in their probeset annotation csv file that are not found in the pgf and clf files. For the uninitiated, that is a bunch of blahblahblah, that just boils down to the fact that there is an inconsistency between the various files that Affymetrix supply that cannot be resolved right now without making possibly unwarranted assumptions. Without a pdInfoPackage, you cannot process the files using oligo, so there you go.
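The kind of mismatch described above boils down to a set comparison between two lists of probeset IDs. The following is only a sketch: the IDs are made-up placeholders, and extracting the real IDs from the probeset annotation csv and the pgf file is not shown.

```r
# Hypothetical probeset IDs, for illustration only. In practice these
# would be read from the probeset annotation csv and from the pgf file.
annot_ids <- c("PS001", "PS002", "PS003", "PS004")
pgf_ids   <- c("PS001", "PS003")

# IDs that are annotated but have no probe-level definition in the pgf
orphans <- setdiff(annot_ids, pgf_ids)
print(orphans)

# Number of inconsistent probesets (811 in the case described above)
length(orphans)
```

Any non-empty result means the annotation refers to probesets the design files do not define, which is the situation that stops the pdInfoPackage build.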

Affymetrix does apparently have two CDF files that could be used to process these data using the old makecdfenv/affy or gcrma pipelines. They have an exon-level cdf and a transcript-level cdf. You could download one or both of these CDF files (found here: http://www.affymetrix.com/estore/catalog/prod870009/AFFY/Mouse+Transcriptome+Assay+1.0#1_3), which will require a free registration with Affy, and use the makecdfenv package to create a cdf package that you could install. We generally don't supply these cdf packages via biocLite(), because they are usually 'unsupported'. If Affy won't support them, then we don't want to either.

I wouldn't normally recommend using the unsupported CDF files and the affy or gcrma pipeline (and I wouldn't recommend gcrma for these anyway), primarily because you have purchased these arrays that are supposed to do all this sweet transcript splice aware analysis business, but then you are treating them like a dumb Gene ST array. Why buy the cool new thing if you aren't going to take advantage of its cool new abilities? But maybe the Affy rep was trying to hit his (or her) numbers, and gave your PI a sweet deal, just to offload some of these things.

If I were you, I would probably get Affy's Expression Console and Transcript Analysis Console software (here: http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=131414#1_1 and here: http://www.affymetrix.com/estore/browse/level_seven_software_products_only.jsp?productId=prod760001#1_1) and analyze using that.

If you really insist on using Bioconductor, then download one of the CDF files, use the makecdfenv package to create a cdf package from it, and process using affy. To make the cdf package you want to do something like:

library(makecdfenv)

make.cdf.package("MTA-1_0.r1.gene.cdf", species = "Mus_musculus")

# wait for like a really long time

install.packages("mta10.r1.genecdf/", repos = NULL, type = "source")

library(affy)

library(mta10.r1.genecdf)

annotation(dat) <- "mta10.r1.genecdf"

eset <- rma(dat)

<other analysis steps go here>

Hi James,

I received a dataset processed on a GeneChip® Mouse Transcriptome Assay 1.0 chip and ran into the same problem: there is no annotation file for this platform in Bioconductor.

I followed your instructions above, but I received an error message at the end. I would like to make an "eset" object to have all the project-related info (expression, annotation, phenotype) in one place.

Thanks a lot.

Anita

Procedure:

setwd("~/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf")
make.cdf.package("MTA-1_0.r1.gene.cdf", species = "Mus_musculus")
Creating CDF environment
Creating package in C:/Users/alakatos/Documents/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf/mta10.r1.genecdf

README PLEASE:
A source package has now been produced in
C:/Users/alakatos/Documents/project/data_processing_analysis/annotation_files/MTA-1_0.r1.gene_cdf/mta10.r1.genecdf.
Before using this package it must be installed via 'R CMD INSTALL'
at a terminal prompt (or DOS command shell).
If you are using Windows, you will need to get set up to install packages.
See the 'R Installation and Administration' manual, specifically
Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset'

Alternatively, you could use make.cdf.env(), which will not require you to install a package.
However, this environment will only persist for the current R session
unless you save() it.

Warning messages:
1: In cbind(pm = pm, mm = mm) :
number of rows of result is not a multiple of vector length (arg 1)
2: In cbind(pm = pm, mm = mm) :
number of rows of result is not a multiple of vector length (arg 1)

install.packages("mta10.r1.genecdf/", repos = NULL, type = "source")

Installing package into ‘C:/Users/alakatos/Documents/R/win-library/3.1’

(as ‘lib’ is unspecified)
* installing *source* package 'mta10.r1.genecdf' ...
** R
** data
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (mta10.r1.genecdf)
setwd("~/project/data_processing_analysis/cel_files")
library(affy)

library(mta10.r1.genecdf)
dat <- ReadAffy()
annotation(dat) <- "mta10.r1.genecdf"
eset <- rma(dat)
Error in getCdfInfo(object) :
Could not obtain CDF environment, problems encountered:
Specified environment does not contain MTA-1_0
Library - package mta10cdf not installed
Bioconductor - mta10cdf not available

My bad. It's not the annotation that matters, but the cdfName. You can specify that as part of the pipeline:

dat <- ReadAffy(cdfname = "mta10.r1.genecdf")

eset <- rma(dat)
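A note on why the error names a package called mta10cdf: by default affy takes the chip name stored in the CEL files (MTA-1_0) and derives a package name from it, so a package built as mta10.r1.genecdf is not looked for unless cdfname is given explicitly. The helper below, approx_clean_cdfname, is a hypothetical and simplified imitation of that clean-up written in base R. The real function is affy's cleancdfname(), which also consults a table of known chip names.

```r
# Rough base-R approximation of the name clean-up affy applies to a chip
# name before looking for a CDF package. This is NOT affy's own code.
approx_clean_cdfname <- function(chip) {
  # lower-case, drop hyphens, underscores and spaces, then append "cdf"
  paste0(gsub("[-_ ]", "", tolower(chip)), "cdf")
}

# The chip name from the error message maps to the package name that
# affy reported as missing.
approx_clean_cdfname("MTA-1_0")
```

Under these assumptions the call returns "mta10cdf", which matches the package name in the error message.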


Thank you.

Anita


Hi James,

Sorry, I ran into another error.

> dat <- ReadAffy(cdfname = "mta10.r1.genecdf")
> dat
AffyBatch object
size of arrays=2572x2680 features (25 kb)
cdf=mta10.r1.genecdf (71293 affyids)
number of samples=19
number of genes=71293
annotation=mta10.r1.genecdf
notes=
> eset <- rma(dat)
Error in (function (classes, fdef, mtable)  :
unable to find an inherited method for function ‘rma’ for signature ‘"AffyBatch"’


Thank you,

Anita


I solved it. I needed to detach the "oligo" package.

Thanks,

Anita
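Detaching oligo helps because both affy and oligo provide a function called rma, and the one from the package attached last masks the other. An alternative that may avoid detaching anything is to call the affy version by its full name. A minimal sketch, assuming the mta10.r1.genecdf package built earlier in this thread is installed and the CEL files are in the working directory (the guard variables cel_files and can_run_affy are only there so the snippet does nothing when those assumptions do not hold):

```r
# Look for CEL files in the working directory, if affy is available
cel_files <- if (requireNamespace("affy", quietly = TRUE)) {
  affy::list.celfiles()
} else {
  character(0)
}

# Only proceed if there are CEL files and the custom CDF package exists
can_run_affy <- length(cel_files) > 0 &&
  requireNamespace("mta10.r1.genecdf", quietly = TRUE)

if (can_run_affy) {
  library(affy)
  dat <- ReadAffy(cdfname = "mta10.r1.genecdf")
  # affy::rma() names the affy function explicitly, so it is used even
  # if oligo is attached and masks rma()
  eset <- affy::rma(dat)
}
```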


I followed this whole post, but I still get the error:

Error:

The affy package is not designed for this array type.
Please use either the oligo or xps package.

I would appreciate any hint on this issue.


Well, I would say the message is quite clear: using the affy package to analyze these new, complex arrays is not recommended; you should use oligo or xps instead.

If you use oligo, make sure you also have the required MTA platform design package installed, which is available here. Please note that at the time this thread was started by the OP (>20 months ago), the PDInfo package for this array wasn't available, but it is now! Also be sure to carefully read James's answer (2nd post in this thread).

Depending on how you summarize the normalized probe data into probe-set data (at the level of exons or transcript clusters), you may want to use this or this annotation library.
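The two summarization levels mentioned above could look roughly like this with oligo. This is a hedged sketch, not a tested recipe for this platform: it assumes the CEL files are in the working directory, that the matching platform design package is already installed so read.celfiles() can load it, and that the target values "core" (transcript clusters) and "probeset" (exon-level probesets) are accepted for this array as they are for other whole-transcript Affymetrix designs. The guard variable can_run_oligo is only there so nothing runs when no CEL files are found.

```r
# Only proceed if oligo is installed and CEL files are present
can_run_oligo <- requireNamespace("oligo", quietly = TRUE) &&
  length(oligoClasses::list.celfiles()) > 0

if (can_run_oligo) {
  library(oligo)
  raw <- read.celfiles(list.celfiles(full.names = TRUE))

  # Transcript-cluster level summaries (assumed target name)
  eset_tc <- rma(raw, target = "core")

  # Probeset (exon) level summaries (assumed target name)
  eset_ps <- rma(raw, target = "probeset")
}
```

Which annotation library fits then depends on which of the two objects you carry forward.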

In the Bioconductor Channel at the Faculty of 1000 (F1000), a very nice, complete workflow has been published on the analysis of Affymetrix arrays, which also addresses the normalization and summarization aspects mentioned above (and much more!). Although not all of the workflow may be of direct relevance to you, it is an excellent read. Find it here!

jacorvar ▴ 40
@jacorvar-8972
Last seen 8 months ago
European Union