Question

Load CDF files for Affymetrix ST 2.0 microarrays

Nathaniel ▴ 20

@nathaniel-9283

Last seen 8.4 years ago

Denmark

I am analysing an Affymetrix Mogene 2.0 ST array, and I would like to collapse all cross-hybridizing probesets into a single transcript cluster. That is, reduce all rows with probesets mapping to different transcripts to a single row.

My first idea was to annotate all probesets with the corresponding gene_id, using the probeset -> gene mapping that can be extracted from BioMart or the Affymetrix website. Then, for every gene that is represented by more than one probeset, I would keep only one probeset, chosen either at random or as the one with the maximum expression value.
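The "keep the probeset with the highest expression" idea can be sketched in base R. This is only an illustrative sketch on made-up data: the function name `collapse_by_max_mean`, the toy matrix, and the probeset -> gene mapping are all hypothetical and not part of oligo or any other package. It ranks probesets by their mean expression across samples, which is one possible reading of "maximum expression value".

```r
# Toy probeset-level matrix: rows are probesets, columns are samples (made-up values)
expr <- matrix(
  c(5.1, 5.3,
    7.2, 7.0,
    6.4, 6.8,
    3.9, 4.1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ps1", "ps2", "ps3", "ps4"), c("sampleA", "sampleB"))
)

# Hypothetical probeset -> gene mapping (in practice taken from BioMart or Affymetrix)
gene_id <- c(ps1 = "geneX", ps2 = "geneX", ps3 = "geneY", ps4 = NA)

collapse_by_max_mean <- function(expr, gene_id) {
  # Align the mapping with the rows of the matrix and drop unannotated probesets
  gene_id <- gene_id[rownames(expr)]
  keep <- !is.na(gene_id)
  expr <- expr[keep, , drop = FALSE]
  gene_id <- gene_id[keep]

  # Sort probesets by decreasing mean expression across samples
  ord <- order(rowMeans(expr), decreasing = TRUE)
  expr <- expr[ord, , drop = FALSE]
  gene_id <- gene_id[ord]

  # Keep the first (highest-mean) probeset for each gene
  first <- !duplicated(gene_id)
  out <- expr[first, , drop = FALSE]
  rownames(out) <- unname(gene_id[first])
  out
}

gene_expr <- collapse_by_max_mean(expr, gene_id)
print(gene_expr)
```

With real data, `expr` would be the matrix returned by `exprs()` on the normalized object, and `gene_id` a named character vector whose names are the probeset IDs.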

However, I read about some CDF files which apparently have already done that, and they can be downloaded at: http://nmg-r.bioinformatics.nl/NuGO_R.html. I have never used them, and almost all tutorials are based on the old Affymetrix 3' arrays, not the new ST ones.

To start with, I cannot figure out how to load such files and use them to collapse my probeset-level expression matrix into a transcript-level expression matrix. By the way, I am using 'oligo' for the processing and normalization. How could I do this? Can someone point me to a tutorial or documentation?

Thanks.

microarray oligo cdf customcdf • 2.5k views

ADD COMMENT • link updated 8.4 years ago by Guido Hooiveld ★ 3.9k • written 8.4 years ago by Nathaniel ▴ 20

score 0 · Answer 1 · 2015-11-28

Guido Hooiveld ★ 3.9k

@guido-hooiveld-2020

Last seen 11 hours ago

Wageningen University, Wageningen, the …

Hi,

We are hosting the 'some CDF files' you refer to. This is because we often use them in our research, and in the past the annotation for some of the custom (remapped) CDFs was lacking. Note that the latter is not the case anymore.

Please also be aware that the custom CDFs were originally created by Manhong Dai and Fan Meng of the MBNI at the University of Michigan, and that they still maintain them! FYI: last week Manhong somewhat silently released version 20 of the custom CDFs.

In principle you might expect to be able to use the affy package to read the CEL files with a custom CDF. However, the design of version 2 of the Gene ST arrays is no longer compatible with affy, so affy will throw an error if you try to do so. Please see this post by James on why affy cannot handle the new arrays, and why you will have to use oligo or xps instead.

C: oligo package requires the annotation package pd.ht.mg.430.pm - where is it?

You will also find a link in that post that shows you how to analyze new arrays using a custom CDF.

Below is some code to get you started with normalizing (RMA) your arrays in oligo, provided you also have the appropriate probe design (PD) package installed. Also read the vignette!

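# Set the working directory to the folder containing the CEL files
setwd("/path/to/my/files")
library(oligo)

# Read all CEL files found in the working directory
celFiles <- list.celfiles(full.names = TRUE)
affyExpressionFS <- read.celfiles(celFiles)

# Background-correct, normalize and summarize with RMA
ppData <- rma(affyExpressionFS)

# Write the normalized expression values to a tab-delimited file
write.exprs(ppData, file="myFile.txt", sep="\t")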

ADD COMMENT • link 8.4 years ago Guido Hooiveld ★ 3.9k

Thanks a lot for the answer Guido.

After some googling I realized that someone had already made the CDF files work with 'oligo': https://bioconductor.org/packages/release/data/annotation/html/pd.mogene.2.0.st.html

Thus, I performed the analysis as follows:

library(oligo)

# Load the CEL files and the phenotypic data
celFiles <- list.celfiles("/directory", full.names = TRUE)
pheno_data <- read.AnnotatedDataFrame("/directory", header=TRUE, sep="\t")
affyRaw <- read.celfiles(celFiles, phenoData=pheno_data, verbose=FALSE, checkType=FALSE)
annotation(affyRaw) <- "pd.mogene.2.0.st"

# RMA normalization
affyNorm <- rma(affyRaw, target="core")     # 41345 features

So now I have a matrix with normalized expression values for each probeset. Where can I find the transcript annotation for each of these probesets? I thought this information was contained in the CDF given by pd.mogene.2.0.st. How can I extract it?
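One common way to attach gene-level annotation to the output of rma(..., target="core") is through a transcript-cluster annotation package. The following is only a sketch under stated assumptions: it assumes that the Bioconductor package mogene20sttranscriptcluster.db is installed and matches this array, and that affyNorm is the ExpressionSet created above. Keeping only the first gene for each ID is a simplification chosen here for illustration.

```r
library(oligo)
library(AnnotationDbi)
library(mogene20sttranscriptcluster.db)  # assumption: installed and appropriate for this array

# affyNorm is assumed to be the ExpressionSet returned by rma(affyRaw, target = "core")
ids <- featureNames(affyNorm)

# Look up gene-level annotation for each transcript cluster ID
ann <- AnnotationDbi::select(mogene20sttranscriptcluster.db,
                             keys = ids,
                             columns = c("SYMBOL", "ENTREZID", "GENENAME"),
                             keytype = "PROBEID")

# One ID can map to several genes; keep only the first hit per ID (a simplification)
ann <- ann[!duplicated(ann$PROBEID), ]

# Align the annotation with the rows of the expression matrix and store it
ann <- ann[match(ids, ann$PROBEID), ]
rownames(ann) <- ids
fData(affyNorm) <- ann

head(fData(affyNorm))
```

IDs without a gene assignment (for example control features) end up with NA in the annotation columns.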

ADD REPLY • link 8.4 years ago Nathaniel ▴ 20

Guido - You may already know this, or maybe not. Benilton somewhat silently added code to pdInfoBuilder last release that allows you to generate a pdInfoPackage based on the MBNI CDFs that Manhong produces, so you can now use oligo for those as well.

ADD REPLY • link 8.4 years ago James W. MacDonald 65k

Thanks James for this info; I wasn't aware of that! However, I had a look at the vignette and help pages of pdInfoBuilder, but could not directly find more on this. I assume this is possible along the same line(s of code) as you previously posted here: A: How to use brainarray custom cdf with oligo package?

ADD REPLY • link 8.4 years ago Guido Hooiveld ★ 3.9k

Exactly. I should congratulate you on your ability to find things on this site. I can't find things to save my life, so when people come up with just the exact post I am thinking of (that I cannot find) I am simply amazed.

And that is what I mean by 'somewhat silently'. There are some additions to the man page for makePdInfoPackage and chipName, but otherwise not so much. I am intending to add something but haven't had the time.

ADD REPLY • link 8.4 years ago James W. MacDonald 65k