
Question: Mismatched ArrayExpress microarray annotation package
20 months ago, sandmann.t wrote:

Dear Bioconductors,

I am contacting you because you are listed as the maintainer of the ArrayExpress Bioconductor package. I tried to use the ArrayExpress() function to access a large dataset stored in ArrayExpress, "E-GEOD-5258":

library("ArrayExpress")
GEOD5258.batch <- ArrayExpress( "E-GEOD-5258" )

The function downloads all of the necessary files, but then exits with the following message:

ArrayExpress: Reading data files
Loading required package: pd.u133aaofav2
Attempting to obtain 'pd.u133aaofav2' from BioConductor website.
Checking to see if your internet connection works...
Package 'pd.u133aaofav2' was not found in the BioConductor repository.
The 'pdInfoBuilder' package can often be used in situations like this.
Error in oligo::read.celfiles(filenames = file.path(path, unique(files))) : 
  The annotation package, pd.u133aaofav2, could not be loaded.
In addition: Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
  there is no package called ‘pd.u133aaofav2’
Error in readAEdata(path = path, files = dataFiles, dataCols = dataCols,  : 
  Unable to read cel files in/tmp/RtmpdN8fuF

The "E-GEOD-5258" dataset contains Affymetrix microarray data from two different array types: A-Affy-113 and A-Affy-33. The former is causing the problem, because no annotation package exists in Bioconductor under the name derived from its CEL files (pd.u133aaofav2). Instead, annotations for this microarray are available under the name "hthgu133a".
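
The name mismatch can be written down as a small lookup table. The following is only a minimal sketch in base R: `design_to_annotation` and `annotation_for()` are hypothetical names (they are not part of the ArrayExpress package), and the two entries are simply the ones mentioned in this thread.

```r
# Hypothetical lookup table: ArrayExpress array design accession ->
# name of the Bioconductor annotation to use (entries taken from this thread).
design_to_annotation <- c(
  "A-AFFY-113" = "hthgu133a",
  "A-AFFY-33"  = "hgu133a"
)

# Hypothetical helper: return the annotation name(s) for one or more design
# accessions, ignoring case. Stops if a design is not in the table.
annotation_for <- function(design) {
  key <- toupper(design)
  unknown <- setdiff(key, names(design_to_annotation))
  if (length(unknown) > 0) {
    stop("No annotation known for array design(s): ",
         paste(unknown, collapse = ", "))
  }
  unname(design_to_annotation[key])
}

annotation_for("A-Affy-113")
```

A table like this would have to be extended by hand for any other array design whose annotation is published under a different name.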

I am not the only user who has run into this problem; see, e.g., "Help with error: 'pd.u133aaofav2 was not found in the BioConductor repository'".

Unfortunately the ArrayExpress function does not have any arguments that would allow me to manually set the array annotation package. Perhaps that feature would be worth adding?

Many thanks in advance,

Thomas

Answer: Mismatched ArrayExpress microarray annotation package
20 months ago, sandmann.t wrote:

For anybody else who gets stuck, here is a workaround for the E-GEOD-5258 Connectivity Map dataset, which could be applied to other datasets as well:

library(ArrayExpress)
# global variables
kAccession <- "E-GEOD-5258"
kDataDir <- "~/data_dir"

# retrieve the raw data from ArrayExpress and place them into kDataDir
# (This will download several GB of data.)
dir.create(kDataDir)
mex <- ArrayExpress::getAE(kAccession, type = "full", path = kDataDir)

# The following 'ae2bioc' command fails
# mex_raw <- ArrayExpress::ae2bioc(mageFiles = mex)  # ERROR
# Instead, we need the sample annotation table from ArrayExpress, which
# lists the array type for each CEL file as well.
phenoData <- read.delim(
  "https://www.ebi.ac.uk/arrayexpress/files/E-GEOD-5258/E-GEOD-5258.sdrf.txt",
  stringsAsFactors = FALSE)

# As expected, there are results from two different array types
table(phenoData$Array.Design.REF)  # A-AFFY-113: 218 arrays, A-AFFY-33: 346 arrays

# We read the raw data from the CEL files into AffyBatch objects, separately
# for each array type.
library(affy)
array_designs <- unique(phenoData$Array.Design.REF)
GEOD5258.batch <- lapply(
  X = setNames(array_designs, array_designs),
  FUN = function(design) {
    cel_files <- subset(phenoData, Array.Design.REF == design)$Array.Data.File
    pdata <- as(phenoData[match(cel_files, phenoData$Array.Data.File), ],
                "AnnotatedDataFrame")
    row.names(pdata) <- cel_files
    read.affybatch(filenames = file.path(kDataDir, cel_files),
                   phenoData = pdata)
  })

# The hthgu133a.db Bioconductor package contains the
# current annotations for the A-AFFY-113 array design.
annotation(GEOD5258.batch[["A-AFFY-113"]]) <- "hthgu133a"

# Now, we can generate RMA summaries & quantile normalized data for
# each array type.
library("hthgu133a.db")  # annotations for A-AFFY-113
library("hgu133a.db")  # annotations for A-AFFY-33
GEOD5258.rma <- lapply(GEOD5258.batch, rma)
GEOD5258.rma  # list of two ExpressionSets
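
Before reading the CEL files in the workaround above, it may help to confirm that every file listed in the sample table was actually downloaded. The following is a minimal sketch in base R: `missing_cel_files()` is a hypothetical helper, and the toy sample table and temporary directory are made up for illustration. It assumes only the two SDRF columns used above, `Array.Data.File` and `Array.Design.REF`.

```r
# Hypothetical helper: list CEL files named in the sample table that are
# missing from the download directory, grouped by array design.
missing_cel_files <- function(sdrf, data_dir) {
  present <- file.exists(file.path(data_dir, sdrf$Array.Data.File))
  split(sdrf$Array.Data.File[!present], sdrf$Array.Design.REF[!present])
}

# Toy example with a made-up sample table and a temporary directory.
toy_dir <- tempfile("cel_check_")
dir.create(toy_dir)
toy_sdrf <- data.frame(
  Array.Data.File  = c("a.CEL", "b.CEL", "c.CEL"),
  Array.Design.REF = c("A-AFFY-113", "A-AFFY-113", "A-AFFY-33"),
  stringsAsFactors = FALSE
)
# Only two of the three listed files exist on disk.
file.create(file.path(toy_dir, c("a.CEL", "c.CEL")))

missing_cel_files(toy_sdrf, toy_dir)  # lists b.CEL under A-AFFY-113
```

With the objects from the workaround, the call would be `missing_cel_files(phenoData, kDataDir)`; an empty list means that all listed files are present.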