Question: Processing agilent data by limma
14 months ago by
India
Agaz Hussain Wani260 wrote:

I am trying to process Agilent data using the limma R package. For GSE10469, I used the following code:

raw_data <- read.maimages(pdata[,1], source = "agilent") # pdata holds the group information
I get the error:
Specified column headings not found in file

When I try

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = TRUE)
Error in RG[[a]][, i] <- obj[, columns[[a]]] :
number of items to replace is not a multiple of replacement length

And also

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = FALSE)
Specified column headings not found in file

For  GSE32006

raw_data <- read.maimages(pdata[,1], source = "agilent")
Specified column headings not found in file

And

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = TRUE)
Specified column headings not found in file

The files that are read successfully from GSE32006 are gene expression arrays, whereas the files that fail are exon arrays.

So how can I deal with all these issues?


Code snippets are not useful! Unless you show exactly what you did, you are expecting people to guess at what you might have done, and most people are too busy to bother with such things. You need to show a short, self-contained (e.g., anybody can run) bit of code to show exactly what you did and where the error is.

Answer: Processing agilent data by limma
14 months ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

The first problem occurs when you try to read in single-channel data as if it were two-color data. You must tell read.maimages() explicitly to read the green channel only by specifying green.only=TRUE.

The second problem occurs when people upload "raw" data files to GEO that have been edited or corrupted, and are therefore no longer in proper Agilent format.

In the case of GSE10469, it appears that someone (one of the authors presumably) has opened the first file GSM264878.txt in Excel, then written it out again but now with extra rows and an extra column. The other files are ok. You can fix the problem simply by changing the order of the files when you read them in, so that GSM264878 is not the first file:

> files
[1] "GSM264878.txt.gz" "GSM264879.txt.gz" "GSM264880.txt.gz" "GSM264881.txt.gz"
[5] "GSM264882.txt.gz" "GSM264883.txt.gz" "GSM264884.txt.gz" "GSM264885.txt.gz"
[9] "GSM264886.txt.gz" "GSM264887.txt.gz" "GSM264888.txt.gz" "GSM264889.txt.gz"
> x <- read.maimages(files[c(2,1,3:12)], source="agilent", green.only=TRUE)
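If it is not obvious which file has been altered, a quick check of the header layout can help. The following is only a sketch in base R, assuming that a proper Agilent Feature Extraction file has a tab-separated header row containing the field "FEATURES". The function names agilent_header_info() and flag_odd_headers() are made up for this example and are not part of limma.

```r
# Sketch: report where the "FEATURES" header row sits in each file and how
# many tab-separated fields it has, then flag files that differ from the rest.
# Assumption: the header row contains a field that is exactly "FEATURES".
agilent_header_info <- function(file, n_max = 100) {
  con <- gzfile(file, "rt")   # gzfile() also reads uncompressed text files
  on.exit(close(con))
  lines <- readLines(con, n = n_max, warn = FALSE)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  is_header <- vapply(fields,
                      function(f) "FEATURES" %in% gsub('"', "", f, fixed = TRUE),
                      logical(1))
  row <- which(is_header)[1]  # NA if no header row was found in n_max lines
  data.frame(file = basename(file),
             header_row = row,
             n_columns = if (is.na(row)) NA_integer_ else length(fields[[row]]),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: mark files whose layout differs from the most common one.
flag_odd_headers <- function(files) {
  info <- do.call(rbind, lapply(files, agilent_header_info))
  layout <- paste(info$header_row, info$n_columns)
  most_common <- names(which.max(table(layout)))
  info$odd <- layout != most_common
  info
}

# Usage (not run): info <- flag_odd_headers(files); info[info$odd, ]
```

Any file flagged as odd is a candidate for having been re-saved from a spreadsheet, and should not be first in the vector passed to read.maimages().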


In the case of GSE32006, you can't expect to read in gene expression and exon arrays with the same read command because they have different probe sets. You naturally have to read and analyse the gene arrays and the exon arrays separately.
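One rough way to separate the two array types before reading them is to group the files by the number of feature rows, since different designs have different probe sets. This is only a sketch under that assumption; count_feature_rows() and split_by_design() are hypothetical helper names, and the platform annotation in GEO is a more reliable guide when it is available.

```r
# Sketch: count the non-empty rows that follow the "FEATURES" header row.
# Assumption: the header row starts with the field "FEATURES".
count_feature_rows <- function(file) {
  con <- gzfile(file, "rt")   # gzfile() also reads uncompressed text files
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  first_field <- sub("\t.*$", "", lines)
  header <- match("FEATURES", first_field)
  if (is.na(header)) return(NA_integer_)
  sum(nzchar(lines[-seq_len(header)]))
}

# Hypothetical helper: group files that share the same number of feature rows.
# Files with no recognisable header (count NA) are dropped by split().
split_by_design <- function(files) {
  n_rows <- vapply(files, count_feature_rows, integer(1))
  split(files, n_rows)
}

# Usage (not run), reading each group separately with limma:
# groups <- split_by_design(files)
# x_list <- lapply(groups, limma::read.maimages,
#                  source = "agilent", green.only = TRUE)
```

Each element of the resulting list can then be read and analysed on its own.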

Thank you very much for the answer. I tried reading the exon arrays from GSE32006 separately:

raw_data <- read.maimages(pdata[,1], source = "agilent", green.only = TRUE)

but I get the same issue

Error in readGenericHeader(fullname, columns = columns, sep = sep) :
Specified column headings not found in file

However, when I ran the gene arrays separately, it worked fine.


For some reason, the exon arrays have been hybridized using the Cy5 (red) channel instead of Cy3 (green). To read them, you'll have to use a trick to tell read.maimages() to use the red channel only:

x <- read.maimages(files, source="agilent", green.only=TRUE,
                   columns=list(G="rMedianSignal",Gb="rBGMedianSignal"))
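To find out in advance which channel a set of single-channel files was scanned in, you can look at the column headings before calling read.maimages(). The following is a minimal sketch in base R. The helper names are invented for this example, and it assumes the files are known to be single-channel, because two-color files contain both sets of columns and would be treated as green here.

```r
# Sketch: return the fields of the "FEATURES" header row of an Agilent file.
# Assumption: the header row contains a field that is exactly "FEATURES".
agilent_header_fields <- function(file, n_max = 100) {
  con <- gzfile(file, "rt")   # gzfile() also reads uncompressed text files
  on.exit(close(con))
  lines <- readLines(con, n = n_max, warn = FALSE)
  fields <- strsplit(lines, "\t", fixed = TRUE)
  is_header <- vapply(fields, function(f) "FEATURES" %in% f, logical(1))
  if (!any(is_header)) return(character(0))
  fields[[which(is_header)[1]]]
}

# Hypothetical helper: pick the 'columns' argument for a single-channel read.
choose_signal_columns <- function(file) {
  hdr <- agilent_header_fields(file)
  if (all(c("gMedianSignal", "gBGMedianSignal") %in% hdr)) {
    list(G = "gMedianSignal", Gb = "gBGMedianSignal")   # green (Cy3) channel
  } else if (all(c("rMedianSignal", "rBGMedianSignal") %in% hdr)) {
    list(G = "rMedianSignal", Gb = "rBGMedianSignal")   # red (Cy5) channel
  } else {
    stop("No median signal columns found in ", basename(file))
  }
}

# Wrapper around the read.maimages() call shown above (requires limma).
read_single_channel <- function(files) {
  cols <- choose_signal_columns(files[1])
  limma::read.maimages(files, source = "agilent", green.only = TRUE,
                       columns = cols)
}
```

The check uses only the first file, so it assumes that all files passed in one call come from the same array type and channel.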