Question

Help with diff. expression analysis of Agilent files

0

Entering edit mode

bhgyu ▴ 30

@bhgyu-13069

Last seen 5.8 years ago

Hi there,

I've got an Agilent expression dataset which I'm trying to analyse using limma.

Here are the issues:

1. I cannot read.maimages(agilentdataset) as suggested in the user guide. That prompts the following error:

Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
  Specified column headings not found in file
In addition: Warning message:
In readLines(con, n = 1) :
  incomplete final line found on 'GSE85426_raw_data_original2.txt'

2. I seem to be able to only add the annotation from a separate annotation file by merging the data.frames. However, then I'm stuck at n.sv and lmfit() which it won't let me do, because the data.frame isn't a suitable format.

n.sv = num.sv(norm_merged,mod,method="leek")
Error in diag(dims[2]) - mod %*% solve(t(mod) %*% mod) %*% t(mod) : 
  non-conformable arrays

3. I've successfully turned the data set into an ExpressionSet using

eset <- ExpressionSet(exprs, phenoData=pdata)

However, olgio::rma() won't work on this. If I normalise and bg correct it using limma's features, I get data frames which once again won't work with lmfit()

4. I've also tried library(coRNAi) to turn it into a working file for lmfit.

norm_merged_d2 <- df2lmFit(norm_merged)
Error in df2lmFit(norm_merged) : 
  no residuals, need to estimate main effects and update dataframe

All in all, I cannot seem to find a suitable workaround to normalize and subsequently combat and lmfit this dataset, let alone successfully add the annotation file. The annotation is however the least of my worries right now, since merging the data frames by row.names does work and hence could be done after analysis to find out the names of the top 1000 DE genes.

Does anyone have a suitable workaround of this? I apologise for the long description. I've been trying stuff on this for 3 days now, so appreciate any and all hints/help/links.

microarray agilent differential gene expression R • 1.8k views

ADD COMMENT • link updated 7.0 years ago by Axel Klenk ★ 1.0k • written 7.0 years ago by bhgyu ▴ 30

Gordon Smyth · Answer 1 · 2017-05-17

0

Entering edit mode

Axel Klenk ★ 1.0k

@axel-klenk-3224

Last seen 6 hours ago

UPF, Barcelona, Spain

Dear annecomplete,

don't try too many packages -- the complete analysis can be done with limma.

Unfortunately, you do not give us your call to read.maimages() and we don't know which columns it cannot find in your Agilent files. At some point, Agilent introduced a reduced ("lean" IIRC) file format which could cause this error, depending on the type of file you are trying to read. So, please: show us your commands.

(I wouldn't be concerned about the warning; I remember seeing it without doing any harm.)

Once you can properly import your data, it's just a matter of following an example from the limma users's guide.

Cheers,

- axel

ADD COMMENT • link updated 7.0 years ago by Gordon Smyth 50k • written 7.0 years ago by Axel Klenk ★ 1.0k

0

Entering edit mode

Hi Axel, thanks for the info above.

Let's start with just reading the data in. This is what I do and get:

x <- read.maimages("GSE85426_raw_data.txt",source="agilent")
Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
  Specified column headings not found in file
In addition: Warning message:
In readLines(con, n = 1) :
  incomplete final line found on 'GSE85426_raw_data.txt'

ADD REPLY • link 7.0 years ago bhgyu ▴ 30

0

Entering edit mode

GSE85426 appears to be Cy3/green only. Have you tried

x <- read.maimages("GSE85426_raw_data.txt",source="agilent", green.only = TRUE)

?

ADD REPLY • link 7.0 years ago Axel Klenk ★ 1.0k

0

Entering edit mode

Yes, same error code.

ADD REPLY • link 7.0 years ago bhgyu ▴ 30

0

Entering edit mode

Ok, I have downloaded that file and taken a look at it and it is not Agilent format at all :-( but

rather a tab-delimited textfile with one column <slide_id>_<array_position>.txt:gProcessedSignal

per sample.

You *could* try to read it using read.delim() but I do not see any advantage over using

library("GEOquery")

x <- getGEO("GSE85426")

and then x[[1]] is a nice and useful ExpressionSet (thanks to Sean Davis).

ADD REPLY • link 7.0 years ago Axel Klenk ★ 1.0k

0

Entering edit mode

Hi Axel, first of all thanks for taking the time to check this out.

The above did work for me, but downloaded just the SDRF file. There's actually no expression data in that expression set. When I try to fetch the actual data set it returns:

x <- getGEO("GSE85426_raw_data.txt.gz")
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE85426_RAW_DATA.TXT.GZ/GSE85426_RAW_DATA.TXT.GZ/matrix/
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

ADD REPLY • link 7.0 years ago bhgyu ▴ 30

0

Entering edit mode

Just FYI, this seemed to work:

eset <- ExpressionSet(assayData=eset, phenotData=pheno)

It then just leaves the question of how to annotate given that there is not annotation package for this Agilent chip and all I have is a nice txt file with the IDs and matching Gene names. All package builders require some sort of db package which I'm not sure exists...

ADD REPLY • link 7.0 years ago bhgyu ▴ 30