Question: Help with diff. expression analysis of Agilent files
0
gravatar for bhgyu
2.5 years ago by
bhgyu30
bhgyu30 wrote:

Hi there, 

I've got an Agilent expression dataset which I'm trying to analyse using limma. 

Here are the issues:

1. I cannot read.maimages(agilentdataset) as suggested in the user guide. That prompts the following error:

Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
  Specified column headings not found in file
In addition: Warning message:
In readLines(con, n = 1) :
  incomplete final line found on 'GSE85426_raw_data_original2.txt'

2. I seem to be able to only add the annotation from a separate annotation file by merging the data.frames. However, then I'm stuck at n.sv and lmfit() which it won't let me do, because the data.frame isn't a suitable format.

n.sv = num.sv(norm_merged,mod,method="leek")
Error in diag(dims[2]) - mod %*% solve(t(mod) %*% mod) %*% t(mod) : 
  non-conformable arrays

3. I've successfully turned the data set into an ExpressionSet using

eset <- ExpressionSet(exprs, phenoData=pdata)

However, olgio::rma() won't work on this. If I normalise and bg correct it using limma's features, I get data frames which once again won't work with lmfit()

4. I've also tried library(coRNAi) to turn it into a working file for lmfit.

norm_merged_d2 <- df2lmFit(norm_merged)
Error in df2lmFit(norm_merged) : 
  no residuals, need to estimate main effects and update dataframe

 

All in all, I cannot seem to find a suitable workaround to normalize and subsequently combat and lmfit this dataset, let alone successfully add the annotation file. The annotation is however the least of my worries right now, since merging the data frames by row.names does work and hence could be done after analysis to find out the names of the top 1000 DE genes. 

Does anyone have a suitable workaround of this? I apologise for the long description. I've been trying stuff on this for 3 days now, so appreciate any and all hints/help/links. 

ADD COMMENTlink modified 2.5 years ago by Axel Klenk940 • written 2.5 years ago by bhgyu30
Answer: Help with diff. expression analysis of Agilent files
0
gravatar for Axel Klenk
2.5 years ago by
Axel Klenk940
Switzerland
Axel Klenk940 wrote:

Dear annecomplete,

don't try too many packages -- the complete analysis can be done with limma.

Unfortunately, you do not give us your call to read.maimages() and we don't know which columns it cannot find in your Agilent files. At some point, Agilent introduced a reduced ("lean" IIRC) file format which could cause this error, depending on the type of file you are trying to read. So, please: show us your commands.

(I wouldn't be concerned about the warning; I remember seeing it without doing any harm.)

Once you can properly import your data, it's just a matter of following an example from the limma users's guide.

Cheers,

 - axel

ADD COMMENTlink modified 2.5 years ago by Gordon Smyth39k • written 2.5 years ago by Axel Klenk940

Hi Axel, thanks for the info above. 

Let's start with just reading the data in. This is what I do and get:

x <- read.maimages("GSE85426_raw_data.txt",source="agilent")
Error in readGenericHeader(fullname, columns = columns, sep = sep) : 
  Specified column headings not found in file
In addition: Warning message:
In readLines(con, n = 1) :
  incomplete final line found on 'GSE85426_raw_data.txt'
ADD REPLYlink written 2.5 years ago by bhgyu30

GSE85426 appears to be Cy3/green only. Have you tried

x <- read.maimages("GSE85426_raw_data.txt",source="agilent", green.only = TRUE)

?

ADD REPLYlink written 2.5 years ago by Axel Klenk940

Yes, same error code.

ADD REPLYlink written 2.5 years ago by bhgyu30

Ok, I have downloaded that file and taken a look at it and it is not Agilent format at all :-( but

rather a tab-delimited textfile with one column <slide_id>_<array_position>.txt:gProcessedSignal

per sample.

You *could* try to read it using read.delim() but I do not see any advantage over using

library("GEOquery")

x <- getGEO("GSE85426")

and then x[[1]] is a nice and useful ExpressionSet (thanks to Sean Davis).

 

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Axel Klenk940

Hi Axel, first of all thanks for taking the time to check this out. 

The above did work for me, but downloaded just the SDRF file. There's actually no expression data in that expression set. When I try to fetch the actual data set it returns:

x <- getGEO("GSE85426_raw_data.txt.gz")
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE85426_RAW_DATA.TXT.GZ/GSE85426_RAW_DATA.TXT.GZ/matrix/
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  'data' must be of a vector type, was 'NULL'

 

ADD REPLYlink written 2.5 years ago by bhgyu30

Just FYI, this seemed to work:

eset <- ExpressionSet(assayData=eset, phenotData=pheno)

It then just leaves the question of how to annotate given that there is not annotation package for this Agilent chip and all I have is a nice txt file with the IDs and matching Gene names. All package builders require some sort of db package which I'm not sure exists...

ADD REPLYlink written 2.5 years ago by bhgyu30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 380 users visited in the last hour