Question

Processing Agilent data with limma

Agaz Hussain Wani ▴ 260

@agaz-hussain-wani-7620

India

I am trying to perform a differential expression analysis of the Agilent dataset GSE9210 using limma. The following is example code:

library(limma)
target <- file.path("/path/to/file")
SDRF <- read.delim(target, check.names=FALSE, stringsAsFactors=FALSE)
x <- read.maimages(SDRF[,"File"], source="agilent", green.only=TRUE)

Some of the problems:

1) There are 58 samples in the study, but only 50 of them are processed: GSM232970-GSM233019.

On reaching sample GSM233020, it gives an error:

Error in RG[[a]][, i] <- obj[, columns[[a]]] :
  number of items to replace is not a multiple of replacement length

Maybe 1) is related to "File format for single channel analysis of Agilent microarray data with Limma"?

Is there any way to process all the samples from a single study in one batch?

2) I tried processing the files in two batches: GSM232970-GSM233019 in one batch and the remaining samples in another. That ran without error, but the values in the second batch are read as character rather than numeric, so I cannot run

y <- backgroundCorrect(x, method="normexp")

which fails with

Error in E - Eb : non-numeric argument to binary operator

The reason for the error is obvious, but why are the values read as character rather than numeric? An example is shown below.

an object of class "EListRaw"
$E
     GSM233020 GSM233021
[1,] "1452"    "1373.5"
[2,] "77"      "109.5"  
[3,] "86"      "131"    
[4,] "320"     "898"    
[5,] "137.5"   "236"    
14904 more rows ...

$Eb
     GSM233020 GSM233021
[1,] "45"      "50"     
[2,] "44"      "51"     
[3,] "44"      "51"     
[4,] "44"      "52"     
[5,] "45"      "52"     
14904 more rows ...

$targets
               FileName
GSM233020 GSM233020.txt
GSM233021 GSM233021.txt

$genes
  Row Col ProbeUID ControlType    ProbeName        GeneName  SystematicName
1   1   1        0           1 BrightCorner    BrightCorner    BrightCorner
2   1   2        1          -1    (-)3xSLv1 NegativeControl NegativeControl
3   1   3        2           0 A_23_P146576       NM_021996       NM_021996
4   1   4        3           0 A_23_P125016    A_23_P125016    A_23_P125016
5   1   5        4           0  A_23_P28555          STAMBP       NM_006463
                                                                         Description
1                                                                                   
2                                                                                   
3 Homo sapiens globoside alpha-1,3-N-acetylgalactosaminyltransferase 1 (GBGT1), mRNA
4                                                                            Unknown
5             Homo sapiens STAM binding protein (STAMBP), transcript variant 1, mRNA
14904 more rows ...

$source
[1] "agilent"

limma agilent microarrays • 1.5k views

ADD COMMENT • link updated 6.8 years ago by Gordon Smyth 51k • written 6.8 years ago by Agaz Hussain Wani ▴ 260

score 1 · Answer 1 · 2018-02-12

James W. MacDonald 67k

@james-w-macdonald-5106

United States

Providing stylized code that isn't really what you ran is not helpful. Unless you really have a path on your computer called /path/to/file, which would be sort of cool.

As for question #1, you get an error reading in a file. Did you try reading in just that one file? How about all files but that one? I find that I cannot read that one file in, but I can read in all the other files, which doesn't seem like a question for Bioconductor, but instead for the GEO curators.
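The one-file-at-a-time check described above can be scripted. This is a minimal sketch, not the code anyone in this thread ran: `check_files` and its `reader` argument are hypothetical names introduced here, and the default reader assumes limma is installed. A file is flagged if reading it throws an error, or if it reads but the intensity matrix `E` comes back non-numeric.

```r
# Hypothetical helper: read each file on its own and report which ones fail.
# The default reader assumes the limma package is installed.
check_files <- function(files,
                        reader = function(f) limma::read.maimages(
                          f, source = "agilent", green.only = TRUE,
                          verbose = FALSE)) {
  status <- vapply(files, function(f) {
    # Capture a read error instead of stopping the whole loop.
    x <- tryCatch(reader(f), error = function(e) e)
    if (inherits(x, "error")) return("error")
    # A file that "reads" but yields character intensities is also suspect.
    if (!is.numeric(x$E)) return("non-numeric")
    "ok"
  }, character(1))
  names(status) <- basename(files)
  status
}
```

With the objects from the question, `status <- check_files(SDRF[,"File"])` followed by `status[status != "ok"]` would list the files worth inspecting by hand.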

ADD COMMENT • link 6.8 years ago James W. MacDonald 67k

I tried reading all the files at once, and it stopped at GSM233020.txt, as mentioned above.
I can read the sample GSM233020.txt on its own with x <- read.maimages(SDRF[,"File"][51], source="agilent", green.only=TRUE ), but it generates the data shown above. If I start from that sample and read the remaining files separately, they are also read: x <- read.maimages(SDRF[,"File"][51:57], source="agilent", green.only=TRUE ).

ADD REPLY • link 6.8 years ago Agaz Hussain Wani ▴ 260

No, you can't read GSM233020.txt. Sure, you get a result, but the result is wrong. As I told you in my answer, the end of that file is corrupted.
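A small base-R illustration of how a damaged file can still be "read" and yet give unusable output. The toy table below is invented and is not the content of GSM233020.txt; it only assumes that some garbled text ended up in an intensity column, which may or may not be how that particular file is damaged. One non-numeric entry is enough for the whole column to be stored as character, after which subtraction fails with the error quoted in the question.

```r
# Toy tab-delimited data (invented): the last row has a garbled signal value.
txt <- "gMeanSignal\tgBGMedianSignal\n1452\t45\n77\t44\n8x6\t44\n"
d <- read.delim(text = txt, stringsAsFactors = FALSE)

class(d$gMeanSignal)      # "character": one bad entry changes the column type
class(d$gBGMedianSignal)  # "integer"

# Subtracting with a character operand triggers the same kind of error
# as backgroundCorrect() in the question; capture the message for display.
msg <- tryCatch(d$gMeanSignal - d$gBGMedianSignal,
                error = function(e) conditionMessage(e))
print(msg)
```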

ADD REPLY • link 6.8 years ago Gordon Smyth 51k

Thank you for providing the useful information.

ADD REPLY • link 6.8 years ago Agaz Hussain Wani ▴ 260

score 1 · Answer 2 · 2018-02-13

The file GSM233020.txt on GEO is corrupted, and therefore can't be read correctly by limma or any other program. As James has said, all the other files for that GEO series are fine.

I suggest you write to the original authors of the series and ask for the correct raw data file. Alternatively, just omit that file and read all the others.
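The second option could look like the sketch below. The `files` vector is an illustrative stand-in for `SDRF[,"File"]` from the question, and the guard around the limma calls is only there so the snippet does nothing when limma or the files are absent; neither is part of the original code.

```r
# Stand-in for SDRF[,"File"] from the question (illustrative file names).
files <- c("GSM233019.txt", "GSM233020.txt", "GSM233021.txt")

# Drop the corrupted file and keep everything else.
good <- files[basename(files) != "GSM233020.txt"]

# Read the remaining files in one call, if limma and the files are available.
if (requireNamespace("limma", quietly = TRUE) && all(file.exists(good))) {
  x <- limma::read.maimages(good, source = "agilent", green.only = TRUE)
  y <- limma::backgroundCorrect(x, method = "normexp")
}
```

Matching on `basename()` keeps the filter working whether the SDRF column holds bare file names or full paths.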