Question: Reading Illumina HT12 V4.0 Data from GEO into Lumi
13 months ago, aaronrosenstein wrote:

Hello,

I am relatively new to preprocessing microarray data and am trying to analyze the GEO dataset "GSE56045". I downloaded the supplementary RAW files to work with them in lumi; however, the file format does not seem to be compatible with the lumiR function. In case it helps, the header of the RAW file is as follows:

? Illumina, Inc.
Date    15/4/2010
ContentVersion    4.0
FormatVersion    1.0.0
Number of Probes    47231
Number of Controls    887
[Probes]

When I call the lumiR function, the error message is:

"Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds"

This confuses me because the file appears to be a tab separated document.

Is this data in a format readable by lumi? Should I use a different package instead?

modified 13 months ago by Gordon Smyth • written 13 months ago by aaronrosenstein
13 months ago, Gordon Smyth (Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia) wrote:

The naming of the GEO series supplementary files is somewhat misleading. I guess you are trying to read the file GSE56045_RAW.tar, but that actually contains Illumina Bead Manifest files, which give probe annotation rather than expression data. The raw expression data is instead in the file GSE56045_non_normalized.txt.gz.
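As a rough sketch, the non-normalized file could be fetched directly from the GEO FTP area along these lines. Note that geo_suppl_url() is a hypothetical helper written only for this illustration (it is not part of limma or GEOquery), and the URL layout is an assumption based on how GEO usually organizes series supplementary files:

```r
# Hypothetical helper (not from any package): builds the download URL for a
# GEO series supplementary file, ASSUMING the usual GEO FTP layout
#   series/<accession with last 1-3 digits replaced by "nnn">/<accession>/suppl/<file>
geo_suppl_url <- function(accession, filename) {
  stub <- sub("[0-9]{1,3}$", "nnn", accession)  # e.g. GSE56045 -> GSE56nnn
  paste("https://ftp.ncbi.nlm.nih.gov/geo/series",
        stub, accession, "suppl", filename, sep = "/")
}

url <- geo_suppl_url("GSE56045", "GSE56045_non_normalized.txt.gz")

# Download only in an interactive session, so sourcing this has no side effects
if (interactive()) {
  download.file(url, destfile = basename(url), mode = "wb")
  # Peek at the first lines to confirm this is an expression table rather
  # than a manifest; file connections read gzip-compressed text transparently
  print(readLines(basename(url), n = 3))
}
```

If the download fails, the same file is listed under the supplementary files on the GSE56045 page and can be saved by hand.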

I was able to read the data using the limma package:

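> library(limma)
> # Assumed call: the expr= and probeid= values must match the column headers in the file
> x <- read.ilmn("GSE56045_non_normalized.txt.gz", expr="intensity", probeid="ID_REF")
> dim(x)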
[1] 48164  1202
> x[1:5,1:5]
An object of class "EListRaw"
$source
[1] "illumina"

$E
               100001    100002   100003   100004   100005
ILMN_1762337 26.40536  28.34256 61.83844 32.21310 11.21891
ILMN_2055271 49.77552 104.60300 94.35043 58.13754 42.71157
ILMN_1736007 28.54197  36.64471 34.84822 26.64572 16.58674
ILMN_2383229 36.51273  16.37690 45.85955 30.05022  6.72389
ILMN_1806310 23.35780  21.99633 52.21932 31.46063 18.65642

$other$detection
                  100001    100002      100003      100004      100005
ILMN_1762337 0.349350700 0.4285714 0.227272700 0.225974000 0.668831200
ILMN_2055271 0.006493506 0.0000000 0.009090909 0.003896104 0.005194805
ILMN_1736007 0.266233800 0.1870130 0.779220800 0.436363600 0.327272700
ILMN_2383229 0.075324680 0.8922078 0.546753200 0.307792200 0.916883100
ILMN_1806310 0.472727300 0.7000000 0.406493500 0.251948100 0.244155800


The data can then be background corrected and normalized by neqc() using the detection p-values:

> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.


This is also how Reynolds et al. (2014) processed the data, as you can read in the data processing description on GEO.