Question: Reading Illumina HT12 V4.0 Data from GEO into Lumi
2.1 years ago by
aaronrosenstein0 wrote:

Hello,

I am relatively new to preprocessing microarray data and am trying to analyze the GEO dataset "GSE56045". I downloaded the supplementary RAW files to process with lumi, but the file format does not seem to be compatible with the lumiR function. The header of the RAW file is as follows, in case it helps:

? Illumina, Inc.
[Heading]
Date    15/4/2010
ContentVersion    4.0
FormatVersion    1.0.0
Number of Probes    47231
Number of Controls    887
[Probes]

When I call the lumiR function, the error message is:

"Error in gregexpr("\t", dataLine1)[[1]] : subscript out of bounds"

This confuses me because the file appears to be a tab-separated document.

Is this data in a format readable by lumi? Should I use a different package instead?

modified 2.1 years ago by Gordon Smyth39k • written 2.1 years ago by aaronrosenstein0
Answer: Reading Illumina HT12 V4.0 Data from GEO into Lumi
2.1 years ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

The naming of the GEO series supplementary files is somewhat misleading. I guess you are trying to read the file GSE56045_RAW.tar, but that actually contains Illumina Bead Manifest files, which give probe annotation rather than expression data. The raw expression data is instead in the file GSE56045_non_normalized.txt.gz.
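A quick way to tell the two kinds of file apart before calling a reader is to look at the first few lines. The following is only a minimal sketch in base R: `peek_illumina_file` is a hypothetical helper (it is not part of limma or lumi), and the two small files it is tried on are mock files invented for illustration, with column names that are assumptions rather than the real GEO headers.

```r
# Hypothetical helper (not part of limma or lumi): guess whether a text file
# looks like an Illumina bead manifest or a tab-delimited expression table.
peek_illumina_file <- function(path, n = 10) {
  # gzfile() reads both gzip-compressed and uncompressed text files
  con <- gzfile(path, "rt")
  on.exit(close(con))
  lines <- readLines(con, n = n, warn = FALSE)

  # Manifest files carry a "[Heading]" section near the top
  if (any(grepl("^\\[Heading\\]", lines))) {
    return("manifest")
  }
  # An expression table should start with a tab-delimited header line
  if (length(lines) > 0 && grepl("\t", lines[1], fixed = TRUE)) {
    return("table")
  }
  "unknown"
}

# Mock files for illustration only (contents are assumptions)
manifest_file <- tempfile(fileext = ".txt")
writeLines(c("Illumina, Inc.", "[Heading]", "Date\t15/4/2010"), manifest_file)

table_file <- tempfile(fileext = ".txt")
writeLines(c("ID_REF\tsample1.intensity\tsample1.detection",
             "ILMN_1762337\t26.40536\t0.3493507"), table_file)

manifest_result <- peek_illumina_file(manifest_file)
table_result <- peek_illumina_file(table_file)
```

On the mock files the helper classifies the first as a manifest and the second as a table. The same check could be pointed at the files unpacked from GSE56045_RAW.tar and at GSE56045_non_normalized.txt.gz.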

I was able to read the data using the limma package:

> library(limma)
> x <- read.ilmn("GSE56045_non_normalized.txt.gz",probeid="ID_REF",expr="intensity",other.columns="detection")
Reading file GSE56045_non_normalized.txt.gz ... ...
> dim(x)
[1] 48164  1202
> x[1:5,1:5]
An object of class "EListRaw"
$source
[1] "illumina"

$E
               100001    100002   100003   100004   100005
ILMN_1762337 26.40536  28.34256 61.83844 32.21310 11.21891
ILMN_2055271 49.77552 104.60300 94.35043 58.13754 42.71157
ILMN_1736007 28.54197  36.64471 34.84822 26.64572 16.58674
ILMN_2383229 36.51273  16.37690 45.85955 30.05022  6.72389
ILMN_1806310 23.35780  21.99633 52.21932 31.46063 18.65642

$other$detection
                  100001    100002      100003      100004      100005
ILMN_1762337 0.349350700 0.4285714 0.227272700 0.225974000 0.668831200
ILMN_2055271 0.006493506 0.0000000 0.009090909 0.003896104 0.005194805
ILMN_1736007 0.266233800 0.1870130 0.779220800 0.436363600 0.327272700
ILMN_2383229 0.075324680 0.8922078 0.546753200 0.307792200 0.916883100
ILMN_1806310 0.472727300 0.7000000 0.406493500 0.251948100 0.244155800


The data can then be background corrected and normalized by neqc() using the detection p-values:

> y <- neqc(x)
Note: inferring mean and variance of negative control probe intensities from the detection p-values.
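The detection p-values can also be used afterwards to drop probes that are not convincingly expressed in any reasonable number of samples. The sketch below illustrates that idea in base R on the five probes and five samples printed above. The cutoff of 0.05 and the minimum of 3 samples are assumptions chosen for illustration, not values taken from this thread, and the names `p_cutoff`, `min_samples` and `expressed` are invented here.

```r
# Detection p-values copied from the first five probes and samples shown above
detection <- matrix(
  c(0.349350700, 0.4285714, 0.227272700, 0.225974000, 0.668831200,
    0.006493506, 0.0000000, 0.009090909, 0.003896104, 0.005194805,
    0.266233800, 0.1870130, 0.779220800, 0.436363600, 0.327272700,
    0.075324680, 0.8922078, 0.546753200, 0.307792200, 0.916883100,
    0.472727300, 0.7000000, 0.406493500, 0.251948100, 0.244155800),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("ILMN_1762337", "ILMN_2055271", "ILMN_1736007",
      "ILMN_2383229", "ILMN_1806310"),
    c("100001", "100002", "100003", "100004", "100005")
  )
)

p_cutoff <- 0.05   # assumed significance cutoff
min_samples <- 3   # assumed minimum number of samples with detection

# TRUE for probes detected (p < cutoff) in at least min_samples samples
expressed <- rowSums(detection < p_cutoff) >= min_samples
kept_probes <- names(which(expressed))
```

On this small excerpt only ILMN_2055271 passes, which matches it having the highest intensities in the table above. With the full data, the same logical vector would presumably be computed from y$other$detection and used as y[expressed, ], with min_samples chosen to suit the size of the smallest group of interest.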


Note that this is how Reynolds et al (2014) processed the data also, as you can read from the description of the data processing on GEO.

modified 2.1 years ago • written 2.1 years ago by Gordon Smyth39k

Thank you so much Gordon! I really appreciate it!

written 2.1 years ago by aaronrosenstein0

Dear Gordon,

Sorry to bother you, but if you could take a look at my post and offer any suggestions, that would be helpful: https://support.bioconductor.org/p/125225/

Thank you,

written 10 weeks ago by FL5120