Question

beadarray error importing from GEOquery

0

Entering edit mode

emiliomastriani • 0

@emiliomastriani-11983

Last seen 6.5 years ago

Hello guys, I need your support in order to understand how to solve my problem, because I already spent too much time and I don't know how to solve it.

The steps done are the following (as in vignette of beadarray):

> source("http://www.bioconductor.org/biocLite.R")
Bioconductor version 3.4 (BiocInstaller 1.24.0), ?biocLite for help
> library (GEOquery)
> library ("beadarray")
> gse <- getGEO("GSE16570")
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE16nnn/GSE16570/matrix/
OK
Found 1 file(s)
GSE16570_series_matrix.txt.gz
Using locally cached version: /tmp/RtmpqrUsaF/GSE16570_series_matrix.txt.gz
Using locally cached version of GPL6947 found here:
/tmp/RtmpqrUsaF/GPL6947.soft
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
not all columns named in 'colClasses' exist
> head(exprs(gse[[1]]))
             GSM416669 GSM416670 GSM416671 GSM416672 GSM416673 GSM416674 GSM416675 GSM416676 GSM416677 GSM416678 GSM416679 GSM416680
ILMN_1343291   41536.0 45373.90 44729.9344 45373.9000 37678.2414 45373.9000 41401.46 41710.67 45373.90 43194.8834 45373.90 39210.2438
ILMN_1343295   15916.2 18469.74 14981.4909 17471.4313 15566.8978 14633.9670 17822.06 18731.63 16440.84 14719.4626 20182.79 12236.7710
ILMN_1651199      50.7     48.00    43.1333    51.0000    49.5333 42.3000     48.15     45.90     47.30    46.5000     44.60 46.1500
ILMN_1651209      66.4     61.60    66.9000    80.8750    73.8407 85.5833    100.76     80.50     58.40    82.5733     90.50 96.5059
ILMN_1651210      51.3     45.35    48.5000    45.8000    47.5000 51.2500     48.40     50.95     46.40    47.4000     53.20 49.7500
ILMN_1651221      49.6     58.70    57.8000    67.9846    57.1000 55.5500     67.54     61.10     54.00    47.4000     61.50 63.6200
              GSM416681 GSM416682 GSM416683 GSM416684
ILMN_1343291 34557.5028 38430.8676 45373.9000 42919.1497
ILMN_1343295 16962.7363 12016.3114 15753.2659 17366.6322
ILMN_1651199    50.9500    51.8000    41.4600    56.2500
ILMN_1651209    86.9375    92.5625    75.0444    76.8083
ILMN_1651210    48.9500    52.6500    48.9500    42.2250
ILMN_1651221    73.1000    51.9500    55.2333    50.9500
> gse
$GSE16570_series_matrix.txt.gz
ExpressionSet (storageMode: lockedEnvironment)
assayData: 48803 features, 16 samples
element names: exprs
protocolData: none
phenoData
sampleNames: GSM416669 GSM416670 ... GSM416684 (16 total)
varLabels: title geo_accession ... data_row_count (32 total)
varMetadata: labelDescription
featureData
featureNames: ILMN_1343291 ILMN_1343295 ... ILMN_2416019 (48803 total)
fvarLabels: ID nuID ... GB_ACC (30 total)
fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
Annotation: GPL6947

But when I try to import to the native class to work on beadarray, I get the following:

> summaryData <- as(gse[[1]], "ExpressionSetIllumina")
Error in object@channelData[[1]] : subscript out of bounds

Notice that I get the same using the code from NCBI:

> gse <- getGEO("GSE16570", GSEMatrix =TRUE, getGPL=FALSE)
https://ftp.ncbi.nlm.nih.gov/geo/series/GSE16nnn/GSE16570/matrix/
OK
Found 1 file(s)
GSE16570_series_matrix.txt.gz
Using locally cached version: /tmp/RtmpqrUsaF/GSE16570_series_matrix.txt.gz
> if (length(gset) > 1) idx <- grep("GPL6947", attr(gse, "names")) else idx <- 1
> gse <- gse[[idx]]
> head(exprs(gse))
             GSM416669 GSM416670  GSM416671  GSM416672  GSM416673  GSM416674 GSM416675 GSM416676 GSM416677  GSM416678 GSM416679  GSM416680
ILMN_1343291   41536.0  45373.90 44729.9344 45373.9000 37678.2414 45373.9000  41401.46  41710.67  45373.90 43194.8834  45373.90 39210.2438
ILMN_1343295   15916.2  18469.74 14981.4909 17471.4313 15566.8978 14633.9670  17822.06  18731.63  16440.84 14719.4626  20182.79 12236.7710
ILMN_1651199      50.7     48.00    43.1333    51.0000    49.5333    42.3000     48.15     45.90     47.30    46.5000     44.60    46.1500
ILMN_1651209      66.4     61.60    66.9000    80.8750    73.8407    85.5833    100.76     80.50     58.40    82.5733     90.50    96.5059
ILMN_1651210      51.3     45.35    48.5000    45.8000    47.5000    51.2500     48.40     50.95     46.40    47.4000     53.20    49.7500
ILMN_1651221      49.6     58.70    57.8000    67.9846    57.1000    55.5500     67.54     61.10     54.00    47.4000     61.50    63.6200
              GSM416681  GSM416682  GSM416683  GSM416684
ILMN_1343291 34557.5028 38430.8676 45373.9000 42919.1497
ILMN_1343295 16962.7363 12016.3114 15753.2659 17366.6322
ILMN_1651199    50.9500    51.8000    41.4600    56.2500
ILMN_1651209    86.9375    92.5625    75.0444    76.8083
ILMN_1651210    48.9500    52.6500    48.9500    42.2250
ILMN_1651221    73.1000    51.9500    55.2333    50.9500
> summaryData <- as(gse, "ExpressionSetIllumina")
Error in object@channelData[[1]] : subscript out of bounds

>

Please, any suggest is welcome.

Here is my sessionInfo:
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.4 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8 LC_NAME=C                  LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats     graphics grDevices utils     datasets methods   base

other attached packages:
[1] beadarray_2.24.0     ggplot2_2.2.0        GEOquery_2.40.0 Biobase_2.34.0       BiocGenerics_0.20.0 BiocInstaller_1.24.0

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8          plyr_1.8.4           GenomeInfoDb_1.10.1 XVector_0.14.0       bitops_1.0-6         tools_3.3.2
[7] zlibbioc_1.20.0      digest_0.6.10        base64_2.0 RSQLite_1.1          memoise_1.0.0        tibble_1.2
[13] gtable_0.2.0         DBI_0.5-1            httr_1.2.1 stringr_1.1.0        S4Vectors_0.12.1     IRanges_2.8.1
[19] stats4_3.3.2         grid_3.3.2           R6_2.2.0 AnnotationDbi_1.36.0 XML_3.98-1.5         limma_3.30.6
[25] BeadDataPackR_1.26.0 reshape2_1.4.2       magrittr_1.5 scales_0.4.1         GenomicRanges_1.26.1 assertthat_0.1
[31] colorspace_1.3-1     stringi_1.1.2        openssl_0.9.5 RCurl_1.95-4.8       lazyeval_0.2.0       munsell_0.4.3
[37] illuminaio_0.16.0

Best regards

GEOquery beadarray ExpressionSetIllumina getGEO • 1.3k views

ADD COMMENT • link updated 7.4 years ago by Gordon Smyth 50k • written 7.4 years ago by emiliomastriani • 0

score 2 · Answer 1 · 2016-12-07

I have pushed a patch to the developmental version and it should be available from version 2.25.1. Since this is a bug I'll push it to release once I've determined whether it's a suitable fix. In the mean time you can run this code before you try the conversion and it should work for you.

setReplaceMethod("exprs", 
                 signature(object="ExpressionSetIllumina",value="matrix"),
                 function(object, value) 
                     assayDataElementReplace(object, "exprs", 
                                             value, validate = FALSE))

summaryData <- as(gse[[1]], "ExpressionSetIllumina")

> summaryData

ExpressionSetIllumina (storageMode: list)
assayData: 48803 features, 16 samples 
  element names: exprs, se.exprs, nObservations, Detection 
protocolData: none
phenoData
  sampleNames: GSM416669 GSM416670 ... GSM416684 (16 total)
  varLabels: title geo_accession ... data_row_count (32 total)
  varMetadata: labelDescription
featureData
  featureNames: ILMN_1343291 ILMN_1343295 ... ILMN_2416019 (48803 total)
  fvarLabels: Row.names ID ... PROBESEQUENCE (35 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation: Humanv3 
QC Information
 Available Slots:  
  QC Items: 
  sampleNames:

score 0 · Answer 2 · 2016-12-06

I've just had a dig into the code, and I think this is a problem introduced by changes in Biobase, which we haven't seen until now. It looks like they relate to checks in the dimensions of a class, which ExpressionSetIllumina reports differently from a standard ExpressionSet.

I'm not sure there's a quick fix to give you, but it isn't something wrong with your code. I'll take a look at the beadarray code in the next few days and let you know when I have a patch available.

score 0 · Answer 3 · 2016-12-06

0

Entering edit mode

emiliomastriani • 0

@emiliomastriani-11983

Last seen 6.5 years ago

Thank you very much for your support.

I am sure that you will give soon an optimal solution to the problem.

In the meantime, any other suggestions are welcome!!!!

Best regards,

Emilio

ADD COMMENT • link 7.4 years ago emiliomastriani • 0

score 0 · Answer 4 · 2016-12-08

0

Entering edit mode

emiliomastriani • 0

@emiliomastriani-11983

Last seen 6.5 years ago

Dear Mark,

thank you very much for your support. I applied your update, but I am afraid that something is not right. Going to check the validity of the created object I get the following error:

> validObject(summaryData)
Error in validObject(summaryData) : 
  invalid class “ExpressionSetIllumina” object: 1: row numbers differ for assayData members
invalid class “ExpressionSetIllumina” object: 2: sample numbers differ for assayData members
invalid class “ExpressionSetIllumina” object: 3: feature numbers differ between assayData and featureData
invalid class “ExpressionSetIllumina” object: 4: sample numbers differ between assayData and phenoData

Some idea?

Best regards

ADD COMMENT • link 7.4 years ago emiliomastriani • 0

0

Entering edit mode

What are you planning to do with the object? Does it actually need to be a ExpressionSetIllumina? validObject() is failing for very similar reasons to the first error, namely that dimension checks in Biobase aren't actually appropriate for the ExpressionSetIllumina class.

It's been a long time since I've worked with beadarray data, but I seem to remember the ExpressionSetIllumina mostly extends a regular ExpressionSet in order to store the number of replicate beads for each observation, the standard error for each bead type and some measure of detection above background. I'm not sure you have any of that information in the GEO data you're downloading, so does it ever need to be converted to ExpressionSetIllumina?

I might suggest continuing without that step and seeing what happens. Either tools will work or they won't, but it's probably worth proceeding anyway.

ADD REPLY • link 7.4 years ago Mike Smith ★ 6.5k

score 0 · Answer 5 · 2016-12-14

This dataset is just a matrix of normalized expression values and it is easy to read it directly using basic R functions. See A: beadarray error conversion for code to read and analyze this GEO dataset.

Alternatively you can work with the gse object you got from getGEO(), which can be given directly to limma modelling functions, for example:

fit <- lmFit(gse, design)
fit <- eBayes(fit, robust=TRUE, trend=TRUE)

and so on.

There is no need to do any class conversions. Why make things unnecessarily hard?