GEOquery runs into a problem with Sys.setenv("VROOM_CONNECTION_SIZE")
array chip ▴ 420
@array-chip-4136
Last seen 11 months ago
United States

Hi, I am using GEOquery to download some microarray datasets. It worked previously without any problem under an older R version (probably 3.5), but I have run into a problem now that I've updated R to version 4.0.5.


library(GEOquery)
gse<-getGEO(GEO="GSE137140")

# Found 1 file(s)
# GSE137140_series_matrix.txt.gz
# trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE137nnn/GSE137140/matrix/GSE137140_series_matrix.txt.gz'
# Content type 'application/x-gzip' length 45700106 bytes (43.6 MB)
# downloaded 43.6 MB

# Error: The size of the connection buffer (131072) was not large enough
# to fit a complete line:
#    Increase it by setting 'Sys.setenv("VROOM_CONNECTION_SIZE")'

sessionInfo( )

R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] limma_3.46.0        GEOquery_2.58.0     Biobase_2.50.0      BiocGenerics_0.36.1 BiocManager_1.30.12

loaded via a namespace (and not attached):
 [1] xml2_1.3.3       magrittr_2.0.1   hms_1.1.1        bit_4.0.4        tidyselect_1.1.1 R6_2.5.1         rlang_0.4.12     fansi_0.5.0     
 [9] dplyr_1.0.7      tools_4.0.5      vroom_1.5.7      utf8_1.2.2       DBI_1.1.1        withr_2.4.3      ellipsis_0.3.2   bit64_4.0.5     
[17] assertthat_0.2.1 tibble_3.1.6     lifecycle_1.0.1  crayon_1.4.2     tidyr_1.1.4      purrr_0.3.4      readr_2.1.1      tzdb_0.2.0      
[25] vctrs_0.3.8      curl_4.3.2       glue_1.5.1       compiler_4.0.5   pillar_1.6.4     generics_0.1.1   pkgconfig_2.0.3

I tried to increase the connection buffer with Sys.setenv("VROOM_CONNECTION_SIZE"=131072*5), but without success.

Can someone tell me what's going on? Why did it work flawlessly before, but now fails with the new R and Bioconductor versions?

Thanks,

John

GEOquery VROOM • 4.7k views
ADD COMMENT
0
Entering edit mode
@james-w-macdonald-5106
Last seen 11 hours ago
United States

Yes. It told you there was a problem and then it said how to fix it. Did you try that?

ADD COMMENT
0
Entering edit mode

Yes, I tried to increase the connection buffer with Sys.setenv("VROOM_CONNECTION_SIZE"=131072*5), but without success.

ADD REPLY
0
Entering edit mode

And it worked without any problem with the old R/Bioconductor, so why did the problem suddenly come up with the new R/Bioconductor? Could you try my two lines of code and see if you have the same problem? I have tried it on 2 computers, with the same result.

Thanks

ADD REPLY
0
Entering edit mode

I also tried increasing the size 10-fold, still without success. I don't think the size is the problem.

ADD REPLY
0
Entering edit mode
> Sys.setenv("VROOM_CONNECTION_SIZE" = 262144)
> gse<-getGEO(GEO="GSE137140")
Found 1 file(s)
GSE137140_series_matrix.txt.gz
> gse[[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 2565 features, 3924 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM4067570 GSM4067571 ... GSM4071493 (3924 total)
  varLabels: title geo_accession ... tissue:ch1 (37 total)
  varMetadata: labelDescription
featureData
  featureNames: MIMAT0000062 MIMAT0000063 ... MIMAT0035704 (2565 total)
  fvarLabels: ID miRNA miRNA_ID_LIST
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 32193503 
Annotation: GPL21263 
> Sys.getenv("VROOM_CONNECTION_SIZE")
[1] "262144"

And for completeness

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GEOquery_2.62.1     Biobase_2.54.0      BiocGenerics_0.40.0

loaded via a namespace (and not attached):
 [1] xml2_1.3.2        magrittr_2.0.1    hms_1.1.1         tidyselect_1.1.1 
 [5] R6_2.5.1          rlang_0.4.12      fansi_0.5.0       dplyr_1.0.7      
 [9] tools_4.1.2       data.table_1.14.2 R.oo_1.24.0       utf8_1.2.2       
[13] DBI_1.1.1         ellipsis_0.3.2    assertthat_0.2.1  tibble_3.1.6     
[17] lifecycle_1.0.1   crayon_1.4.2      purrr_0.3.4       readr_2.1.0      
[21] tzdb_0.2.0        tidyr_1.1.4       R.utils_2.11.0    vctrs_0.3.8      
[25] curl_4.3.2        glue_1.5.0        limma_3.50.0      compiler_4.1.2   
[29] pillar_1.6.4      R.methodsS3_1.8.1 generics_0.1.1    pkgconfig_2.0.3
ADD REPLY
0
Entering edit mode

And for super extra completeness

> library(memuse)
> Sys.meminfo()
Totalram:  15.910 GiB 
Freeram:    7.150 GiB

So I can download it on my crappidy old Windows box with 16 GB of RAM

ADD REPLY
0
Entering edit mode

Thanks!! Maybe I need to get the latest R, 4.1.2. I'll try that.

ADD REPLY
0
Entering edit mode

And to answer your question about why it doesn't work any more, GEOquery now uses vroom from the tidyverse to read in the data. It's meant to be superduper fast, but like much of the tidyverse it relies on various environment variables that you sometimes have to set.
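For anyone who is stuck on a vroom-based GEOquery for now, here is a minimal sketch of setting the variable and confirming that the session actually sees it. The 10-fold value is just the one tried earlier in this thread, not a recommendation, and a larger buffer may still not be enough for a very wide series matrix. Passing the value as a plain string through format() is my own precaution against scientific notation such as "1e+06", not something the error message asks for.

```r
# Buffer size in bytes; 131072 is the default reported in the error message.
buffer_bytes <- 131072 * 10

# Store the value as a plain digit string (no scientific notation).
Sys.setenv(VROOM_CONNECTION_SIZE = format(buffer_bytes, scientific = FALSE))

# Confirm what the current R session sees.
Sys.getenv("VROOM_CONNECTION_SIZE")

# Then, in the same session, retry the download:
# library(GEOquery)
# gse <- getGEO(GEO = "GSE137140")
```

The variable has to be set in the same R session, before the call that reads the file.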

ADD REPLY
0
Entering edit mode

Thanks, Jim, for answering all this in the time that it took me to have just 1 zoom meeting!

Actually, GEOquery doesn't use vroom anymore for exactly these reasons. It now relies on data.table::fread.

ADD REPLY
1
Entering edit mode

If your zoom meetings are like mine, I had plenty of time. ;-D

ADD REPLY
0
Entering edit mode

I think this works because of the GEOquery version used, 2.62.1, which is data.table::fread() based. Updating R to 4.1 or later and then updating GEOquery will fix the vroom errors and also some download issues on Windows. Highly recommend the update.

ADD REPLY
0
Entering edit mode

Thanks to both. I installed the latest R, 4.1.2, but now I have a problem installing the latest Bioconductor version, 3.14. This is frustrating... what do I need to do?

> if (!require("BiocManager", quietly = TRUE))
     install.packages("BiocManager")

> BiocManager::install()
'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details

replacement repositories:
    CRAN: https://mirror.las.iastate.edu/CRAN

Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
Installing package(s) 'BiocVersion'
Warning message:
In .inet_warning(msg) :
  package ‘BiocVersion’ is not available for Bioconductor version '3.14'

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] BiocManager_1.30.16 compiler_4.1.2      tools_4.1.2
ADD REPLY
1
Entering edit mode

Uh, try again? BiocVersion is for sure available for Bioc 3.14

ADD REPLY
0
Entering edit mode

Yes, I tried uninstalling and reinstalling R 4.1.2 and then used the default Bioconductor installation code, but I still got the same error. And on 2 computers... Very frustrated...

ADD REPLY
0
Entering edit mode

I'm a little stumped. We could guess more, but it might be worth asking a new question just so others can have a look. Sorry for the frustration!

ADD REPLY
0
Entering edit mode

Ok thanks Sean and Jim

ADD REPLY
0
Entering edit mode

Might have something to do with this

> z <- available.packages("https://bioconductor.org/packages/3.14/bioc/src/contrib")
> row.names(z)
[1] "AnVIL"        "HubPub"       "PhyloProfile" "XNAString"    "YAPSA"

As compared to, say

> z <- available.packages("https://bioconductor.org/packages/3.13/bioc/src/contrib")
> dim(z)
[1] 1984   17

Which looks more reasonable... I'll see what Martin and Lori have to say about that.
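If anyone wants to repeat that check, here is a rough sketch. check_repo_index() is a hypothetical helper of mine, not part of BiocManager or base R; it just counts the rows of an index as returned by available.packages() and looks for one package name. The toy matrix stands in for the truncated 3.14 listing above so the example runs without network access.

```r
# Hypothetical helper: summarize a repository index as returned by
# available.packages() and check whether one package is listed in it.
check_repo_index <- function(index, pkg = "BiocVersion") {
  list(n_packages = nrow(index), has_pkg = pkg %in% rownames(index))
}

# Toy index standing in for the truncated listing shown above.
toy <- matrix(
  "", nrow = 5, ncol = 1,
  dimnames = list(
    c("AnVIL", "HubPub", "PhyloProfile", "XNAString", "YAPSA"),
    "Version"
  )
)
check_repo_index(toy)

# Against the live repository (needs network access):
# idx <- available.packages(
#   contriburl = "https://bioconductor.org/packages/3.14/bioc/src/contrib"
# )
# check_repo_index(idx)
```

A handful of rows and has_pkg = FALSE would match the symptom reported here; a healthy index should list far more packages.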

ADD REPLY
0
Entering edit mode

Should be repaired now. See Installing Bioconductor. Sorry for the trouble.

H.

ADD REPLY
0
Entering edit mode

Thank you for the fix! All good now

ADD REPLY
0
Entering edit mode
@sean-davis-490
Last seen 5 months ago
United States

Here is a detailed answer to what is going on.

Historical context: GEOquery used readr::read_table() for many years. At some point, read_table() started to use functionality in the vroom package. When vroom limited buffer size, GEOquery started seeing parsing failures that required setting the environment variable VROOM_CONNECTION_SIZE. I could have just set the environment to an arbitrarily large number and left things at that, but changing environment variables in package code is not something any of us should be comfortable doing unless we "own" the environment variable. Addressing the error requires the user to 1) read and understand the error message and 2) iteratively try settings until things work.

So, I refactored GEOquery and replaced readr::read_table() and associated readr functionality with data.table::fread() code; this fixes the error and also benefits from some additional speed from data.table. So, the recommended solution to the error in the original post is to upgrade R to 4.1 or later and reinstall GEOquery to get the matching Bioconductor and GEOquery versions.
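A small sketch for checking which side of that change an installation is on. The only version this thread confirms as fread-based is 2.62.1, so using it as the threshold is an assumption; the switch may have happened in an earlier release. version_at_least() is a hypothetical helper, not a GEOquery function.

```r
# Hypothetical helper: TRUE when `installed` is at least `minimum`.
version_at_least <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

version_at_least("2.58.0", "2.62.1")  # version from the original post
version_at_least("2.62.1", "2.62.1")  # version from the working session

# Check the locally installed copy, if there is one.
if (nzchar(system.file(package = "GEOquery"))) {
  installed <- as.character(utils::packageVersion("GEOquery"))
  version_at_least(installed, "2.62.1")
}

# After upgrading R, reinstall through BiocManager:
# BiocManager::install("GEOquery")
```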

ADD COMMENT
0
Entering edit mode

Hi Sean, can you please take a look at my new problem above regarding BiocVersion? Thanks so much!

ADD REPLY
0
Entering edit mode

Hopefully, you are all good now thanks to the amazing Bioconductor Team fix.

ADD REPLY
0
Entering edit mode

All good now. Yes, many thanks to the bioconductor team, including you and Jim!

ADD REPLY
