Reading vcf file with readVcf
@bbf7f868

I have 26 GB of RAM available on my laptop and I am trying to import a 1.8 GB VCF file using the readVcf function from the VariantAnnotation package. So far, the process has been stopped every time. I guess it might be a memory issue, but I don't understand why, because I think I should have enough memory. I am running this in a terminal on Ubuntu.

Any ideas?

vcf <- readVcf('EA1EA2_all90_filt_map.recode.vcf')
Processus arrêté   (French locale message; in English: "Process stopped")


sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.46.0          VariantAnnotation_1.32.0   
 [3] GenomicAlignments_1.22.1    Rsamtools_2.2.3            
 [5] Biostrings_2.54.0           XVector_0.26.0             
 [7] SummarizedExperiment_1.16.1 DelayedArray_0.12.3        
 [9] BiocParallel_1.20.1         matrixStats_0.61.0         
[11] Biobase_2.46.0              GenomicRanges_1.38.0       
[13] GenomeInfoDb_1.22.1         IRanges_2.20.2             
[15] S4Vectors_0.24.4            BiocGenerics_0.32.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7             lattice_0.20-44        prettyunits_1.1.1     
 [4] assertthat_0.2.1       utf8_1.2.2             BiocFileCache_1.10.2  
 [7] R6_2.5.1               RSQLite_2.2.8          httr_1.4.2            
[10] pillar_1.6.2           zlibbioc_1.32.0        rlang_0.4.11          
[13] GenomicFeatures_1.38.2 progress_1.2.2         curl_4.3.2            
[16] blob_1.2.2             Matrix_1.3-4           stringr_1.4.0         
[19] RCurl_1.98-1.5         bit_4.0.4              biomaRt_2.42.1        
[22] compiler_3.6.3         pkgconfig_2.0.3        askpass_1.1           
[25] openssl_1.4.5          tidyselect_1.1.1       tibble_3.1.4          
[28] GenomeInfoDbData_1.2.2 XML_3.99-0.3           fansi_0.5.0           
[31] crayon_1.4.1           dplyr_1.0.7            dbplyr_2.1.1          
[34] bitops_1.0-7           rappdirs_0.3.3         grid_3.6.3            
[37] lifecycle_1.0.0        DBI_1.1.1              magrittr_2.0.1        
[40] stringi_1.7.4          cachem_1.0.6           ellipsis_0.3.2        
[43] generics_0.1.0         vctrs_0.3.8            tools_3.6.3           
[46] bit64_4.0.5            BSgenome_1.54.0        glue_1.4.2            
[49] purrr_0.3.4            hms_1.1.0              fastmap_1.1.0         
[52] AnnotationDbi_1.48.0   memoise_2.0.0
@james-w-macdonald-5106

When you run out of RAM, you usually get an error from R along the lines of 'cannot allocate vector of size <some number>'. When it just says the process was stopped, that seems more like an error at the C level, where it hit something unexpected and just gave up. You could try a few things to track it down.

  1. Update your R/Bioconductor. You are woefully out of date.
  2. Use a VcfFile rather than a character file name
  3. Read in each chromosome separately. To do this you have to use a VcfFile with an indexed VCF.

If you can read in each chromosome without a problem, then maybe it is a memory issue. In that case, if you are only reading in the VCF to process the data, maybe you could do it in chunks?
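As a rough illustration of the chunked approach, here is a minimal sketch, not a tested fix for this particular file. It assumes the VCF is bgzip-compressed and tabix-indexed, and it runs on the small example file that ships with VariantAnnotation rather than on the file from the question. The helper name `count_records_in_chunks` and the chunk sizes are made up for the example.

```r
library(VariantAnnotation)  # also attaches Rsamtools (bgzip, indexTabix)

# Hypothetical helper: walk through a VCF in chunks of `chunk_size` records,
# so that only one chunk is held in memory at a time.
count_records_in_chunks <- function(vcf_path, chunk_size = 10000L) {
  vf <- VcfFile(vcf_path, yieldSize = chunk_size)
  open(vf)
  on.exit(close(vf))
  total <- 0L
  repeat {
    chunk <- readVcf(vf)             # reads the next `chunk_size` records
    if (nrow(chunk) == 0L) break     # an empty chunk means end of file
    total <- total + nrow(chunk)     # replace with the real per-chunk work
  }
  total
}

# Example file shipped with VariantAnnotation (already compressed and indexed).
fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
count_records_in_chunks(fl, chunk_size = 4000L)

# For a plain-text VCF, compress and index it first (not run here):
# gz <- bgzip("EA1EA2_all90_filt_map.recode.vcf")
# indexTabix(gz, format = "vcf")
```

If every chunk reads cleanly but the full import still dies, that would point towards memory rather than a malformed record. The same indexed file can also be read one chromosome at a time by passing a ScanVcfParam with a `which` range to readVcf.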


Hi,

Thanks for the comment. I updated R and Bioconductor, here is the sessionInfo()

sessionInfo()

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.52.1          VariantAnnotation_1.38.0   
 [3] GenomicAlignments_1.28.0    Rsamtools_2.8.0            
 [5] Biostrings_2.60.2           XVector_0.32.0             
 [7] SummarizedExperiment_1.22.0 Biobase_2.52.0             
 [9] MatrixGenerics_1.4.3        matrixStats_0.61.0         
[11] GenomicRanges_1.44.0        GenomeInfoDb_1.28.4        
[13] IRanges_2.26.0              S4Vectors_0.30.0           
[15] BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7             lattice_0.20-45        prettyunits_1.1.1     
 [4] png_0.1-7              assertthat_0.2.1       digest_0.6.27         
 [7] utf8_1.2.2             BiocFileCache_2.0.0    R6_2.5.1              
[10] RSQLite_2.2.8          httr_1.4.2             pillar_1.6.2          
[13] zlibbioc_1.38.0        rlang_0.4.11           GenomicFeatures_1.44.2
[16] progress_1.2.2         curl_4.3.2             rstudioapi_0.13       
[19] blob_1.2.2             Matrix_1.3-4           BiocParallel_1.26.2   
[22] stringr_1.4.0          RCurl_1.98-1.5         bit_4.0.4             
[25] biomaRt_2.48.3         DelayedArray_0.18.0    compiler_4.1.1        
[28] pkgconfig_2.0.3        tidyselect_1.1.1       KEGGREST_1.32.0       
[31] tibble_3.1.4           GenomeInfoDbData_1.2.6 XML_3.99-0.8          
[34] fansi_0.5.0            crayon_1.4.1           dplyr_1.0.7           
[37] dbplyr_2.1.1           bitops_1.0-7           rappdirs_0.3.3        
[40] grid_4.1.1             lifecycle_1.0.0        DBI_1.1.1             
[43] magrittr_2.0.1         stringi_1.7.4          cachem_1.0.6          
[46] xml2_1.3.2             ellipsis_0.3.2         filelock_1.0.2        
[49] vctrs_0.3.8            generics_0.1.0         rjson_0.2.20          
[52] restfulr_0.0.13        tools_4.1.1            bit64_4.0.5           
[55] BSgenome_1.60.0        glue_1.4.2             purrr_0.3.4           
[58] hms_1.1.0              yaml_2.2.1             fastmap_1.1.0         
[61] AnnotationDbi_1.54.1   memoise_2.0.0          BiocIO_1.2.0          

I also tried using VcfFile, but I still have the same issue.

In parallel, I ran the same script on a cluster and it worked without a problem, so it is either a memory problem (but again, I have 26 GB of RAM available and the VCF file is 1.6 GB) or maybe there is something wrong with my installation?

Any ideas?
