ReadVcf Memory Issues
2
0
Entering edit mode
Timothy Duff ▴ 40
@timothy-duff-5396
Last seen 9.6 years ago
Hi. I am trying to determine, from a filtered set of Ilumina 450k probes, which of them occur with a specified frequency in a given population. I was referred to the Variant Annotation package by this mailing list. While the readVcf function seems to handle small loads nicely, looping over the regions of interest seems to cause allocation troubles. R tells me "Realloc could not re-allocate memory (0 bytes)" after about 4 iterations. Below is the relevent code, and below it the output of sessionInfo(). If anyone might sugget some diagnostic measures or an alternate way of doing this I would appreciate it. Thanks. ------ library(VariantAnnotation) library(IlluminaHumanMethylation450kprobe) filename <- " ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/E UR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz " load("rsids.Rdata") # a data frame containing probe id, rs id, and chromosome data(IlluminaHumanMethylation450kprobe) colnames(rsids) <- c("Probe_ID", "RS_ID", "CHR") m <- merge(IlluminaHumanMethylation450kprobe, rsids, by="Probe_ID") filename <- " ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting/E UR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz " for (i in 1:length(m$Probe_ID)) { snprange <- readVcf(TabixFile(filename), "hg19", param=GRanges(as.character(m$CHR[i]), IRanges(as.integer(m$start[i]), as.integer(m$end[i])))) freq <- elementMetadata(info(snprange))["EUR_R2"][1,1] if is.na(freq) == FALSE & freq < .99 & freq > .01) { m$CpGs[i] <- 1 } else { m$CpGs[i] <- 0 } } ---- > sessionInfo() R version 2.15.1 (2012-06-22) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] IlluminaHumanMethylation450kprobe_2.0.6 [2] AnnotationDbi_1.18.1 [3] Biobase_2.16.0 [4] BiocInstaller_1.4.7 [5] VariantAnnotation_1.2.9 [6] Rsamtools_1.8.5 [7] Biostrings_2.24.1 [8] GenomicRanges_1.8.7 [9] IRanges_1.14.4 [10] BiocGenerics_0.2.0 loaded via a namespace (and not attached): [1] biomaRt_2.2.0 bitops_1.0-4.1 BSgenome_1.24.0 [4] DBI_0.2-5 GenomicFeatures_1.8.2 grid_2.15.1 [7] lattice_0.20-6 Matrix_1.0-6 RCurl_1.4-3 [10] RSQLite_0.9-2 rtracklayer_1.16.3 snpStats_1.6.0 [13] splines_2.15.1 stats4_2.15.1 survival_2.36-14 [16] tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0 -- Tim [[alternative HTML version deleted]]
Annotation probe Annotation probe • 1.4k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 22 hours ago
United States
On 07/19/2012 05:59 PM, Timothy Duff wrote: > Hi. I am trying to determine, from a filtered set of Ilumina 450k probes, > which of them occur with a specified frequency in a given population. I was > referred to the Variant Annotation package by this mailing list. While the > readVcf function seems to handle small loads nicely, looping over the > regions of interest seems to cause allocation troubles. R tells me "Realloc > could not re-allocate memory (0 bytes)" after about 4 iterations. Below is for this, I think it is a bug in the release version of VariantAnnotation, and that it is fixed in devel v. 1.3.6 (current devel version is 1.3.16) and will be fixed in release version 1.2.10, probably built Saturday morning, 10am Seattle time. The short-term solution is to switch to using the devel branch (http://bioconductor.org/developers/useDevel/), but the bug might be avoided anyway by re-coding as suggested by Vince. Martin > the relevent code, and below it the output of sessionInfo(). If anyone > might sugget some diagnostic measures or an alternate way of doing this I > would appreciate it. Thanks. > > ------ > > library(VariantAnnotation) > library(IlluminaHumanMethylation450kprobe) > > filename <- " > ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting /EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz > " > > load("rsids.Rdata") # a data frame containing probe id, rs id, and > chromosome > data(IlluminaHumanMethylation450kprobe) > colnames(rsids) <- c("Probe_ID", "RS_ID", "CHR") > m <- merge(IlluminaHumanMethylation450kprobe, rsids, by="Probe_ID") > > filename <- " > ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting /EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz > " > > for (i in 1:length(m$Probe_ID)) { > snprange <- readVcf(TabixFile(filename), "hg19", > param=GRanges(as.character(m$CHR[i]), IRanges(as.integer(m$start[i]), > as.integer(m$end[i])))) > freq <- elementMetadata(info(snprange))["EUR_R2"][1,1] > if is.na(freq) == FALSE & freq < .99 & freq > .01) { > m$CpGs[i] <- 1 > } > else { > m$CpGs[i] <- 0 > } > } > > > ---- > >> sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IlluminaHumanMethylation450kprobe_2.0.6 > [2] AnnotationDbi_1.18.1 > [3] Biobase_2.16.0 > [4] BiocInstaller_1.4.7 > [5] VariantAnnotation_1.2.9 > [6] Rsamtools_1.8.5 > [7] Biostrings_2.24.1 > [8] GenomicRanges_1.8.7 > [9] IRanges_1.14.4 > [10] BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.2.0 bitops_1.0-4.1 BSgenome_1.24.0 > [4] DBI_0.2-5 GenomicFeatures_1.8.2 grid_2.15.1 > [7] lattice_0.20-6 Matrix_1.0-6 RCurl_1.4-3 > [10] RSQLite_0.9-2 rtracklayer_1.16.3 snpStats_1.6.0 > [13] splines_2.15.1 stats4_2.15.1 survival_2.36-14 > [16] tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0 > > > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
@vincent-j-carey-jr-4
Last seen 5 weeks ago
United States
On Thu, Jul 19, 2012 at 8:59 PM, Timothy Duff <timothy.duff@ncf.edu> wrote: > Hi. I am trying to determine, from a filtered set of Ilumina 450k probes, > which of them occur with a specified frequency in a given population. I was > referred to the Variant Annotation package by this mailing list. While the > readVcf function seems to handle small loads nicely, looping over the > regions of interest seems to cause allocation troubles. R tells me "Realloc > could not re-allocate memory (0 bytes)" after about 4 iterations. Below is > the relevent code, and below it the output of sessionInfo(). If anyone > might sugget some diagnostic measures or an alternate way of doing this I > would appreciate it. Thanks. > > ------ > > library(VariantAnnotation) > library(IlluminaHumanMethylation450kprobe) > > filename <- " > > ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting /EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz > " > > load("rsids.Rdata") # a data frame containing probe id, rs id, and > chromosome > data(IlluminaHumanMethylation450kprobe) > colnames(rsids) <- c("Probe_ID", "RS_ID", "CHR") > m <- merge(IlluminaHumanMethylation450kprobe, rsids, by="Probe_ID") > > filename <- " > > ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/supporting /EUR.2of4intersection_allele_freq.20100804.genotypes.vcf.gz > " > > Briefly, at a minimum you should set up your param to be a GRanges with multiple ranges, instead of running readVcf one range at a time. How many ranges, and how to best use the yield capability, I cannot comment on at the moment. There is no reason to do this with record- wise looping, but some kind of chunking may be in order. Have a look at scanVcf which may give you more control over the consumption as the VCF is traversed. for (i in 1:length(m$Probe_ID)) { > snprange <- readVcf(TabixFile(filename), "hg19", > param=GRanges(as.character(m$CHR[i]), IRanges(as.integer(m$start[i]), > as.integer(m$end[i])))) > freq <- elementMetadata(info(snprange))["EUR_R2"][1,1] > if is.na(freq) == FALSE & freq < .99 & freq > .01) { > m$CpGs[i] <- 1 > } > else { > m$CpGs[i] <- 0 > } > } > > > ---- > > > sessionInfo() > R version 2.15.1 (2012-06-22) > Platform: x86_64-pc-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] IlluminaHumanMethylation450kprobe_2.0.6 > [2] AnnotationDbi_1.18.1 > [3] Biobase_2.16.0 > [4] BiocInstaller_1.4.7 > [5] VariantAnnotation_1.2.9 > [6] Rsamtools_1.8.5 > [7] Biostrings_2.24.1 > [8] GenomicRanges_1.8.7 > [9] IRanges_1.14.4 > [10] BiocGenerics_0.2.0 > > loaded via a namespace (and not attached): > [1] biomaRt_2.2.0 bitops_1.0-4.1 BSgenome_1.24.0 > [4] DBI_0.2-5 GenomicFeatures_1.8.2 grid_2.15.1 > [7] lattice_0.20-6 Matrix_1.0-6 RCurl_1.4-3 > [10] RSQLite_0.9-2 rtracklayer_1.16.3 snpStats_1.6.0 > [13] splines_2.15.1 stats4_2.15.1 survival_2.36-14 > [16] tools_2.15.1 XML_3.9-4 zlibbioc_1.2.0 > > > > -- > Tim > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 492 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6