Error when computing ld with snpStats in big samples
0
1
Entering edit mode
@carlosruiz-8222
Last seen 7.1 years ago
Spain/Barcelona/CREAL

Hi,

I want to compute the linkage disequilibrium between the SNPs of some predifined regions of the genome of around Mb using snpStats. I have downloaded 1000 Genomes VCF files but I am experiencing some errors.

I start with a list of target genomic regions in a GRanges. I have created some functions to read the SNPs of the region, transform the VCF object into a SNP matrix and finally compute the LD. Most of these steps are performed using existing BioC functions. The problems appear when I want to compute the LD for a big numbers of SNPs with a big depth.

I tried to compute the LD using the ld function of snpStats and the following problem raised:

> computeLD(GRcomp[2], EUR))

*** caught segfault ***
address 0x7fe323878000, cause 'invalid permissions'

Traceback:
 1: .Call("ld", x, y, as.integer(depth), cstats, symmetric, PACKAGE = "snpStats")
 2: ld(snpsVCF$genotypes, depth = 30000, stats = "D.prime")

3: computeLD(GRcomp[2], EUR)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

This region is quite big (14Mb) and contains many snps (379737)

>  as.character(GRcomp[2])
                merge2
"2:99698199-114173850"

This is an example of a very big region but I have had problems with smaller regions too. I have tested the wrappers with smaller regions and it works. Is there any limitation in the maximum of SNPs and depth that can be passed to ld? I haven't been able to find anything about it on the web.

I also send you the code of my wrappers. The functions use VCF files of 1000 Genomes:

computeLD <- function(range, samples){
  message("Loading VCF")
  snpsVCF <- getVCFmatrix(range, samples)
  message("Computing LD")
  res <- ld(snpsVCF$genotypes, depth = 30000, stats = "D.prime")
  res
}

getVCFmatrix <- function(range, samples = vcfsamples){
  vcf <- loadVCFrange(range, samples)
  snpsVCF <- genotypeToSnpMatrix(vcf)
  ## Conversion from VCF to SNP matrix produces some SNPs to be NA (multiallelic or bigger than 1)
  snpsVCF$map$position <- start(rowRanges(vcf))
  snpsVCF$map$chromosome <- rep(as.character(seqnames(range)), nrow(snpsVCF$map))
  snpsVCF$genotypes <- snpsVCF$genotypes[, !snpsVCF$map$ignore]
  snpsVCF$map <- snpsVCF$map[!snpsVCF$map$ignore, ]
  snpsVCF$map$allele.2 <- unlist(snpsVCF$map$allele.2)
  snpsVCF
}

loadVCFrange <- function(range, samples){
  vcffile <- paste0("ALL.chr", as.character(seqnames(range)), ".phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz")
  vcfsamples <- samples(scanVcfHeader(vcffile))
  samples <- vcfsamples[vcfsamples %in% samples]
  param <- ScanVcfParam(samples = samples, which = range)
  vcf <- readVcf(vcffile, "hg19", param)
  vcf
}

I also send you my session info:

R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8
 [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
 [1] snpStats_1.22.0            Matrix_1.2-6
 [3] survival_2.39-4            VariantAnnotation_1.18.1
 [5] SummarizedExperiment_1.2.2 Biobase_2.32.0
 [7] ggplot2_2.1.0              Rsamtools_1.24.0
 [9] Biostrings_2.40.1          XVector_0.12.0
[11] GenomicRanges_1.24.1       GenomeInfoDb_1.8.1
[13] IRanges_2.6.0              S4Vectors_0.10.1
[15] BiocGenerics_0.18.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5             AnnotationDbi_1.34.3    splines_3.3.0
 [4] GenomicAlignments_1.8.1 zlibbioc_1.18.0         munsell_0.4.3
 [7] BiocParallel_1.6.2      lattice_0.20-33         BSgenome_1.40.0
[10] colorspace_1.2-6        plyr_1.8.3              tools_3.3.0
[13] grid_3.3.0              gtable_0.2.0            DBI_0.4-1
[16] rtracklayer_1.32.0      bitops_1.0-6            biomaRt_2.28.0
[19] RCurl_1.95-4.8          RSQLite_1.0.0           GenomicFeatures_1.24.2
[22] scales_0.4.0            XML_3.98-1.4

Bests,

Carlos Ruiz

software error snpstats `ld • 1.3k views
ADD COMMENT

Login before adding your answer.

Traffic: 514 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6