Question: Error in matrix
0
gravatar for bkauf
2.6 years ago by
bkauf0
bkauf0 wrote:

Sorry for total newbie question, but searching has turned up nothing.

I am running R3.3.3 with bioconductor on a mac. trying to compile coding variants from a snp files, but I can't figure out the source of error. Here is the script with traceback at end.

Thanks!

Brett

-------

> library(AnnotationHub)
> library(VariantAnnotation)
> # Get Ensembl 85 gene location.
> hub = AnnotationHub()
snapshotDate(): 2016-10-11
> hub = query(hub, c("ensembl", "mus musculus", "gtf"))
> ensembl = hub[["AH51040"]]
Importing File into R ..
loading from cache ‘/Users/brettkaufman//.AnnotationHub/57778’
using guess work to populate seqinfo
> # Sanger VCF file location.
> vcf.file = "~/Box Sync/Lab unfiled data/bioconductor/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
> # Pick a set of your favorite genes.
> target.genes = c("Apob", "Dicer1", "Serpina1a")
> vcf.header = scanVcfHeader(vcf.file)
> strains = samples(vcf.header)[c(5, 2, 26, 28, 16, 30, 35)]
> results = NULL
> for(i in 1:length(target.genes)) {
+   print(paste(i, "of", length(target.genes)))
+   gene.loc = ensembl[ensembl$gene_name == target.genes[i] & ensembl$type == "gene"]
+   if(length(gene.loc) == 0) {
+     stop(paste("Gene", target.genes[i], "not found."))
+   }
+   param = ScanVcfParam(geno = "GT", samples = strains,
+           which = GRanges(seqnames = seqnames(gene.loc)[1], ranges =
+           IRanges(start = start(gene.loc)[1], end = end(gene.loc)[1])))
+   snps = readVcf(file = vcf.file, genome = "mm10", param = param)
+   csq = as.list(info(snps)$CSQ)
+   keep = lapply(csq, function(z) { grep("missense|stop_lost|stop_gained", z) })
+   keep = which(sapply(keep, length) > 0)
+   snps = snps[keep]
+   snps = genotypeCodesToNucleotides(snps)
+   results = rbind(results, cbind(as.data.frame(rowRanges(snps)), geno(snps)$GT))
+ } # for(i)
[1] "1 of 3"
Error in matrix(unlist(GTstr), ncol = 2, byrow = TRUE) : 
  'data' must be of a vector type, was 'NULL'
> write.csv(results, file = "missense_snps.csv", quote = F, row.names = F)
> traceback()
3: matrix(unlist(GTstr), ncol = 2, byrow = TRUE)
2: .geno2geno(NULL, ALT, REF, GT)
1: genotypeCodesToNucleotides(snps) at #15
> 

 

 

vcf bioconductor • 497 views
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by bkauf0

It would be good to include the output of sessionInfo() in your post so people can see exactly what version of the packages you're using.  Currently I'm unable to replicate this error with your code.  The only difference I made is to use a remote version of the VCF file since I don't want to download 21GB e.g.

vcf.file <- "ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"

Here's my sessionInfo(). You can compared it with yours to see if there are any obvious package version differences:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.34.2         VariantAnnotation_1.20.3  
 [3] Rsamtools_1.26.1           Biostrings_2.42.1         
 [5] XVector_0.14.1             SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.4      
 [9] GenomeInfoDb_1.10.3        IRanges_2.8.2             
[11] S4Vectors_0.12.2           AnnotationHub_2.6.5       
[13] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.24.0         
 [3] GenomicFeatures_1.26.3        bitops_1.0-6                 
 [5] tools_3.3.1                   zlibbioc_1.20.0              
 [7] biomaRt_2.30.0                digest_0.6.12                
 [9] BSgenome_1.42.0               RSQLite_1.1-2                
[11] memoise_1.0.0                 lattice_0.20-35              
[13] Matrix_1.2-8                  shiny_1.0.0                  
[15] DBI_0.6                       curl_2.4                     
[17] yaml_2.1.14                   httr_1.2.1                   
[19] grid_3.3.1                    R6_2.2.0                     
[21] AnnotationDbi_1.36.2          XML_3.98-1.6                 
[23] BiocParallel_1.8.1            GenomicAlignments_1.10.1     
[25] htmltools_0.3.5               mime_0.5                     
[27] interactiveDisplayBase_1.12.0 xtable_1.8-2                 
[29] httpuv_1.3.3                  RCurl_1.95-4.8   
ADD REPLYlink written 2.6 years ago by Mike Smith4.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 175 users visited in the last hour