Error in matrix
0
0
Entering edit mode
bkauf • 0
@bkauf-12726
Last seen 7.1 years ago

Sorry for total newbie question, but searching has turned up nothing.

I am running R3.3.3 with bioconductor on a mac. trying to compile coding variants from a snp files, but I can't figure out the source of error. Here is the script with traceback at end.

Thanks!

Brett

-------

> library(AnnotationHub)
> library(VariantAnnotation)
> # Get Ensembl 85 gene location.
> hub = AnnotationHub()
snapshotDate(): 2016-10-11
> hub = query(hub, c("ensembl", "mus musculus", "gtf"))
> ensembl = hub[["AH51040"]]
Importing File into R ..
loading from cache ‘/Users/brettkaufman//.AnnotationHub/57778’
using guess work to populate seqinfo
> # Sanger VCF file location.
> vcf.file = "~/Box Sync/Lab unfiled data/bioconductor/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"
> # Pick a set of your favorite genes.
> target.genes = c("Apob", "Dicer1", "Serpina1a")
> vcf.header = scanVcfHeader(vcf.file)
> strains = samples(vcf.header)[c(5, 2, 26, 28, 16, 30, 35)]
> results = NULL
> for(i in 1:length(target.genes)) {
+   print(paste(i, "of", length(target.genes)))
+   gene.loc = ensembl[ensembl$gene_name == target.genes[i] & ensembl$type == "gene"]
+   if(length(gene.loc) == 0) {
+     stop(paste("Gene", target.genes[i], "not found."))
+   }
+   param = ScanVcfParam(geno = "GT", samples = strains,
+           which = GRanges(seqnames = seqnames(gene.loc)[1], ranges =
+           IRanges(start = start(gene.loc)[1], end = end(gene.loc)[1])))
+   snps = readVcf(file = vcf.file, genome = "mm10", param = param)
+   csq = as.list(info(snps)$CSQ)
+   keep = lapply(csq, function(z) { grep("missense|stop_lost|stop_gained", z) })
+   keep = which(sapply(keep, length) > 0)
+   snps = snps[keep]
+   snps = genotypeCodesToNucleotides(snps)
+   results = rbind(results, cbind(as.data.frame(rowRanges(snps)), geno(snps)$GT))
+ } # for(i)
[1] "1 of 3"
Error in matrix(unlist(GTstr), ncol = 2, byrow = TRUE) : 
  'data' must be of a vector type, was 'NULL'
> write.csv(results, file = "missense_snps.csv", quote = F, row.names = F)
> traceback()
3: matrix(unlist(GTstr), ncol = 2, byrow = TRUE)
2: .geno2geno(NULL, ALT, REF, GT)
1: genotypeCodesToNucleotides(snps) at #15
> 

 

 

bioconductor vcf • 1.1k views
ADD COMMENT
0
Entering edit mode

It would be good to include the output of sessionInfo() in your post so people can see exactly what version of the packages you're using.  Currently I'm unable to replicate this error with your code.  The only difference I made is to use a remote version of the VCF file since I don't want to download 21GB e.g.

vcf.file <- "ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/mgp.v5.merged.snps_all.dbSNP142.vcf.gz"

Here's my sessionInfo(). You can compared it with yours to see if there are any obvious package version differences:

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] rtracklayer_1.34.2         VariantAnnotation_1.20.3  
 [3] Rsamtools_1.26.1           Biostrings_2.42.1         
 [5] XVector_0.14.1             SummarizedExperiment_1.4.0
 [7] Biobase_2.34.0             GenomicRanges_1.26.4      
 [9] GenomeInfoDb_1.10.3        IRanges_2.8.2             
[11] S4Vectors_0.12.2           AnnotationHub_2.6.5       
[13] BiocGenerics_0.20.0       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.10                  BiocInstaller_1.24.0         
 [3] GenomicFeatures_1.26.3        bitops_1.0-6                 
 [5] tools_3.3.1                   zlibbioc_1.20.0              
 [7] biomaRt_2.30.0                digest_0.6.12                
 [9] BSgenome_1.42.0               RSQLite_1.1-2                
[11] memoise_1.0.0                 lattice_0.20-35              
[13] Matrix_1.2-8                  shiny_1.0.0                  
[15] DBI_0.6                       curl_2.4                     
[17] yaml_2.1.14                   httr_1.2.1                   
[19] grid_3.3.1                    R6_2.2.0                     
[21] AnnotationDbi_1.36.2          XML_3.98-1.6                 
[23] BiocParallel_1.8.1            GenomicAlignments_1.10.1     
[25] htmltools_0.3.5               mime_0.5                     
[27] interactiveDisplayBase_1.12.0 xtable_1.8-2                 
[29] httpuv_1.3.3                  RCurl_1.95-4.8   
ADD REPLY

Login before adding your answer.

Traffic: 755 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6