Importing GenBank entries for synthetic constructs
Dear Gabe,

Thanks a lot for making the genbankr package available. Today, I tried to parse a GenBank entry for a synthetic DNA molecule, e.g. KR709867.1.

Importing this file by accession failed:

id = GBAccession("KR709867.1")
readGenBank(id)

Error in .normargIsCircular(isCircular, seqnames) : 
  length of supplied 'isCircular' must equal the number of sequences

I traced the error to the make_gbrecord function, which raises the error in the following line:

sqinfo = Seqinfo(seqlevels(srcs), width(srcs), circ, genom)

because the srcs GRanges object contains 2 ranges:

GRanges object with 2 ranges and 9 metadata columns:
                   seqnames     ranges strand |        type            organism
                      <Rle>  <IRanges>  <Rle> | <character>         <character>
  [1] synthetic construct:1 [ 1, 1311]      + |      source synthetic construct
  [2]        Homo sapiens:2 [66, 1244]      + |      source        Homo sapiens
         mol_type         db_xref           clone     focus
      <character> <CharacterList>     <character> <logical>
  [1]   other DNA     taxon:32630 CCSBHm_00007040      TRUE
  [2]   other DNA      taxon:9606            <NA>     FALSE
                                                                       note     loctype
                                                                <character> <character>
  [1] vector:pDONR223; derived from parent clone GenBankaccession: KJ897694      normal
  [2]                                                                  <NA>      normal
      temp_grouping_id
             <integer>
  [1]                1
  [2]                2
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
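For what it's worth, the same error can be reproduced with GenomeInfoDb alone. This is only a minimal sketch: the seqlevels and widths are copied from the srcs object above, but I am assuming that circ holds a single logical value for the whole record (I have not confirmed this in the genbankr source). The last call, which repeats the flag once per seqlevel, only shows what Seqinfo accepts; it is not meant as the right fix for genbankr.

```r
# Sketch only: reproduce the length mismatch outside genbankr.
library(GenomeInfoDb)

# Two seqlevels, as in the srcs object for KR709867.1.
seqnames <- c("synthetic construct:1", "Homo sapiens:2")
# Widths of the two source ranges: 1..1311 and 66..1244.
widths <- c(1311L, 1179L)
# ASSUMPTION: one circularity flag for the whole record.
circ <- FALSE

# Two sequence names but a single isCircular value: Seqinfo() stops.
msg <- tryCatch(
    {
        Seqinfo(seqnames, widths, circ, NA)
        "no error"
    },
    error = function(e) conditionMessage(e)
)
print(msg)

# One isCircular value per seqlevel is accepted.
ok <- Seqinfo(seqnames, widths, rep(circ, length(seqnames)), NA)
print(ok)
```

If that assumption holds, any record with more than one source feature would presumably hit the same problem, not just synthetic constructs.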

Are you intending the genbankr package to support synthetic constructs (plasmids, clones, etc)? If so, maybe you want to take a look at this example.



> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] genbankr_1.2.0       BiocInstaller_1.24.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8                AnnotationDbi_1.36.0       XVector_0.14.0            
 [4] GenomicAlignments_1.10.0   GenomicRanges_1.26.1       BiocGenerics_0.20.0       
 [7] zlibbioc_1.20.0            IRanges_2.8.1              BiocParallel_1.8.1        
[10] BSgenome_1.42.0            lattice_0.20-34            R6_2.2.0                  
[13] httr_1.2.1                 rentrez_1.0.4              GenomeInfoDb_1.10.1       
[16] tools_3.3.2                grid_3.3.2                 SummarizedExperiment_1.4.0
[19] parallel_3.3.2             Biobase_2.34.0             DBI_0.5-1                 
[22] digest_0.6.10              Matrix_1.2-7.1             rtracklayer_1.34.1        
[25] S4Vectors_0.12.1           bitops_1.0-6               curl_2.3                  
[28] RCurl_1.95-4.8             biomaRt_2.30.0             memoise_1.0.0             
[31] RSQLite_1.1                GenomicFeatures_1.26.0     Biostrings_2.42.1         
[34] Rsamtools_1.26.1           stats4_3.3.2               XML_3.98-1.5              
[37] jsonlite_1.1               VariantAnnotation_1.20.2  


Thanks for the report. Ideally, genbankr is intended to support anything (reasonable) provided in the GenBank (or GenPept) format. I will look into this and should be able to get it fixed. 

I will comment again when there is news.




