predictCoding warning message: serious or not?
longsong • 0
@longsong-20032

I am doing some gene annotation. My code closely follows the example at https://www.rdocumentation.org/packages/VariantAnnotation/versions/1.18.5/topics/predictCoding and is shown below, together with the warning it produces:

library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene 

## ----------------------------
## VCF object as query 
## ----------------------------
## Read variants from a VCF file 
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Rename seqlevels in the VCF object to match those in the TxDb.
vcf <- renameSeqlevels(vcf, "chr22")
## Confirm common seqlevels
intersect(seqlevels(vcf), seqlevels(txdb))

## When 'query' is a VCF object the varAllele argument is missing.
coding1 <- predictCoding(vcf, txdb, Hsapiens)
Warning message:
In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 2405 out-of-bound ranges located on sequences 75253, 74357, 74359, 74360, 74361, 74362, 74363,
  74358, 74364, 74365, 75254, 75259, 74368, 74369, 74366, 74367, 74370, 74372, 74373, 74374, 74375, 74378, 74377, 74380,
  74381, 75262, 75263, 75265, 75266, 75268, 75269, 75271, 75273, 75276, 75281, 75282, 75283, 74389, 74383, 74384, 74385,
  74386, 74387, 75287, 75288, 75286, 75289, 74390, 74391, 74392, 74393, 74394, 75291, 74395, 74396, 74397, 74398, 75302,
  75304, 75305, and 75306. Note that ranges located on a sequence whose length is unknown (NA) or on a circular sequence are
  not considered out-of-bound (use seqlengths() and isCircular() to get the lengths and circularity flags of the underlying
  sequences). You can use trim() to trim these ranges. See ?`trim,GenomicRanges-method` for more information.
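
To make the wording of this warning concrete, here is a minimal sketch on toy data of what an out-of-bound range is and what trim() does to it. The sequence name "chrA" and its length of 100 are made up for illustration; this does not reproduce what predictCoding does internally, it only shows the check the warning describes.

```r
library(GenomicRanges)

# Toy sequence "chrA" of length 100 (hypothetical, for illustration only).
# The second range ends at 105, i.e. past the end of the sequence, so the
# constructor itself raises the out-of-bound warning; it is suppressed here.
gr <- suppressWarnings(GRanges(
  seqnames = "chrA",
  ranges   = IRanges(start = c(10, 95), end = c(20, 105)),
  seqinfo  = Seqinfo("chrA", seqlengths = 100L)
))

# A range is out of bound when it starts before position 1 or ends past the
# known length of its sequence. Sequences of unknown (NA) length are skipped.
len <- unname(seqlengths(gr)[as.character(seqnames(gr))])
oob <- !is.na(len) & (start(gr) < 1L | end(gr) > len)

# The offending ranges can then be inspected directly.
gr[oob]

# trim() clips out-of-bound ranges to the sequence boundaries.
trimmed <- trim(gr)
end(trimmed)
```

The same comparison of start() and end() against seqlengths() should apply to any GRanges object whose seqinfo carries sequence lengths.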

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252
[4] LC_NUMERIC=C                       LC_TIME=English_Australia.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2 GenomicFeatures_1.34.3                  AnnotationDbi_1.44.0                   
 [4] BSgenome.Hsapiens.UCSC.hg19_1.4.0       BSgenome_1.50.0                         rtracklayer_1.42.1                     
 [7] VariantAnnotation_1.28.11               Rsamtools_1.34.1                        Biostrings_2.50.2                      
[10] XVector_0.22.0                          SummarizedExperiment_1.12.0             DelayedArray_0.8.0                     
[13] BiocParallel_1.16.6                     matrixStats_0.54.0                      Biobase_2.42.0                         
[16] GenomicRanges_1.34.0                    GenomeInfoDb_1.18.2                     IRanges_2.16.0                         
[19] S4Vectors_0.20.1                        BiocGenerics_0.28.0                    

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.0               compiler_3.5.2           prettyunits_1.0.2        bitops_1.0-6             tools_3.5.2             
 [6] zlibbioc_1.28.0          progress_1.2.0           biomaRt_2.38.0           digest_0.6.18            bit_1.1-14              
[11] RSQLite_2.1.1            memoise_1.1.0            lattice_0.20-38          pkgconfig_2.0.2          rlang_0.3.1             
[16] Matrix_1.2-15            DBI_1.0.0                rstudioapi_0.9.0         yaml_2.2.0               GenomeInfoDbData_1.2.0  
[21] httr_1.4.0               stringr_1.4.0            hms_0.4.2                bit64_0.9-7              grid_3.5.2              
[26] R6_2.4.0                 XML_3.98-1.17            magrittr_1.5             blob_1.1.1               GenomicAlignments_1.18.1
[31] assertthat_0.2.0         stringi_1.3.1            RCurl_1.95-4.11          crayon_1.3.4

My own code produces the following warning messages:

Warning messages:
1: In valid.GenomicRanges.seqinfo(x, suggest.trim = TRUE) :
  GRanges object contains 88121 out-of-bound ranges located on sequences
  1, 529, 3, 532, 7, 8, 9, 534, 536, 537, 14, 541, 544, 20, 21, 546, 22,
  549, 550, 23, 24, 551, 26, 27, 555, 28, 556, 558, 559, 561, 562, 564,
  33, 567, 568, 35, 572, 573, 574, 575, 39, 40, 578, 44, 45, 583, 584,
  46, 587, 49, 50, 51, 589, 52, 53, 591, 592, 56, 596, 597, 598, 600, 57,
  601, 58, 603, 604, 606, 60, 609, 610, 611, 62, 614, 615, 64, 617, 66,
  618, 71, 72, 625, 627, 628, 629, 74, 630, 631, 633, 75, 634, 635, 637,
  76, 638, 641, 79, 82, 645, 648, 85, 653, 91, 93, 655, 94, 656, 96, 97,
  661, 98, 99, 662, 663, 665, 103, 666, 671, 673, 674, 678, 113, 682,
  118, 119, 120, 121, 124, 126, 683, 128, 684, 130, 135, 685, 136, 137,
  138, 693, 144, 145, 695, 146, 147, 696, 149, 152, 155, 156, 701, 702,
  705, 159, 708, 161, 712, 713, 165, 166, 167, 716, 169, 717, 720, 721,
  170, 173, 722, 176, 177, 724, 178, 182, 184, 185, 732, 186, 733, 187,
  188, 189, 190, 739, 194, 742, 195, 197, 743, 200, 201, 748, 7 [... truncated]
2: In .predictCodingGRangesList(query, cache[["cdsbytx"]], seqSource,  :
  records with missing 'varAllele' were ignored
3: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet],  :
  in 'x[[158215]]': last base was ignored
4: In .Call2("DNAStringSet_translate", x, skip_code, dna_codes[codon_alphabet],  :
  in 'x[[158215]]': last base was ignored
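
Warnings 3 and 4 above come from translating DNA to amino acids. As a minimal sketch on toy sequences (the two sequences are made up, and this is not the input predictCoding builds internally), Biostrings::translate() drops trailing bases when a sequence length is not a multiple of 3 and warns about it:

```r
library(Biostrings)

# Two toy sequences: the first is two complete codons (ATG, GCC),
# the second has one extra base left over at the end.
x <- DNAStringSet(c("ATGGCC", "ATGGCCA"))

# Which elements have a length that is not a multiple of 3?
incomplete <- which(width(x) %% 3L != 0L)
incomplete

# translate() ignores the leftover base and emits a warning; the warning
# is suppressed here so that only the translated result is shown.
aa <- suppressWarnings(translate(x))
as.character(aa)
```

Under this reading, a "last base was ignored" warning would point to a coding sequence whose length is not a multiple of 3, which may be worth checking for the affected transcript.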

An older answer addresses this issue (https://support.bioconductor.org/p/62376/), but it does not seem to be sufficient.

I am new to Bioconductor and have two concerns about these warning messages:

(1) There are a great many out-of-bound ranges (88121), which worries me. If I could locate some of them, I could judge whether the warning is serious. How can I manually check where those out-of-bound ranges are, given the three objects vcf, txdb, and Hsapiens passed to predictCoding(vcf, txdb, Hsapiens)?

(2) My own predictCoding call produces three further warnings (2, 3, 4). What should I do about them?

software error annotation VariantAnnotation • 783 views