Hi,
The TxDb.Mmusculus.UCSC.mm10.knownGene package appears to be giving me strange co-ordinates for certain genes, making them huge. For example, the microRNA below stretches 35 Mb according to the txdb package but is only 66 bp according to UCSC. Any suggestions, or is there something I'm overlooking?
    library(TxDb.Mmusculus.UCSC.mm10.knownGene)
    txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
    genes <- genes(txdb)
    subset(genes, width(genes) > 35000000)
Output:
    GRanges object with 1 range and 1 metadata column:
                seqnames               ranges strand |     gene_id
                   <Rle>            <IRanges>  <Rle> | <character>
      102465114    chr19 [24942236, 60774397]      - |   102465114
      -------
      seqinfo: 66 sequences (1 circular) from mm10 genome
Session info:
    R version 3.3.1 (2016-06-21)
    Platform: x86_64-pc-linux-gnu (64-bit)
    Running under: Ubuntu 16.04.1 LTS

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats4    parallel  stats     graphics  grDevices utils     datasets
    [8] methods   base

    other attached packages:
     [1] BiocInstaller_1.24.0
     [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
     [3] GenomicFeatures_1.26.0
     [4] AnnotationDbi_1.36.0
     [5] Biobase_2.34.0
     [6] GenomicRanges_1.26.1
     [7] GenomeInfoDb_1.10.1
     [8] IRanges_2.8.1
     [9] S4Vectors_0.12.0
    [10] BiocGenerics_0.20.0

    loaded via a namespace (and not attached):
     [1] XVector_0.14.0             zlibbioc_1.20.0
     [3] GenomicAlignments_1.10.0   BiocParallel_1.8.1
     [5] lattice_0.20-33            tools_3.3.1
     [7] SummarizedExperiment_1.4.0 grid_3.3.1
     [9] DBI_0.5-1                  Matrix_1.2-7.1
    [11] rtracklayer_1.34.1         bitops_1.0-6
    [13] RCurl_1.95-4.8             biomaRt_2.30.0
    [15] RSQLite_1.1                Biostrings_2.42.0
    [17] Rsamtools_1.26.1           XML_3.98-1.5
Thanks,
Mark
Ah yeah, should have checked that. Thanks!
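For anyone landing here with the same symptom, here is a minimal diagnostic sketch. It assumes the oversized range arises because genes() reports a single span covering every transcript that shares a gene ID; nothing quoted in this thread confirms that this is the cause for this particular gene, so treat it as one thing to check rather than the answer. The gene ID is the one from the output in the question.

```r
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

# List the individual transcripts annotated with the gene ID in question.
tx <- transcripts(txdb,
                  filter = list(gene_id = "102465114"),
                  columns = c("tx_name", "gene_id"))
tx

# Width of each transcript on its own.
width(tx)

# Width of the span that covers all of those transcripts together
# (one span per chromosome/strand combination).
width(range(tx))
```

If the per-transcript widths are small but the combined span is large, the gene ID is attached to transcripts at widely separated positions, and the gene-level range simply bridges them.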