Question: Error in TxDb mm10 coordinates?
2.4 years ago by
Imperial College London
Mark10 wrote:


The TxDb.Mmusculus.UCSC.mm10.knownGene package appears to be giving me strange co-ordinates for certain genes, making them huge. For example, this microRNA stretches roughly 35 Mb according to the TxDb package but only 66 bp according to UCSC. Any suggestions, or is there something I'm overlooking?

library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
genes <- genes(txdb)
# Keep only genes whose reported extent exceeds 35 Mb
subset(genes, width(genes) > 35000000)


GRanges object with 1 range and 1 metadata column:
            seqnames               ranges strand |     gene_id
               <Rle>            <IRanges>  <Rle> | <character>
  102465114    chr19 [24942236, 60774397]      - |   102465114
  seqinfo: 66 sequences (1 circular) from mm10 genome


Session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0                    
 [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [3] GenomicFeatures_1.26.0                  
 [4] AnnotationDbi_1.36.0                    
 [5] Biobase_2.34.0                          
 [6] GenomicRanges_1.26.1                    
 [7] GenomeInfoDb_1.10.1                     
 [8] IRanges_2.8.1                           
 [9] S4Vectors_0.12.0                        
[10] BiocGenerics_0.20.0                     

loaded via a namespace (and not attached):
 [1] XVector_0.14.0             zlibbioc_1.20.0           
 [3] GenomicAlignments_1.10.0   BiocParallel_1.8.1        
 [5] lattice_0.20-33            tools_3.3.1               
 [7] SummarizedExperiment_1.4.0 grid_3.3.1                
 [9] DBI_0.5-1                  Matrix_1.2-7.1            
[11] rtracklayer_1.34.1         bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.30.0            
[15] RSQLite_1.1                Biostrings_2.42.0         
[17] Rsamtools_1.26.1           XML_3.98-1.5   



Answer: Error in TxDb mm10 coordinates?
2.4 years ago by
United States
James W. MacDonald49k wrote:

By definition, the gene extent runs from the start of the 'first' transcript to the end of the 'last' transcript. For non-coding RNA species, which may be found in multiple places on a chromosome, this has the unintended effect of returning a really long gene that doesn't really exist. If you run

txs <- transcriptsBy(TxDb.Mmusculus.UCSC.mm10.knownGene)


you will see that there are two transcripts for this miRNA, spaced quite far apart on chr19.
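To make the arithmetic concrete, here is a minimal base-R sketch of how a start-of-first to end-of-last definition inflates a gene's width. The gene names and coordinates are made up for illustration; they are not the real positions of this miRNA, and this is not the code genes() runs internally.

```r
# Toy transcript table (hypothetical coordinates, illustration only).
# "geneA" has two short copies far apart; "geneB" has a single transcript.
tx <- data.frame(
  gene_id = c("geneA", "geneA", "geneB"),
  start   = c(1000, 5000000, 200),
  end     = c(1065, 5000065, 900)
)

# Width of each individual transcript (1-based, inclusive coordinates)
tx$width <- tx$end - tx$start + 1

# Gene extent: smallest start to largest end over all transcripts of a gene
gene_start <- tapply(tx$start, tx$gene_id, min)
gene_end   <- tapply(tx$end,   tx$gene_id, max)
gene_width <- gene_end - gene_start + 1

print(tx)
print(gene_width)
```

Each "geneA" transcript is only 66 bp wide, yet the gene extent covers everything in between. With the real package, txs[["102465114"]] should list the transcripts behind the 35 Mb range, assuming transcriptsBy() was called with its default grouping by gene.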


Ah yeah, should have checked that. Thanks!

Reply written 2.4 years ago by Mark10
