Question: Error in TxDb mm10 coordinates?
2.0 years ago by
Mark10
Imperial College London
Mark10 wrote:

Hi, 

The TxDb.Mmusculus.UCSC.mm10.knownGene package appears to be giving me strange co-ordinates for certain genes, making them huge. For example, the microRNA below spans about 35 Mb according to the TxDb package but only 66 bp according to UCSC. Any suggestions, or is there something I'm overlooking?

library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
genes <- genes(txdb)
subset(genes, width(genes) > 35000000)

Output:

GRanges object with 1 range and 1 metadata column:
            seqnames               ranges strand |     gene_id
               <Rle>            <IRanges>  <Rle> | <character>
  102465114    chr19 [24942236, 60774397]      - |   102465114
  -------
  seqinfo: 66 sequences (1 circular) from mm10 genome

 

Session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0                    
 [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [3] GenomicFeatures_1.26.0                  
 [4] AnnotationDbi_1.36.0                    
 [5] Biobase_2.34.0                          
 [6] GenomicRanges_1.26.1                    
 [7] GenomeInfoDb_1.10.1                     
 [8] IRanges_2.8.1                           
 [9] S4Vectors_0.12.0                        
[10] BiocGenerics_0.20.0                     

loaded via a namespace (and not attached):
 [1] XVector_0.14.0             zlibbioc_1.20.0           
 [3] GenomicAlignments_1.10.0   BiocParallel_1.8.1        
 [5] lattice_0.20-33            tools_3.3.1               
 [7] SummarizedExperiment_1.4.0 grid_3.3.1                
 [9] DBI_0.5-1                  Matrix_1.2-7.1            
[11] rtracklayer_1.34.1         bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.30.0            
[15] RSQLite_1.1                Biostrings_2.42.0         
[17] Rsamtools_1.26.1           XML_3.98-1.5   

Thanks,

Mark

2.0 years ago by
United States
James W. MacDonald48k wrote:

By definition, the gene extent runs from the start of the 'first' transcript to the end of the 'last' transcript. For non-coding RNA species, which may be found in multiple places on a chromosome, this has the unintended effect of returning a really long gene that doesn't really exist. If you run

txs <- transcriptsBy(TxDb.Mmusculus.UCSC.mm10.knownGene)

txs["102465114"]

you will see that there are two transcripts for this miRNA, spaced quite far apart on chr19.
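
If you want to find other genes affected in the same way, one rough approach is to compare each gene's extent with the width of its longest transcript. This is only a sketch: the 10-fold cutoff and the names `ratio` and `flagged` are arbitrary choices for illustration, not anything defined by the package.

```r
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

# Gene extents as returned by genes(): one range per gene
g <- genes(txdb)

# Transcripts grouped by gene, restricted to the genes kept by genes()
txs <- transcriptsBy(txdb, by = "gene")[names(g)]

# Ratio of gene extent to the longest single transcript of that gene.
# Transcripts of an ordinary gene overlap, so the ratio stays close to 1;
# copies scattered along a chromosome give a very large ratio.
ratio <- width(g) / max(width(txs))

# Arbitrary cutoff (assumption): extent more than 10x the longest transcript
flagged <- g[ratio > 10]
flagged
```

For any gene in `flagged`, `txs[[gene_id]]` lists the individual transcripts, which are usually the more sensible ranges to work with.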


Ah yeah, should have checked that. Thanks!
