Question: Error in TxDb mm10 coordinates?
Mark (Imperial College London) wrote, 12 months ago:

Hi, 

The TxDb.Mmusculus.UCSC.mm10.knownGene package appears to be giving me strange coordinates for certain genes, making them huge. For example, this microRNA stretches over 35 Mb according to the TxDb package but is only 66 bp according to UCSC. Any suggestions, or is there something I'm overlooking?

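library(TxDb.Mmusculus.UCSC.mm10.knownGene)
txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene
# Store the result under a name other than 'genes' to avoid confusion with the genes() accessor
gn <- genes(txdb)
# Keep only genes whose extent exceeds 35 Mb
gn[width(gn) > 35e6]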

Output:

GRanges object with 1 range and 1 metadata column:
            seqnames               ranges strand |     gene_id
               <Rle>            <IRanges>  <Rle> | <character>
  102465114    chr19 [24942236, 60774397]      - |   102465114
  -------
  seqinfo: 66 sequences (1 circular) from mm10 genome

 

Session info:

R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocInstaller_1.24.0                    
 [2] TxDb.Mmusculus.UCSC.mm10.knownGene_3.4.0
 [3] GenomicFeatures_1.26.0                  
 [4] AnnotationDbi_1.36.0                    
 [5] Biobase_2.34.0                          
 [6] GenomicRanges_1.26.1                    
 [7] GenomeInfoDb_1.10.1                     
 [8] IRanges_2.8.1                           
 [9] S4Vectors_0.12.0                        
[10] BiocGenerics_0.20.0                     

loaded via a namespace (and not attached):
 [1] XVector_0.14.0             zlibbioc_1.20.0           
 [3] GenomicAlignments_1.10.0   BiocParallel_1.8.1        
 [5] lattice_0.20-33            tools_3.3.1               
 [7] SummarizedExperiment_1.4.0 grid_3.3.1                
 [9] DBI_0.5-1                  Matrix_1.2-7.1            
[11] rtracklayer_1.34.1         bitops_1.0-6              
[13] RCurl_1.95-4.8             biomaRt_2.30.0            
[15] RSQLite_1.1                Biostrings_2.42.0         
[17] Rsamtools_1.26.1           XML_3.98-1.5   

Thanks,

Mark

James W. MacDonald (United States) wrote, 12 months ago:

By definition, a gene's extent runs from the start of its 'first' transcript to the end of its 'last' transcript. For non-coding RNA species, which may be found at multiple places on a chromosome, this has the unintended effect of returning a very long gene that doesn't really exist. If you run

# Group transcripts by gene, then look up the suspicious gene ID
txs <- transcriptsBy(TxDb.Mmusculus.UCSC.mm10.knownGene, by = "gene")
txs["102465114"]

You will see that there are two transcripts for this miRNA, spaced quite far apart on chr19.
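To make the extent-versus-transcripts point concrete, here is a minimal sketch that uses toy data rather than the real TxDb. The coordinates, the gene names geneA and geneB, and the 100x cutoff are all hypothetical and chosen only for illustration. It compares each gene's extent (first transcript start to last transcript end) with the number of bases its transcripts actually cover.

```r
library(GenomicRanges)

# Hypothetical coordinates: geneA has two 66 bp transcripts far apart on chr19,
# geneB is an ordinary single-transcript gene. Each gene sits on one chromosome
# and one strand, so range() returns exactly one range per gene.
tx <- GRanges(
  seqnames = "chr19",
  ranges   = IRanges(start = c(1001, 5000001, 20001), width = c(66, 66, 900)),
  strand   = c("-", "-", "+"),
  gene_id  = c("geneA", "geneA", "geneB")
)

# Group transcripts by gene, mimicking the shape of transcriptsBy(txdb, by = "gene").
tx_by_gene <- split(tx, tx$gene_id)

# Gene extent: start of the first transcript to the end of the last one.
extent <- unlist(range(tx_by_gene))

# Bases actually covered by transcripts, with overlapping transcripts merged.
covered <- sum(width(reduce(tx_by_gene)))

summary_df <- data.frame(
  gene_id = names(tx_by_gene),
  n_tx    = elementNROWS(tx_by_gene),
  extent  = width(extent),
  covered = covered
)

# Arbitrary illustrative cutoff: flag genes whose extent dwarfs their transcripts.
summary_df$suspicious <- summary_df$extent > 100 * summary_df$covered
print(summary_df)
```

The same calls should work on the result of transcriptsBy(txdb, by = "gene"), with the caveat that genes whose transcripts fall on more than one chromosome or strand yield several ranges from range(), so the one-row-per-gene assumption above would need adjusting.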

Mark replied, 12 months ago:

Ah yeah, should have checked that. Thanks!