A bug of "seqlevels<-" on TxDb
1
0
Entering edit mode
wcstcyx ▴ 30
@wcstcyx-11636
Last seen 6.8 years ago
China/Beijing/AMSS,CAS

In this tutorial, "seqlevels<-" can be used to set only part of chromosomes to be active, which is very convenient. However, there may be a minor bug about "seqlevels<-" on TxDb objects.

 

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(txdb) <- c("chr1", "chr2") # only set Chromosome 1 and 2 to be active
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq :  logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName :  chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels :  chr [1:2] "chr1" "chr2"
## user2seqlevels0 :  int [1:2] 1 2
seqlevels(txdb) <- c("chr3", "chr4")
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq :  logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName :  chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels :  chr [1:2] "chr3" "chr4"
## user2seqlevels0 :  int [1:2] 1 2

transcripts(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # It should not return "chr3"!
## GRanges object with 4073 ranges and 2 metadata columns:
##          seqnames                 ranges strand |     tx_id     tx_name
##             <Rle>              <IRanges>  <Rle> | <integer> <character>
##      [1]     chr3       [ 11874,  14409]      + |         1  uc001aaa.3
##      [2]     chr3       [ 11874,  14409]      + |         2  uc010nxq.1
##      [3]     chr3       [ 11874,  14409]      + |         3  uc010nxr.1
##      [4]     chr3       [ 69091,  70008]      + |         4  uc001aal.1
##      [5]     chr3       [321084, 321115]      + |         5  uc001aaq.2
##      ...      ...                    ...    ... .       ...         ...
##   [4069]     chr3 [249168447, 249168518]      + |      4069  uc021pmg.1
##   [4070]     chr3 [249200442, 249213345]      + |      4070  uc001ifg.3
##   [4071]     chr3 [249200442, 249213345]      + |      4071  uc001ifh.3
##   [4072]     chr3 [249200442, 249213345]      + |      4072  uc009xhd.3
##   [4073]     chr3 [249211537, 249212562]      + |      4073  uc021pmh.1
##   -------
##   seqinfo: 2 sequences from hg19 genome
exons(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # Also wrong
## GRanges object with 13963 ranges and 1 metadata column:
##           seqnames                 ranges strand |   exon_id
##              <Rle>              <IRanges>  <Rle> | <integer>
##       [1]     chr3         [11874, 12227]      + |         1
##       [2]     chr3         [12595, 12721]      + |         2
##       [3]     chr3         [12613, 12721]      + |         3
##       [4]     chr3         [12646, 12697]      + |         4
##       [5]     chr3         [13221, 14409]      + |         5
##       ...      ...                    ...    ... .       ...
##   [13959]     chr3 [249208015, 249208078]      + |     13959
##   [13960]     chr3 [249208638, 249208758]      + |     13960
##   [13961]     chr3 [249210801, 249213345]      + |     13961
##   [13962]     chr3 [249211478, 249213345]      + |     13962
##   [13963]     chr3 [249211537, 249212562]      + |     13963
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr2", tx_strand = "+")) # It should not return "chr4"!
## GRanges object with 2642 ranges and 2 metadata columns:
##          seqnames                 ranges strand |     tx_id     tx_name
##             <Rle>              <IRanges>  <Rle> | <integer> <character>
##      [1]     chr4       [264869, 272481]      + |      7968  uc002qwd.2
##      [2]     chr4       [264869, 273149]      + |      7969  uc002qwe.5
##      [3]     chr4       [264869, 278282]      + |      7970  uc002qwf.3
##      [4]     chr4       [264869, 278282]      + |      7971  uc002qwg.3
##      [5]     chr4       [264869, 278282]      + |      7972  uc002qwh.3
##      ...      ...                    ...    ... .       ...         ...
##   [2638]     chr4 [243030844, 243102469]      + |     10605  uc010zpd.1
##   [2639]     chr4 [243030844, 243102469]      + |     10606  uc010zpe.1
##   [2640]     chr4 [243030844, 243102469]      + |     10607  uc010zpf.1
##   [2641]     chr4 [243030844, 243102469]      + |     10608  uc010zpg.1
##   [2642]     chr4 [243064438, 243081022]      + |     10609  uc010zph.2
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr3", tx_strand = "+")) # It should return "chr3"!
## GRanges object with 0 ranges and 2 metadata columns:
##    seqnames    ranges strand |     tx_id     tx_name
##       <Rle> <IRanges>  <Rle> | <integer> <character>
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr4", tx_strand = "+")) # It should return "chr4"!
## GRanges object with 0 ranges and 2 metadata columns:
##    seqnames    ranges strand |     tx_id     tx_name
##       <Rle> <IRanges>  <Rle> | <integer> <character>
##   -------
##   seqinfo: 2 sequences from hg19 genome
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C                                                   
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [2] GenomicFeatures_1.26.0                 
## [3] AnnotationDbi_1.36.0                   
## [4] Biobase_2.34.0                         
## [5] GenomicRanges_1.26.1                   
## [6] GenomeInfoDb_1.10.1                    
## [7] IRanges_2.8.1                          
## [8] S4Vectors_0.12.1                       
## [9] BiocGenerics_0.20.0                    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8                knitr_1.15.1              
##  [3] XVector_0.14.0             magrittr_1.5              
##  [5] GenomicAlignments_1.10.0   zlibbioc_1.20.0           
##  [7] BiocParallel_1.8.1         lattice_0.20-34           
##  [9] stringr_1.1.0              tools_3.3.2               
## [11] grid_3.3.2                 SummarizedExperiment_1.4.0
## [13] DBI_0.5-1                  htmltools_0.3.5           
## [15] yaml_2.1.14                rprojroot_1.1             
## [17] digest_0.6.10              Matrix_1.2-7.1            
## [19] rtracklayer_1.34.1         bitops_1.0-6              
## [21] biomaRt_2.30.0             RCurl_1.95-4.8            
## [23] memoise_1.0.0              evaluate_0.10             
## [25] RSQLite_1.1                rmarkdown_1.2             
## [27] stringi_1.1.2              Rsamtools_1.26.1          
## [29] Biostrings_2.42.1          backports_1.0.4           
## [31] XML_3.98-1.5

 

It seems that user2seqlevels0 in txdb is not right when I use "seqlevels<-" the second time. And is that where the problem comes from?

Thanks in advance.

Can Wang

txdb.hsapiens.ucsc.hg19.knowngene seqlevels bugs • 1.7k views
ADD COMMENT
2
Entering edit mode
@herve-pages-1542
Last seen 30 minutes ago
Seattle, WA, United States

Hi Can Wang,

Thanks for the report. This should be addressed in GenomicFeatures 1.26.1 (release) and 1.27.5 (devel). Unfortunately there is another issue at the moment in GenomicFeatures that breaks makeTxDbFromBiomart() and will prevent the updated GenomicFeatures from propagating to the public repo. The issue is caused by a change to the Ensembl Mart and is not related to the seqlevels() setter. See the build reports for the details:

https://bioconductor.org/checkResults/3.4/bioc-LATEST/GenomicFeatures/malbec1-checksrc.html

https://bioconductor.org/checkResults/3.5/bioc-LATEST/GenomicFeatures/malbec2-checksrc.html

A workaround for now is to install the package directly from svn e.g.

svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_4/madman/Rpacks/GenomicFeatures
R CMD INSTALL GenomicFeatures

For the devel version (i.e. if you're using BioC 3.5 with R 3.4) replace the URL with

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicFeatures

Cheers,

H.

ADD COMMENT
0
Entering edit mode

Hi Hervé Pagès,

Thanks for your reply.

svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_4/madman/Rpacks/GenomicFeatures
R CMD INSTALL GenomicFeatures

I intalled it by that code and this bug does not appear anymore.

Thanks!

Can

ADD REPLY
0
Entering edit mode

Hi Hervé Pagès,

Now in GenomicFeatures 1.26.2, the problem has been solved! Thanks!

Can

ADD REPLY

Login before adding your answer.

Traffic: 756 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6