Search
Question: A bug of "seqlevels<-" on TxDb
0
gravatar for wcstcyx
11 months ago by
wcstcyx30
China/Beijing/AMSS,CAS
wcstcyx30 wrote:

In this tutorial, "seqlevels<-" can be used to set only part of chromosomes to be active, which is very convenient. However, there may be a minor bug about "seqlevels<-" on TxDb objects.

 

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
seqlevels(txdb) <- c("chr1", "chr2") # only set Chromosome 1 and 2 to be active
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq :  logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName :  chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels :  chr [1:2] "chr1" "chr2"
## user2seqlevels0 :  int [1:2] 1 2
seqlevels(txdb) <- c("chr3", "chr4")
ls.str(txdb@.xData)
## conn : Formal class 'SQLiteConnection' [package "RSQLite"] with 6 slots
## initFields : Formal class 'refMethodDef' [package "methods"] with 5 slots
## initialize : Formal class 'refMethodDef' [package "methods"] with 5 slots
## isActiveSeq :  logi [1:93] TRUE TRUE TRUE TRUE TRUE TRUE ...
## packageName :  chr "TxDb.Hsapiens.UCSC.hg19.knownGene"
## user_seqlevels :  chr [1:2] "chr3" "chr4"
## user2seqlevels0 :  int [1:2] 1 2

transcripts(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # It should not return "chr3"!
## GRanges object with 4073 ranges and 2 metadata columns:
##          seqnames                 ranges strand |     tx_id     tx_name
##             <Rle>              <IRanges>  <Rle> | <integer> <character>
##      [1]     chr3       [ 11874,  14409]      + |         1  uc001aaa.3
##      [2]     chr3       [ 11874,  14409]      + |         2  uc010nxq.1
##      [3]     chr3       [ 11874,  14409]      + |         3  uc010nxr.1
##      [4]     chr3       [ 69091,  70008]      + |         4  uc001aal.1
##      [5]     chr3       [321084, 321115]      + |         5  uc001aaq.2
##      ...      ...                    ...    ... .       ...         ...
##   [4069]     chr3 [249168447, 249168518]      + |      4069  uc021pmg.1
##   [4070]     chr3 [249200442, 249213345]      + |      4070  uc001ifg.3
##   [4071]     chr3 [249200442, 249213345]      + |      4071  uc001ifh.3
##   [4072]     chr3 [249200442, 249213345]      + |      4072  uc009xhd.3
##   [4073]     chr3 [249211537, 249212562]      + |      4073  uc021pmh.1
##   -------
##   seqinfo: 2 sequences from hg19 genome
exons(txdb, filter=list(tx_chrom = "chr1", tx_strand = "+")) # Also wrong
## GRanges object with 13963 ranges and 1 metadata column:
##           seqnames                 ranges strand |   exon_id
##              <Rle>              <IRanges>  <Rle> | <integer>
##       [1]     chr3         [11874, 12227]      + |         1
##       [2]     chr3         [12595, 12721]      + |         2
##       [3]     chr3         [12613, 12721]      + |         3
##       [4]     chr3         [12646, 12697]      + |         4
##       [5]     chr3         [13221, 14409]      + |         5
##       ...      ...                    ...    ... .       ...
##   [13959]     chr3 [249208015, 249208078]      + |     13959
##   [13960]     chr3 [249208638, 249208758]      + |     13960
##   [13961]     chr3 [249210801, 249213345]      + |     13961
##   [13962]     chr3 [249211478, 249213345]      + |     13962
##   [13963]     chr3 [249211537, 249212562]      + |     13963
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr2", tx_strand = "+")) # It should not return "chr4"!
## GRanges object with 2642 ranges and 2 metadata columns:
##          seqnames                 ranges strand |     tx_id     tx_name
##             <Rle>              <IRanges>  <Rle> | <integer> <character>
##      [1]     chr4       [264869, 272481]      + |      7968  uc002qwd.2
##      [2]     chr4       [264869, 273149]      + |      7969  uc002qwe.5
##      [3]     chr4       [264869, 278282]      + |      7970  uc002qwf.3
##      [4]     chr4       [264869, 278282]      + |      7971  uc002qwg.3
##      [5]     chr4       [264869, 278282]      + |      7972  uc002qwh.3
##      ...      ...                    ...    ... .       ...         ...
##   [2638]     chr4 [243030844, 243102469]      + |     10605  uc010zpd.1
##   [2639]     chr4 [243030844, 243102469]      + |     10606  uc010zpe.1
##   [2640]     chr4 [243030844, 243102469]      + |     10607  uc010zpf.1
##   [2641]     chr4 [243030844, 243102469]      + |     10608  uc010zpg.1
##   [2642]     chr4 [243064438, 243081022]      + |     10609  uc010zph.2
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr3", tx_strand = "+")) # It should return "chr3"!
## GRanges object with 0 ranges and 2 metadata columns:
##    seqnames    ranges strand |     tx_id     tx_name
##       <Rle> <IRanges>  <Rle> | <integer> <character>
##   -------
##   seqinfo: 2 sequences from hg19 genome
transcripts(txdb, filter=list(tx_chrom = "chr4", tx_strand = "+")) # It should return "chr4"!
## GRanges object with 0 ranges and 2 metadata columns:
##    seqnames    ranges strand |     tx_id     tx_name
##       <Rle> <IRanges>  <Rle> | <integer> <character>
##   -------
##   seqinfo: 2 sequences from hg19 genome
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
## [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
## [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
## [4] LC_NUMERIC=C                                                   
## [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    
## 
## attached base packages:
## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
## [2] GenomicFeatures_1.26.0                 
## [3] AnnotationDbi_1.36.0                   
## [4] Biobase_2.34.0                         
## [5] GenomicRanges_1.26.1                   
## [6] GenomeInfoDb_1.10.1                    
## [7] IRanges_2.8.1                          
## [8] S4Vectors_0.12.1                       
## [9] BiocGenerics_0.20.0                    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8                knitr_1.15.1              
##  [3] XVector_0.14.0             magrittr_1.5              
##  [5] GenomicAlignments_1.10.0   zlibbioc_1.20.0           
##  [7] BiocParallel_1.8.1         lattice_0.20-34           
##  [9] stringr_1.1.0              tools_3.3.2               
## [11] grid_3.3.2                 SummarizedExperiment_1.4.0
## [13] DBI_0.5-1                  htmltools_0.3.5           
## [15] yaml_2.1.14                rprojroot_1.1             
## [17] digest_0.6.10              Matrix_1.2-7.1            
## [19] rtracklayer_1.34.1         bitops_1.0-6              
## [21] biomaRt_2.30.0             RCurl_1.95-4.8            
## [23] memoise_1.0.0              evaluate_0.10             
## [25] RSQLite_1.1                rmarkdown_1.2             
## [27] stringi_1.1.2              Rsamtools_1.26.1          
## [29] Biostrings_2.42.1          backports_1.0.4           
## [31] XML_3.98-1.5

 

It seems that user2seqlevels0 in txdb is not right when I use "seqlevels<-" the second time. And is that where the problem comes from?

Thanks in advance.

Can Wang

ADD COMMENTlink modified 11 months ago by Hervé Pagès ♦♦ 13k • written 11 months ago by wcstcyx30
2
gravatar for Hervé Pagès
11 months ago by
Hervé Pagès ♦♦ 13k
United States
Hervé Pagès ♦♦ 13k wrote:

Hi Can Wang,

Thanks for the report. This should be addressed in GenomicFeatures 1.26.1 (release) and 1.27.5 (devel). Unfortunately there is another issue at the moment in GenomicFeatures that breaks makeTxDbFromBiomart() and will prevent the updated GenomicFeatures from propagating to the public repo. The issue is caused by a change to the Ensembl Mart and is not related to the seqlevels() setter. See the build reports for the details:

https://bioconductor.org/checkResults/3.4/bioc-LATEST/GenomicFeatures/malbec1-checksrc.html

https://bioconductor.org/checkResults/3.5/bioc-LATEST/GenomicFeatures/malbec2-checksrc.html

A workaround for now is to install the package directly from svn e.g.

svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_4/madman/Rpacks/GenomicFeatures
R CMD INSTALL GenomicFeatures

For the devel version (i.e. if you're using BioC 3.5 with R 3.4) replace the URL with

https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/GenomicFeatures

Cheers,

H.

ADD COMMENTlink written 11 months ago by Hervé Pagès ♦♦ 13k

Hi Hervé Pagès,

Thanks for your reply.

svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_4/madman/Rpacks/GenomicFeatures
R CMD INSTALL GenomicFeatures

I intalled it by that code and this bug does not appear anymore.

Thanks!

Can

ADD REPLYlink written 11 months ago by wcstcyx30

Hi Hervé Pagès,

Now in GenomicFeatures 1.26.2, the problem has been solved! Thanks!

Can

ADD REPLYlink modified 11 months ago • written 11 months ago by wcstcyx30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 152 users visited in the last hour