How to specify transcript_ids in makeTxDbFromUCSC function in GenomicFeatures package?
rbacher ▴ 20

I would like to create a TxDb object restricted to a specific set of transcripts. However, setting transcript_ids doesn't have the effect I expected: instead of containing only the information for the given transcript, the resulting TxDb object appears to contain everything.

Am I specifying the transcript_ids incorrectly? Or is there a better way to do this?

Any advice is appreciated!


> transcript_ids <- c("uc001aaa.3")

> txdbTry <- makeTxDbFromUCSC(genome="hg19", tablename="knownGene", transcript_ids=transcript_ids)
> metadata(txdbTry)

   transcript_nrow                                        82960

> length(keys(txdbTry))
[1] 23459

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

attached base packages:

[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ChIPseeker_1.15.2                       devtools_1.13.4                         biomaRt_2.35.1                         
 [4] limma_3.34.3                                        TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [7] GenomicFeatures_1.30.0                  AnnotationDbi_1.40.0                    Biobase_2.38.0                         
[10] GenomicRanges_1.30.0                    GenomeInfoDb_1.14.0                     IRanges_2.12.0                         
[13] S4Vectors_0.16.0                        BiocGenerics_0.24.0                     EBSeq_1.18.0                           
[16] testthat_1.0.2                          gplots_3.0.1                            blockmodeling_0.1.9                    
[19] BiocInstaller_1.28.0                   

loaded via a namespace (and not attached):
 [1] bitops_1.0-6               matrixStats_0.52.2         bit64_0.9-7                httr_1.3.1                 RColorBrewer_1.1-2        
 [6] progress_1.1.2             UpSetR_1.3.3               tools_3.4.3                R6_2.2.2                   KernSmooth_2.23-15        
[11] DBI_0.7                    lazyeval_0.2.1             colorspace_1.3-2           withr_2.1.0                gridExtra_2.3             
[16] prettyunits_1.0.2          RMySQL_0.10.13             curl_3.1                   git2r_0.19.0               bit_1.1-12                
[21] compiler_3.4.3             DelayedArray_0.4.1         rtracklayer_1.38.2         caTools_1.17.1             scales_0.5.0              
[26] stringr_1.2.0              digest_0.6.12              Rsamtools_1.30.0           DOSE_3.4.0                 XVector_0.18.0            
[31] pkgconfig_2.0.1            plotrix_3.7                rlang_0.1.4                rstudioapi_0.7             RSQLite_2.0               
[36] bindr_0.1                  BiocParallel_1.12.0        gtools_3.5.0               GOSemSim_2.4.0             dplyr_0.7.4               
[41] RCurl_1.95-4.8             magrittr_1.5               GO.db_3.5.0                GenomeInfoDbData_0.99.1    Matrix_1.2-12             
[46] Rcpp_0.12.14               munsell_0.4.3              stringi_1.1.6              yaml_2.1.16                SummarizedExperiment_1.8.0
[51] zlibbioc_1.24.0            plyr_1.8.4                 qvalue_2.10.0              grid_3.4.3                 blob_1.1.0                
[56] gdata_2.18.0               DO.db_2.9                  crayon_1.3.4               lattice_0.20-35            Biostrings_2.46.0         
[61] splines_3.4.3              knitr_1.17                 fgsea_1.4.0                igraph_1.1.2               boot_1.3-20               
[66] reshape2_1.4.3             fastmatch_1.1-0            XML_3.98-1.9               glue_1.2.0                 data.table_1.10.4-3       
[71] gtable_0.2.0               assertthat_0.2.0           ggplot2_2.2.1              gridBase_0.4-7             tibble_1.3.4              
[76] rvcheck_0.0.9              GenomicAlignments_1.14.1   memoise_1.1.0              bindrcpp_0.2              


Is there a particular reason why installing the TxDb.Hsapiens.UCSC.hg19.knownGene package and then using just the transcripts you care about isn't sufficient?
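
For reference, here is a minimal sketch of what "using just the transcripts you care about" could look like with the installed annotation package. The variable names (tx_of_interest, tx, ex) are only illustrative, and the transcript id is the one from the question. Note that this yields GRanges/GRangesList objects, not a TxDb.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Illustrative set of transcripts of interest (taken from the question)
tx_of_interest <- c("uc001aaa.3")

# Pull only the matching transcripts as a GRanges object
tx <- transcripts(txdb,
                  columns = c("tx_name", "gene_id"),
                  filter = list(tx_name = tx_of_interest))

# Exons grouped by transcript, restricted to the same transcripts
ex <- exonsBy(txdb, by = "tx", use.names = TRUE)[tx_of_interest]
```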

I intend to use the reduced TxDb object as input to the ChIPseeker package via:

ChIPseeker::annotatePeak(peaks, tssRegion=c(-3000, 3000), assignGenomicAnnotation=TRUE, TxDb=newTxDb)

where newTxDb contains only my genes of interest. I know that annotatePeak will accept a GRanges object as the TxDb argument, but then it cannot perform assignGenomicAnnotation, and I would like the information about intron, exon, UTR, etc.
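
One possible way to get such a reduced TxDb without going through makeTxDbFromUCSC is to dump the full TxDb to its underlying data frames with as.list(), keep only the rows for the transcripts of interest, and rebuild with makeTxDb(). This is a sketch under assumptions, not a confirmed fix for the transcript_ids behaviour: keep_tx_names, dump and keep_ids are illustrative names, and in recent Bioconductor releases makeTxDb is, as far as I know, provided by the txdbmaker package and no longer by GenomicFeatures.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

# Illustrative set of transcripts to keep (taken from the question)
keep_tx_names <- c("uc001aaa.3")

# Dump the TxDb into a list of data frames:
# transcripts, splicings, genes and chrominfo
dump <- as.list(txdb)

# Internal integer ids of the transcripts to keep
keep_ids <- dump$transcripts$tx_id[dump$transcripts$tx_name %in% keep_tx_names]

# Rebuild a TxDb from the filtered tables; chrominfo is kept unchanged.
# Assumption: makeTxDb is available from GenomicFeatures, as in the version
# listed in the sessionInfo() above; newer releases may need txdbmaker.
newTxDb <- makeTxDb(
    transcripts = dump$transcripts[dump$transcripts$tx_id %in% keep_ids, ],
    splicings   = dump$splicings[dump$splicings$tx_id %in% keep_ids, ],
    genes       = dump$genes[dump$genes$tx_id %in% keep_ids, ],
    chrominfo   = dump$chrominfo
)

# Count transcripts directly; keys() on a TxDb returns gene ids by default
length(transcripts(newTxDb))
```

The rebuilt object is a regular TxDb, so it should be usable wherever a TxDb is expected, including the TxDb argument of annotatePeak, although I have not verified this against ChIPseeker itself.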



