I would like to create a TxDb object restricted to a specific set of transcripts. However, when I set transcript_ids, it doesn't have the effect I expected. Instead of containing information only for the given transcript, the TxDb object appears to contain everything.
Am I specifying the transcript_ids incorrectly? Or is there a better way to do this?
Any advice is appreciated!
Example:
> transcript_ids <- c("uc001aaa.3")
> txdbTry <- makeTxDbFromUCSC(genome="hg19", tablename="knownGene", transcript_ids=transcript_ids)
> metadata(txdbTry)
transcript_nrow 82960
> length(keys(txdbTry))
[1] 23459
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.1

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] ChIPseeker_1.15.2 devtools_1.13.4 biomaRt_2.35.1
[4] limma_3.34.3 org.Hs.eg.db_3.5.0 TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[7] GenomicFeatures_1.30.0 AnnotationDbi_1.40.0 Biobase_2.38.0
[10] GenomicRanges_1.30.0 GenomeInfoDb_1.14.0 IRanges_2.12.0
[13] S4Vectors_0.16.0 BiocGenerics_0.24.0 EBSeq_1.18.0
[16] testthat_1.0.2 gplots_3.0.1 blockmodeling_0.1.9
[19] BiocInstaller_1.28.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.52.2 bit64_0.9-7 httr_1.3.1 RColorBrewer_1.1-2
[6] progress_1.1.2 UpSetR_1.3.3 tools_3.4.3 R6_2.2.2 KernSmooth_2.23-15
[11] DBI_0.7 lazyeval_0.2.1 colorspace_1.3-2 withr_2.1.0 gridExtra_2.3
[16] prettyunits_1.0.2 RMySQL_0.10.13 curl_3.1 git2r_0.19.0 bit_1.1-12
[21] compiler_3.4.3 DelayedArray_0.4.1 rtracklayer_1.38.2 caTools_1.17.1 scales_0.5.0
[26] stringr_1.2.0 digest_0.6.12 Rsamtools_1.30.0 DOSE_3.4.0 XVector_0.18.0
[31] pkgconfig_2.0.1 plotrix_3.7 rlang_0.1.4 rstudioapi_0.7 RSQLite_2.0
[36] bindr_0.1 BiocParallel_1.12.0 gtools_3.5.0 GOSemSim_2.4.0 dplyr_0.7.4
[41] RCurl_1.95-4.8 magrittr_1.5 GO.db_3.5.0 GenomeInfoDbData_0.99.1 Matrix_1.2-12
[46] Rcpp_0.12.14 munsell_0.4.3 stringi_1.1.6 yaml_2.1.16 SummarizedExperiment_1.8.0
[51] zlibbioc_1.24.0 plyr_1.8.4 qvalue_2.10.0 grid_3.4.3 blob_1.1.0
[56] gdata_2.18.0 DO.db_2.9 crayon_1.3.4 lattice_0.20-35 Biostrings_2.46.0
[61] splines_3.4.3 knitr_1.17 fgsea_1.4.0 igraph_1.1.2 boot_1.3-20
[66] reshape2_1.4.3 fastmatch_1.1-0 XML_3.98-1.9 glue_1.2.0 data.table_1.10.4-3
[71] gtable_0.2.0 assertthat_0.2.0 ggplot2_2.2.1 gridBase_0.4-7 tibble_1.3.4
[76] rvcheck_0.0.9 GenomicAlignments_1.14.1 memoise_1.1.0 bindrcpp_0.2
Is there a particular reason that installing the TxDb.Hsapiens.UCSC.hg19.knownGene package and then using only the transcripts you care about isn't sufficient?
I intend to use the reduced TxDb object as input to the ChIPseeker package via:
where newTxDb contains only my genes of interest. I know annotatePeak will accept a GRanges object in place of the TxDb, but then it cannot assign genomic annotations (the assignGenomicAnnotation step), and I would like the information about intron, exon, UTR, etc.
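One possible way to get such a reduced TxDb, sketched here under assumptions rather than offered as the fix for transcript_ids: dump the installed TxDb.Hsapiens.UCSC.hg19.knownGene object to its underlying data frames with as.list(), keep only the rows for the transcripts of interest, and rebuild with makeTxDb(). The variable names (full_txdb, keep_names, txdb_dump, keep_ids) are illustrative, and the commented-out annotatePeak call uses a hypothetical peaks object that is not part of this thread.

```r
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

full_txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
keep_names <- c("uc001aaa.3")  # transcript names of interest

# Dump the TxDb into a list of data frames:
# transcripts, splicings, genes, chrominfo.
txdb_dump <- as.list(full_txdb)

# Internal transcript ids (tx_id) for the names we want to keep.
keep_ids <- txdb_dump$transcripts$tx_id[
  txdb_dump$transcripts$tx_name %in% keep_names
]

# Filter every table that is keyed by tx_id; chrominfo is left untouched.
for (tbl in c("transcripts", "splicings", "genes")) {
  rows <- txdb_dump[[tbl]]$tx_id %in% keep_ids
  txdb_dump[[tbl]] <- txdb_dump[[tbl]][rows, , drop = FALSE]
}

# Rebuild a TxDb from the filtered tables.
newTxDb <- do.call(makeTxDb, txdb_dump)

# Hypothetical downstream use ('peaks' is an assumed GRanges of peaks):
# peakAnno <- ChIPseeker::annotatePeak(peaks, TxDb = newTxDb)
```

Because the result is a genuine TxDb rather than a GRanges, exon, intron, and UTR information should still be available to downstream tools, though I have not checked how ChIPseeker behaves with a TxDb this small.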