Hi, I am working with macaque, whose annotation and reference genome are incomplete. I downloaded the RefSeq annotation from UCSC and loaded the GTF file like this:
annotation_m = makeTxDbFromGFF(gtfFile, format = 'gtf', dataSource = 'ensemble', organism = 'Macaca fascicularis', chrominfo = chromInfo, metadata = metaData)
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning messages:
1: In makeTxDbFromGRanges(gr, metadata = metadata) :
  The following transcripts were dropped because their exon ranks could not be inferred (either because the exons are not on the same chromosome/strand or because they are not separated by introns): NM_001283298, NM_001283311, NM_001283379, NM_001283387, NM_001283401, NM_001283462, NM_001283504, NM_001283522, NM_001283551, NM_001283594, NM_001283671, NM_001283708, NM_001283746, NM_001283802, NM_001283855, NM_001283893, NM_001284027, NM_001284076, NM_001284114, NM_001284173, NM_001284607, NM_001284630, NM_001284689, NM_001284707, NM_001284756, NM_001284835, NM_001284840, NM_001284890, NM_001284912, NM_001285101, NM_001285216, NM_001285318, NM_001319465, NM_001319481, NM_001319512, NM_001319514, NM_001319538, NM_001319588, NM_001319591
These genes may be paralogous genes. I don't want to drop them. Is there any way to include them? I haven't found any answers. Thank you very much for helping me!
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS: /usr/local/lib/libblas.so.3.2.1
LAPACK: /usr/local/lib/liblapack.so.3.2.1

locale:
[1] LC_CTYPE=en_HK.UTF-8 LC_NUMERIC=C LC_TIME=en_HK.UTF-8
[4] LC_COLLATE=en_HK.UTF-8 LC_MONETARY=en_HK.UTF-8 LC_MESSAGES=en_HK.UTF-8
[7] LC_PAPER=en_HK.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel splines stats graphics grDevices utils datasets methods
[10] base

other attached packages:
[1] GenomicFeatures_1.30.3 AnnotationDbi_1.40.0 GenomicRanges_1.30.3 GenomeInfoDb_1.14.0
[5] IRanges_2.12.0 S4Vectors_0.16.0 bladderbatch_1.16.0 Biobase_2.38.0
[9] BiocGenerics_0.24.0 limma_3.34.9 sva_3.26.0 BiocParallel_1.12.0
[13] genefilter_1.60.0 mgcv_1.8-12 nlme_3.1-128 survival_2.39-4
[17] BiocInstaller_1.28.0 RColorBrewer_1.1-2 gplots_3.0.1 scales_0.5.0
[21] cqn_1.24.0 quantreg_5.35 SparseM_1.77 preprocessCore_1.40.0
[25] nor1mix_1.2-3 mclust_5.4 ggfortify_0.4.5 ggplot2_2.2.1
[29] hash_2.2.6 pheatmap_1.0.8 stringr_1.3.0

loaded via a namespace (and not attached):
[1] bitops_1.0-6 matrixStats_0.53.1 bit64_0.9-7
[4] progress_1.2.0 httr_1.3.1 tools_3.4.3
[7] R6_2.2.2 KernSmooth_2.23-15 DBI_1.0.0
[10] lazyeval_0.2.1 colorspace_1.3-2 tidyselect_0.2.5
[13] gridExtra_2.3 prettyunits_1.0.2 RMySQL_0.10.15
[16] bit_1.1-14 compiler_3.4.3 DelayedArray_0.4.1
[19] rtracklayer_1.38.3 caTools_1.17.1 digest_0.6.15
[22] Rsamtools_1.30.0 XVector_0.18.0 pkgconfig_2.0.1
[25] rlang_0.2.2 rstudioapi_0.7 RSQLite_2.1.1
[28] bindr_0.1.1 gtools_3.5.0 dplyr_0.7.6
[31] RCurl_1.95-4.10 magrittr_1.5 GenomeInfoDbData_1.0.0
[34] Matrix_1.2-6 Rcpp_0.12.19 munsell_0.4.3
[37] stringi_1.1.6 SummarizedExperiment_1.8.1 zlibbioc_1.24.0
[40] plyr_1.8.4 grid_3.4.3 blob_1.1.1
[43] gdata_2.18.0 crayon_1.3.4 lattice_0.20-33
[46] Biostrings_2.46.0 annotate_1.56.2 hms_0.4.2
[49] pillar_1.2.1 biomaRt_2.34.2 XML_3.98-1.11
[52] glue_1.3.0 MatrixModels_0.4-1 gtable_0.2.0
[55] purrr_0.2.5 tidyr_0.8.1 assertthat_0.2.0
[58] xtable_1.8-3 tibble_1.4.2 GenomicAlignments_1.14.2
[61] memoise_1.1.0 bindrcpp_0.2.2
First, thank you very much. I think this solution is good; however, I also have a manual annotation that cannot be imported using makeTxDbFromUCSC(). It also contains the same gene on multiple chromosomes. How should I deal with this?
The TxDb packages are intended to provide information about genomic locations based on a particular annotation service. When you use an accessor to get those data you get (depending on the accessor) either a GRanges or GRangesList object.
Your question could be formulated as 'How can I put some arbitrary genomic locations into a TxDb object?' or alternatively 'How can I add some arbitrary genomic locations to the data I get from a TxDb object?'. The second question is obviously a simpler thing to answer, and you can do that yourself by reading about the GRanges and GRangesList objects.
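As a rough sketch of that second approach, the example below combines ranges that stand in for the output of a TxDb accessor with a hand-made annotation, then groups them by gene. It only assumes the GenomicRanges package is installed; all coordinates, transcript names, and gene IDs are invented for illustration, and in practice `from_txdb` would come from something like `transcripts(txdb)` on your own TxDb object.

```r
library(GenomicRanges)

# Shared sequence information so both objects use the same seqlevels.
# The chromosome names here are placeholders.
si <- Seqinfo(seqnames = c("chr1", "chr2", "chr3"))

# Stand-in for ranges obtained from a TxDb accessor (made-up values).
from_txdb <- GRanges(
  seqnames = c("chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500), end = c(200, 650)),
  strand   = c("+", "-"),
  tx_name  = c("tx_A", "tx_B"),
  gene_id  = c("geneA", "geneB"),
  seqinfo  = si
)

# Hypothetical manual annotation: one gene placed on two chromosomes,
# which a plain GRanges can hold without any transcript being dropped.
manual <- GRanges(
  seqnames = c("chr1", "chr3"),
  ranges   = IRanges(start = c(1000, 2000), end = c(1200, 2300)),
  strand   = c("+", "+"),
  tx_name  = c("tx_C.1", "tx_C.2"),
  gene_id  = c("geneC", "geneC"),
  seqinfo  = si
)

# Concatenate both sources into a single GRanges object.
combined <- c(from_txdb, manual)

# Group the ranges by gene, giving a GRangesList with one element per gene.
by_gene <- split(combined, combined$gene_id)

# Inspect the gene that appears on more than one chromosome.
by_gene[["geneC"]]
```

Working at the GRanges level avoids the exon-rank check that makeTxDbFromGRanges() applies, at the cost of not having a TxDb object for tools that require one.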