Hi,
I would like to extract exon features for specific transcripts. I am using the "exons" function from GenomicFeatures to retrieve the exon ranks corresponding to a transcript ID. However, this does not work for overlapping genes that share strictly identical exons: a shared exon comes back with several gene IDs and several ranks (for example "2,3"), so I cannot tell which rank belongs to my transcript. Below is one example, but I get the same result for other transcript IDs and for GTF files from various sources. So my question is: how can I extract the list of exon ranks for a specific transcript when genes overlap?
Thank you very much in advance,
Emilie
> library("GenomicFeatures")
> txdb2 <- makeTranscriptDbFromUCSC(genome="hg19", tablename = "ensGene")
> exons(txdb2,vals=list(tx_name="ENST00000336097"),columns=c("gene_id","exon_rank"))
GRanges object with 6 ranges and 2 metadata columns:
      seqnames               ranges strand |                                         gene_id exon_rank
  [1]    chr17 [49230937, 49231023]      + |                                 ENSG00000239672         1
  [2]    chr17 [49231586, 49231805]      + | ENSG00000011052,ENSG00000239672,ENSG00000243678         2
  [3]    chr17 [49233012, 49233141]      + | ENSG00000011052,ENSG00000239672,ENSG00000243678       2,3
  [4]    chr17 [49237341, 49237442]      + | ENSG00000011052,ENSG00000239672,ENSG00000243678       3,4
  [5]    chr17 [49238521, 49238633]      + | ENSG00000011052,ENSG00000239672,ENSG00000243678       4,5
  [6]    chr17 [49239089, 49239422]      + |                                 ENSG00000239672         6
-------
seqinfo: 93 sequences (1 circular) from hg19 genome
> exons(txdb2,vals=list(tx_name="ENST00000336097"),columns=c("exon_rank"))$exon_rank
IntegerList of length 6
[[1]] 1
[[2]] 2
[[3]] 2 3
[[4]] 3 4
[[5]] 4 5
[[6]] 6
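To illustrate what I think is happening (a guess on my part, not something I have confirmed in the GenomicFeatures code): an exon rank seems to be a property of a (transcript, exon) pair, so when the result is organised per exon, a shared exon picks up the ranks it has in every transcript that uses it. The base R sketch below reproduces that pattern on a made-up table; the transcript names `txA`/`txB`, the exon IDs and the ranks are all hypothetical.

```r
# Hypothetical toy "splicing" table: one row per (transcript, exon) pair.
# Transcript names, exon ids and ranks are invented for illustration only.
splicing <- data.frame(
  tx_name   = c("txA", "txA", "txA", "txB", "txB"),
  exon_id   = c("e1",  "e2",  "e3",  "e2",  "e3"),
  exon_rank = c(1L,    2L,    3L,    1L,    2L),
  stringsAsFactors = FALSE
)

# Exon-centric view: select the exons used by txA, then report every rank
# recorded for those exons, whichever transcript the rank comes from.
exons_of_txA <- unique(splicing$exon_id[splicing$tx_name == "txA"])
exon_centric <- lapply(
  split(splicing$exon_rank, splicing$exon_id)[exons_of_txA],
  sort
)

# Transcript-centric view: keep only the rows of txA, so each exon
# carries exactly one rank.
rows_txA   <- splicing[splicing$tx_name == "txA", ]
tx_centric <- setNames(rows_txA$exon_rank, rows_txA$exon_id)

print(exon_centric)  # shared exons e2 and e3 carry two ranks each
print(tx_centric)    # one rank per exon
```

What I am after is the second, transcript-centric view. `exonsBy(txdb2, by="tx", use.names=TRUE)` looks like the transcript-grouped accessor in GenomicFeatures, but I have not checked whether it returns a single rank per exon for the transcript above.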
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu precise (12.04.5 LTS)
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.18.6 AnnotationDbi_1.28.2 Biobase_2.26.0
[4] BSgenome.Hsapiens.UCSC.hg19_1.4.0 gridExtra_0.9.1 plyr_1.8.1
[7] ggplot2_1.0.1 BSgenome_1.34.1 rtracklayer_1.26.3
[10] GenomicRanges_1.18.4 GenomeInfoDb_1.2.4 Biostrings_2.34.1
[13] XVector_0.6.0 IRanges_2.0.1 S4Vectors_0.4.0
[16] BiocGenerics_0.12.1 RSQLite_1.0.0 DBI_0.3.1
loaded via a namespace (and not attached):
[1] base64enc_0.1-2 BatchJobs_1.6 BBmisc_1.9 BiocParallel_1.0.3
[5] biomaRt_2.22.0 bitops_1.0-6 brew_1.0-6 checkmate_1.5.2
[9] codetools_0.2-11 colorspace_1.2-6 digest_0.6.8 fail_1.2
[13] foreach_1.4.2 GenomicAlignments_1.2.2 gtable_0.1.2 iterators_1.0.7
[17] MASS_7.3-40 munsell_0.4.2 proto_0.3-10 Rcpp_0.11.5
[21] RCurl_1.95-4.5 reshape2_1.4.1 Rsamtools_1.18.3 scales_0.2.4
[25] sendmailR_1.2-1 stringr_0.6.2 tools_3.1.3 XML_3.98-1.1
[29] zlibbioc_1.12.0
