Question: Positional Details with Features through UniProt.ws Ultimately to display as tracks in ggbio
2.9 years ago, Anne Deslattes Mays (United States) wrote:
Dear all,

    biocLite("UniProt.ws")
    library(UniProt.ws)

    select(UniProt.ws, keys = "P02794", columns = c("DOMAINS", "FEATURES"), keytype = "UNIPROTKB")
    Getting extra data for P02794 NA NA etc
      UNIPROTKB                         DOMAINS
    1    P02794 Ferritin-like diiron domain (1)
      FEATURES
    1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator methionine (1); Metal binding (6); Modified residue (4); Sequence conflict (1); Turn (2)

What I want are the positional details for each of these features, which are visible through the UniProt web page. FTH1 is 183 amino acids in length. There are 6 metal binding sites, each at a specific position. This information must be available, since the web site can return the positional details. I would like to retrieve them so that I can manipulate them with new evidential information.

Ultimately I wish to display them with tracks from ggbio:

    pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
                                            param = ScanBamParam(which = genesymbol["FTH1"], what = c("seq")),
                                            use.names = TRUE)

    FTH1.ga <- geom_alignment(data = txdb, which = genesymbol["FTH1"])

So here I have sample information which I have aligned to the reference genome. I retrieve that information from a BAM file.

    # create the GAlignments objects for each isoform
    FTH1.isoform.1 <- pb.53A.pos.ga[c(7)]
    FTH1.isoform.2 <- pb.53A.pos.ga[c(15)]
    FTH1.isoform.3 <- pb.53A.pos.ga[c(13)]
    FTH1.isoform.4 <- pb.53A.pos.ga[c(8)]
    FTH1.isoform.5 <- pb.53A.pos.ga[c(2)]
    FTH1.isoform.6 <- pb.53A.pos.ga[c(1)]

    p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
    p2 <- autoplot(FTH1.isoform.2, fill = "blue", color = "blue")
    p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
    p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
    p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
    p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")

    tracks(FTH1 = p1.FTH1,
           "Iso 1" = p1,
           "Iso 2" = p2,
           "Iso 3" = p3,
           "Iso 4" = p4,
           "Iso 5" = p5,
           "Iso 6" = p6)

I can then autoplot each of the separate isoforms. What I want to do, however, is annotate the isoforms so that each shows the coding region at the full height of the bar and the non-coding regions at a reduced height.

Additionally, I want to color the graphic with the details for the protein, such as the metal binding sites, domains, etc., so that I can computationally generate an informative picture which explains what is lost or gained in the separate isoforms.

Thoughts?

Anne

    R version 3.1.0 (2014-04-10)
    Platform: x86_64-apple-darwin13.1.0 (64-bit)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] UniProt.ws_2.4.2
     [2] RCurl_1.95-4.3
     [3] bitops_1.0-6
     [4] RSQLite_0.11.4
     [5] DBI_0.2-7
     [6] biomaRt_2.20.0
     [7] BiocInstaller_1.14.2
     [8] GenomicAlignments_1.0.5
     [9] BSgenome_1.32.0
    [10] Rsamtools_1.16.1
    [11] Biostrings_2.32.1
    [12] XVector_0.4.0
    [13] ggbio_1.12.8
    [14] ggplot2_1.0.0
    [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
    [16] GenomicFeatures_1.16.2
    [17] AnnotationDbi_1.26.0
    [18] Biobase_2.24.0
    [19] GenomicRanges_1.16.4
    [20] GenomeInfoDb_1.0.2
    [21] IRanges_1.22.10
    [22] BiocGenerics_0.10.0

    loaded via a namespace (and not attached):
     [1] BatchJobs_1.3            BBmisc_1.7               BiocParallel_0.6.1
     [4] biovizBase_1.12.1        brew_1.0-6               checkmate_1.2
     [7] cluster_1.15.2           codetools_0.2-8          colorspace_1.2-4
    [10] dichromat_2.0-0          digest_0.6.4             fail_1.2
    [13] foreach_1.4.2            Formula_1.1-2            grid_3.1.0
    [16] gridExtra_0.9.1          gtable_0.1.2             Hmisc_3.14-4
    [19] iterators_1.0.7          labeling_0.2             lattice_0.20-29
    [22] latticeExtra_0.6-26      MASS_7.3-33              munsell_0.4.2
    [25] plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5
    [28] Rcpp_0.11.2              reshape2_1.4             rtracklayer_1.24.2
    [31] scales_0.2.4             sendmailR_1.1-2          splines_3.1.0
    [34] stats4_3.1.0             stringr_0.6.2            survival_2.37-7
    [37] tcltk_3.1.0              tools_3.1.0              VariantAnnotation_1.10.5
    [40] XML_3.98-1.1             zlibbioc_1.10.0
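One possible way to get per-feature positions, sketched here rather than taken from the thread: the FEATURES column above only reports counts, but UniProt has offered per-entry feature tables as GFF text (at the time of writing something like https://rest.uniprot.org/uniprotkb/P02794.gff; treat that URL as an assumption to verify). The sketch below parses GFF-style text with base R only. The helper name `parse_uniprot_gff` is hypothetical, and the three sample lines are placeholders in the GFF column layout, not the real P02794 annotation.

```r
# Hypothetical excerpt in GFF column layout (tab-separated, 9 columns).
# The coordinates are placeholders, NOT the real P02794 feature table.
gff_text <- paste(
  "##gff-version 3",
  "P02794\tUniProtKB\tChain\t2\t183\t.\t.\t.\tID=example_chain",
  "P02794\tUniProtKB\tDomain\t11\t160\t.\t.\t.\tNote=example domain",
  "P02794\tUniProtKB\tMetal binding\t28\t28\t.\t.\t.\tNote=example site",
  sep = "\n"
)

# Hypothetical helper: turn GFF text into one row per feature with
# amino-acid start/end positions (1-based, inclusive).
parse_uniprot_gff <- function(text) {
  gff <- read.delim(text = text, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  data.frame(accession = gff[[1]],
             feature   = gff[[3]],
             aa_start  = as.integer(gff[[4]]),
             aa_end    = as.integer(gff[[5]]),
             stringsAsFactors = FALSE)
}

features <- parse_uniprot_gff(gff_text)
print(features)
```

With a downloaded file, the same helper could be fed `paste(readLines(path), collapse = "\n")`. The resulting table is still in protein coordinates, so it needs to be mapped onto the genome before it can share a track with the isoform alignments.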
2.8 years ago, Tengfei Yin wrote:
Hey Anne,

So sorry for the late reply.

Ideally, I should have some kind of mapper function in biovizBase to help map protein space to genomic space, so you don't have to do it yourself. Until I have that, a hack would be to massage your protein domain data into a GRanges object in genomic coordinates, with the domain function as a metadata column, and then create a separate track that plots the object as rectangles and uses a color legend to indicate domain function.

I will try to develop a more general approach for doing this. If you want, please send me an example RData file or other example data, so we can work on that together.

PS: so that I don't miss your request, feel free to use the GitHub issues page: https://github.com/tengfei/ggbio/issues

Cheers,
Tengfei

On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu> wrote:
> Dear all,
> [...]

--
Tengfei Yin, PhD
Product Manager
Seven Bridges Genomics
sbgenomics.com
One Broadway FL 7
Cambridge, MA 02142
(617) 866-0446
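The hack described in the reply (protein positions converted to genomic coordinates, then plotted as rectangles) could look roughly like the sketch below. It is a minimal base-R illustration under stated assumptions, not the biovizBase mapper mentioned above: the function name `protein_to_genome`, the toy CDS exons, and the chromosome name `chrToy` are all hypothetical, and it assumes the CDS exons of a single transcript are known with 1-based inclusive coordinates.

```r
# Map an amino-acid interval onto the genome, given the CDS exons of one
# transcript. All coordinates are 1-based and inclusive.
protein_to_genome <- function(aa_start, aa_end, cds_start, cds_end, strand = "+") {
  # Order CDS exons in transcription direction (5' to 3').
  ord <- if (strand == "+") order(cds_start) else order(cds_start, decreasing = TRUE)
  cds_start <- cds_start[ord]
  cds_end <- cds_end[ord]
  widths <- cds_end - cds_start + 1L
  # Position of each exon within the spliced coding sequence.
  tx_end <- cumsum(widths)
  tx_start <- tx_end - widths + 1L
  # Nucleotide interval of the coding sequence covered by the residues.
  nt_start <- (aa_start - 1L) * 3L + 1L
  nt_end <- aa_end * 3L
  hit <- which(tx_start <= nt_end & tx_end >= nt_start)
  # Clip to each overlapping exon, as offsets from the exon's 5' end.
  from <- pmax(nt_start, tx_start[hit]) - tx_start[hit]
  to <- pmin(nt_end, tx_end[hit]) - tx_start[hit]
  if (strand == "+") {
    data.frame(start = cds_start[hit] + from, end = cds_start[hit] + to)
  } else {
    data.frame(start = cds_end[hit] - to, end = cds_end[hit] - from)
  }
}

# Toy transcript (hypothetical): two CDS exons, 30 coding nt = 10 residues.
cds_start <- c(101, 201)
cds_end <- c(110, 220)

# Residues 3 to 5 span the exon junction, so two genomic pieces come back.
site <- protein_to_genome(3, 5, cds_start, cds_end, strand = "+")
site$seqnames <- "chrToy"
site$feature <- "Metal binding"
print(site)
```

If GenomicRanges is installed, `GenomicRanges::makeGRangesFromDataFrame(site, keep.extra.columns = TRUE)` should turn this table into a GRanges object with `feature` as a metadata column, which can then be plotted as its own ggbio track with the fill mapped to `feature`. A feature that spans an intron is split into one range per exon, which is usually what a genomic track needs.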