Dear Anne and Tengfei,
The Pbase mapping vignette [1] is an initial description of mapping protein
coordinates back to the genome. My plan is to implement what is described in
the vignette in the package, but I haven't had time to do so yet.
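To illustrate the kind of arithmetic involved (a minimal sketch in base R, not
the Pbase implementation; the helper name protein_to_genome and the exon
coordinates are made up for the example), a residue position can be mapped to
the genomic positions of its codon by walking the CDS exons in transcription
order:

```r
# Hypothetical helper: map a 1-based protein residue position to the genomic
# coordinates of its codon, given the CDS exons of a single transcript.
protein_to_genome <- function(aa_pos, cds_start, cds_end, strand = "+") {
  stopifnot(length(cds_start) == length(cds_end), strand %in% c("+", "-"))
  # Put the exons in transcription order (5' to 3' along the transcript).
  ord <- order(cds_start, decreasing = (strand == "-"))
  cds_start <- cds_start[ord]
  cds_end <- cds_end[ord]
  widths <- cds_end - cds_start + 1
  cum_end <- cumsum(widths)          # last CDS offset covered by each exon
  cum_start <- cum_end - widths + 1  # first CDS offset covered by each exon
  # The three nucleotides of the codon, as 1-based offsets into the CDS.
  nt <- 3 * (aa_pos - 1) + 1:3
  stopifnot(all(nt >= 1), all(nt <= sum(widths)))
  exon <- findInterval(nt, cum_start)
  within <- nt - cum_start[exon]     # 0-based offset inside that exon
  if (strand == "+") cds_start[exon] + within else cds_end[exon] - within
}

# Made-up CDS with two exons; residue 4 has a codon that spans the intron.
protein_to_genome(4, cds_start = c(101, 201), cds_end = c(110, 211))
```

The same function handles the minus strand by reversing the exon order and
counting down from each exon end.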
Please do not hesitate to comment or make suggestions that would be
useful to you or inter-operable with your use cases.
Best wishes,
Laurent
[1] http://bioconductor.org/packages/devel/bioc/vignettes/Pbase/inst/doc/mapping.html
On 27 August 2014 11:00, bioconductor-request at r-project.org wrote:
> Message: 25
> Date: Tue, 26 Aug 2014 18:37:49 -0400
> From: Tengfei Yin <tengfei.yin at sbgenomics.com>
> To: Anne Deslattes Mays <ad376 at georgetown.edu>
> Cc: Anne Deslattes Mays Cc Routing Num 255071981
> <adeslat at sbresearchllc.com>, Bioconductor mailing list
> <bioconductor at r-project.org>
> Subject: Re: [BioC] Positional Details with Features through
> UniProt.ws Ultimately to display as tracks in ggbio
> Message-ID:
> <cagkue7vokqs4guvbcob5c2_23myhjc51lmu1nv7g1_4k2inhoa at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hey Anne,
>
> So sorry for the late reply.
>
> Ideally, I should have some kind of mapper function in biovizBase to help
> map protein space to genomic space, so you don't have to do it yourself,
> but until I have that, a hack would be to massage your protein domain data
> into a GRanges object in genomic coordinates, with the domain function as a
> metadata column, and then create a separate track that plots the object as
> rectangles, using a color legend to indicate domain function.
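The hack described above might look roughly like this. It is only a sketch:
the coordinates are placeholders rather than real FTH1 feature positions, and
the column name feature and the object names are chosen for the example.

```r
library(GenomicRanges)
library(ggbio)

# Placeholder genomic coordinates: NOT the real positions of FTH1 features.
domains <- GRanges(
  seqnames = "chr11",
  ranges   = IRanges(start = c(1010, 1040, 1320), end = c(1012, 1042, 1400)),
  strand   = "-",
  feature  = c("Metal binding", "Metal binding", "Domain")
)

# One rectangle per range, filled by the feature column so that ggbio
# adds a color legend for the domain function.
p_domains <- autoplot(domains, aes(fill = feature))
```

p_domains could then be passed to tracks() alongside the isoform plots.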
>
> I will try to develop a more general approach for doing this. If you want,
> please send me an example RData file or example data, so we can work on that
> together.
>
> PS: in case I miss your request, feel free to use the GitHub issues page
> here: <https://github.com/tengfei/ggbio/issues>
>
> cheers
>
> Tengfei
>
>
>
>
> On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu>
> wrote:
>
>> Dear all,
>>
>> biocLite("UniProt.ws")
>> library(UniProt.ws)
>>
>>
>> select(UniProt.ws, keys = "P02794", columns = c("DOMAINS", "FEATURES"),
>>        keytype = "UNIPROTKB")
>> Getting extra data for P02794 NA NA etc
>> UNIPROTKB DOMAINS
>> 1 P02794 Ferritin-like diiron domain (1)
>>
>>
>> FEATURES
>> 1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator
>> methionine (1); Metal binding (6); Modified residue (4); Sequence conflict
>> (1); Turn (2)
>>
>> What I want are the positional details for each of these features, which
>> are visible through the UniProt web page.
>> FTH1 is 183 amino acids in length. There are 6 metal binding sites, each
>> at a specific position.
>> This information is available, since the web site can return the
>> positional details. I would like to have them so that I can manipulate
>> them with new evidential information.
>>
>> Ultimately I wish to display them with tracks from ggbio:
>> pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
>>     param = ScanBamParam(which = genesymbol["FTH1"], what = c("seq")),
>>     use.names = TRUE)
>>
>> FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])
>>
>> So here I have sample information which I have aligned to the reference
>> genome. I retrieve that information from a bam file.
>> # create the GAlignments objects for each isoform
>> FTH1.isoform.1 <- pb.53A.pos.ga[c(7)]
>> FTH1.isoform.2 <- pb.53A.pos.ga[c(15)]
>> FTH1.isoform.3 <- pb.53A.pos.ga[c(13)]
>> FTH1.isoform.4 <- pb.53A.pos.ga[c(8)]
>> FTH1.isoform.5 <- pb.53A.pos.ga[c(2)]
>> FTH1.isoform.6 <- pb.53A.pos.ga[c(1)]
>>
>>
>> p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
>> p2 <- autoplot(FTH1.isoform.2, fill = "blue", color = "blue")
>> p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
>> p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
>> p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
>> p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")
>>
>> tracks( FTH1=p1.FTH1,
>> "Iso 1"=p1,
>> "Iso 2"=p2,
>> "Iso 3"=p3,
>> "Iso 4"=p4,
>> "Iso 5"=p5,
>> "Iso 6"=p6)
>>
>>
>> I can then autoplot each of the separate isoforms. What I want to do,
>> however, is annotate the isoforms so that each shows the coding region
>> at the full height of the bar and the non-coding regions at a reduced
>> height.
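One way to get the two box heights, sketched with GenomicRanges and plain
ggplot2 rather than a dedicated ggbio option: split each isoform's exons into
coding and non-coding parts and draw them with different heights. The
coordinates, the cds range, and the column names type and half are
assumptions made for the example.

```r
library(GenomicRanges)
library(ggplot2)

# Placeholder coordinates, not the real FTH1 structure.
exons <- GRanges("chr11", IRanges(c(1001, 1301, 1601), c(1100, 1400, 1750)))
cds   <- GRanges("chr11", IRanges(1051, 1650))  # span from start to stop codon

# Split the exons into the parts inside and outside the coding span.
coding    <- intersect(exons, cds)
noncoding <- setdiff(exons, cds)
coding$type    <- rep("coding", length(coding))
noncoding$type <- rep("non-coding", length(noncoding))
parts <- as.data.frame(c(coding, noncoding))

# Half-height of each box: coding parts are drawn twice as tall.
parts$half <- ifelse(parts$type == "coding", 0.4, 0.2)

p_iso <- ggplot(parts) +
  geom_rect(aes(xmin = start, xmax = end, ymin = -half, ymax = half,
                fill = type)) +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
```

For real data, the exon ranges would come from the aligned isoform and the
coding span from the annotation or a predicted open reading frame.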
>>
>> Additionally, I want to color the graphic with the details for the
>> protein, such as the metal binding sites, domains, etc., so that
>> computationally I can generate an informative picture which explains what
>> is lost or gained in separate isoforms.
>>
>> Thoughts?
>>
>> Anne
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] UniProt.ws_2.4.2
>> [2] RCurl_1.95-4.3
>> [3] bitops_1.0-6
>> [4] RSQLite_0.11.4
>> [5] DBI_0.2-7
>> [6] biomaRt_2.20.0
>> [7] BiocInstaller_1.14.2
>> [8] GenomicAlignments_1.0.5
>> [9] BSgenome_1.32.0
>> [10] Rsamtools_1.16.1
>> [11] Biostrings_2.32.1
>> [12] XVector_0.4.0
>> [13] ggbio_1.12.8
>> [14] ggplot2_1.0.0
>> [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
>> [16] GenomicFeatures_1.16.2
>> [17] AnnotationDbi_1.26.0
>> [18] Biobase_2.24.0
>> [19] GenomicRanges_1.16.4
>> [20] GenomeInfoDb_1.0.2
>> [21] IRanges_1.22.10
>> [22] BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>> [1] BatchJobs_1.3 BBmisc_1.7 BiocParallel_0.6.1
>> [4] biovizBase_1.12.1 brew_1.0-6 checkmate_1.2
>> [7] cluster_1.15.2 codetools_0.2-8 colorspace_1.2-4
>> [10] dichromat_2.0-0 digest_0.6.4 fail_1.2
>> [13] foreach_1.4.2 Formula_1.1-2 grid_3.1.0
>> [16] gridExtra_0.9.1 gtable_0.1.2 Hmisc_3.14-4
>> [19] iterators_1.0.7 labeling_0.2 lattice_0.20-29
>> [22] latticeExtra_0.6-26 MASS_7.3-33 munsell_0.4.2
>> [25] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
>> [28] Rcpp_0.11.2 reshape2_1.4 rtracklayer_1.24.2
>> [31] scales_0.2.4 sendmailR_1.1-2 splines_3.1.0
>> [34] stats4_3.1.0 stringr_0.6.2 survival_2.37-7
>> [37] tcltk_3.1.0 tools_3.1.0 VariantAnnotation_1.10.5
>> [40] XML_3.98-1.1 zlibbioc_1.10.0
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/