Assemble a DNAString of the complete coding sequence from exons?

Paul Shannon ★ 1.1k

@paul-shannon-578

I am sure there is an elegant way to do this. Could somebody clue me in?

I have (in a simple case) two exons for a gene on the + strand, and the full DNAString sequence of its chromosome.

My naive technique for constructing a DNAString of the entire coding sequence is:

1) paste together toString(subseq(seq.chrom, exon.start, exon.end)) for each exon
2) construct a DNAString from the resulting chars.

There must be a better way. What is it?

Thanks!

- Paul
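For reference, the naive technique described in the question can be sketched roughly as follows. This is only an illustration: the toy sequence and the exon coordinates are made up, and seq.chrom would normally be a full chromosome loaded from a BSgenome data package.

```r
library(Biostrings)

# Hypothetical stand-in for a chromosome sequence (assumption, not real data)
seq.chrom <- DNAString("AAACCCGGGTTTACGTACGT")

# Hypothetical exon coordinates on the + strand
exon.start <- c(4, 13)
exon.end   <- c(9, 16)

# 1) extract each exon as a plain character string
exon.chars <- mapply(
  function(s, e) toString(subseq(seq.chrom, start = s, end = e)),
  exon.start, exon.end
)

# 2) paste the strings together and build a new DNAString from the result
cds.naive <- DNAString(paste(exon.chars, collapse = ""))
cds.naive
```

The round trip through character strings is what makes this approach feel clumsy, since every exon is copied out of the DNAString and then parsed back in.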

updated 15.1 years ago by Hervé Pagès • written 15.1 years ago by Paul Shannon

Hervé Pagès 16k

@herve-pages-1542

Seattle, WA, United States

Hi Paul,

Yes, we need a better (and more efficient) way. This, as well as support for subsequence replacement (via subseq<-) that you just asked me about off-list, is on its way. It should be ready later today or tomorrow.

Thanks for your patience.

H.

--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319

15.1 years ago by Hervé Pagès

Hervé Pagès 16k

@herve-pages-1542

Seattle, WA, United States

Hi Paul,

You can now use c() on XString objects (and more generally on XRaw objects). See ?XRaw for some examples (the examples given in this man page use XRaw objects, but they translate directly to XString objects, which are a particular type of XRaw object).

I've also added the xscat() function, which is an equivalent of paste(..., sep="") for XString/XStringSet/XStringViews objects. See ?xscat

Regarding your request from February for more support for modifying an XString object (https://stat.ethz.ch/pipermail/bioconductor/2009-February/026209.html), I've added a subseq() replacement method (subseq<-) for XRaw/XString objects. Again, some examples with XRaw objects are given in ?XRaw.

Here are a couple of more advanced examples with a chromosome sequence:

(a) Delete regions specified by their coordinates:

    v <- Views(chrom, start=region_starts, end=region_ends)
    do.call(c, as.list(gaps(v)))

Note that 'v' could be the result of a call to matchPattern(some_pattern, chrom). This provides an easy way to delete patterns from a chromosome.

(b) Modifying a chromosome:

    library(BSgenome.Dmelanogaster.UCSC.dm3)
    chr2L <- unmasked(Dmelanogaster$chr2L)

    # delete the first 1000 bases:
    subseq(chr2L, end=1000) <- NULL

    # insert 5 As right after base 6:
    subseq(chr2L, end=6, width=0) <- DNAString("AAAAA")

    # replace base -10 (base 10 counting from the 3' end) by 2 Gs:
    subseq(chr2L, start=-10, width=1) <- DNAString("GG")

Note that these new functionalities don't work (yet) on MaskedXString objects.

They are available in BioC devel only, starting with Biostrings 2.11.44 and IRanges 1.1.55. They should propagate to the public repo in the next 24 hours, but you can get them from svn if you need them now.

Your feedback is welcome.

Cheers,
H.

    > sessionInfo()
    R version 2.9.0 Under development (unstable) (2009-02-11 r47901)
    x86_64-unknown-linux-gnu

    locale:
    LC_CTYPE=en_CA.UTF-8;LC_NUMERIC=C;LC_TIME=en_CA.UTF-8;LC_COLLATE=en_CA.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_CA.UTF-8;LC_PAPER=en_CA.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_CA.UTF-8;LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] BSgenome_1.11.13   Biostrings_2.11.44 IRanges_1.1.55

    loaded via a namespace (and not attached):
    [1] Biobase_2.3.10
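Applied to the two-exon case from the original question, c() and xscat() might be used along these lines. This is a minimal sketch, not code from the thread: the toy chromosome and the exon coordinates are invented for illustration, and behaviour may differ between the 2009 devel versions mentioned above and current Biostrings releases.

```r
library(Biostrings)

# Hypothetical stand-in for a chromosome sequence (assumption, not real data)
chrom <- DNAString("AAACCCGGGTTTACGTACGT")

# Hypothetical exon coordinates on the + strand
exon_starts <- c(4, 13)
exon_ends   <- c(9, 16)

# Extract each exon as a DNAString, with no conversion to character
exons <- lapply(seq_along(exon_starts), function(i) {
  subseq(chrom, start = exon_starts[i], end = exon_ends[i])
})

# Option 1: concatenate the exon DNAStrings with c()
cds_c <- do.call(c, exons)

# Option 2: concatenate with xscat(), the paste(..., sep="") equivalent
cds_xscat <- do.call(xscat, exons)

cds_c
cds_xscat
```

Both options stay within DNAString objects throughout. For a gene on the - strand, the concatenated sequence would presumably also need reverseComplement() applied, with the exons taken in transcript order.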

15.1 years ago by Hervé Pagès
