Question about Rle
Asma rabe ▴ 290
@asma-rabe-4697
Last seen 6.3 years ago
Japan
Hi All,

I have a question about DNA sequence encoding in Rle objects.

From the IRanges vignette: the sequence {1, 1, 1, 2, 3, 3} can be represented as values = {1, 2, 3}, run lengths = {3, 1, 2}.

Suppose we have a sequence from chr1, positions 1-8: ATGTATCC. How can it be represented as an Rle object so that, for instance, the nucleotide at position 3 is recognized as G, for comparison with a reference genome?

Thank you very much in advance.
IRanges IRanges • 662 views
@valerie-obenchain-4275
Last seen 2.4 years ago
United States
Hi,

The standard Bioconductor container for sequences is the XString family. BA, DNA, RNA and AA types are supported. See the man page for details on subsetting and manipulation.

    library(Biostrings)
    ?XString
    ?DNAString

You can represent a single sequence

    > dna <- DNAString("ATGTATCC")
    > dna
      8-letter "DNAString" instance
    seq: ATGTATCC
    > dna[3]
      1-letter "DNAString" instance
    seq: G

or a set of sequences.

    > dnast <- DNAStringSet(c("AT", "G", "ATC", "C"))
    > dnast
      A DNAStringSet instance of length 4
        width seq
    [1]     2 AT
    [2]     1 G
    [3]     3 ATC
    [4]     1 C

Methods for operating on sequences are built on the XString framework, not Rles. If you really wanted to represent a sequence as an Rle, it would be a character Rle, constructed in the same way as for the other atomic types. See ?Rle.

    > rle <- Rle(c("A", "T", "G", "T", "A", "T", "C"), c(rep(1, 6), 2))
    > rle
    character-Rle of length 8 with 7 runs
      Lengths:   1   1   1   1   1   1   2
      Values : "A" "T" "G" "T" "A" "T" "C"
    > rle[3]
    character-Rle of length 1 with 1 run
      Lengths:   1
      Values : "G"

Valerie

On 07/29/2014 04:04 AM, Asma rabe wrote:
> Suppose we have a sequence of chr1 from pos 1-8 ATGTATCC. How can it be
> represented as an Rle object so that it is recognized that at pos 3, for
> instance, the nucleotide is G, for comparison with the reference genome?

--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
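Since the original goal was a comparison against a reference, here is a minimal sketch of how that might look with the objects discussed above. The reference sequence `ref` and the variable names are made up for illustration; they are not from the thread. The sketch assumes that loading Biostrings also makes `Rle`, `runValue` and `runLength` available through its dependencies.

```r
library(Biostrings)  # assumed to attach the packages that provide Rle as well

# Hypothetical data: the observed sequence from the question and an invented
# reference that differs from it at position 3.
obs <- DNAString("ATGTATCC")
ref <- DNAString("ATCTATCC")

# Compare a single (1-based) position.
pos <- 3
as.character(obs[pos]) == as.character(ref[pos])   # FALSE: G vs C

# Compare every position by splitting each sequence into single letters.
obs_chars <- strsplit(as.character(obs), "")[[1]]
ref_chars <- strsplit(as.character(ref), "")[[1]]
which(obs_chars != ref_chars)                      # 3

# If an Rle is still wanted, it can be built from the letters directly;
# Rle() works out the runs itself when given only the values.
obs_rle <- Rle(obs_chars)
runValue(obs_rle)     # "A" "T" "G" "T" "A" "T" "C"
runLength(obs_rle)    # 1 1 1 1 1 1 2
as.vector(obs_rle[pos])                            # "G"
```

For a sequence with few repeated neighbours, as here, the Rle has almost as many runs as letters, so it saves little over the XString representation.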
On 07/31/2014 11:41 AM, Valerie Obenchain wrote:
> The standard Bioconductor container for sequences is the XString
> family. BA, DNA, RNA and AA types are supported.

Sorry, the 'BA' above is a typo and should be 'BS'. The BS class is for any long sequence of characters, as opposed to the more biologically oriented DNA, RNA and AA types.

Valerie
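To illustrate the distinction, here is a small sketch contrasting the BString constructor with DNAString. The example text is invented, and the wording of the error that DNAString raises may differ between Biostrings versions, so the sketch replaces it with a fixed message.

```r
library(Biostrings)

# BString accepts arbitrary characters (hypothetical example text).
bs <- BString("Any text, not only ACGT!")
as.character(bs[1:3])   # "Any"

# DNAString only accepts letters from the DNA alphabet, so the same kind of
# input is rejected; the error is caught here and replaced with a fixed note.
res <- tryCatch(
  DNAString("Any text"),
  error = function(e) "not a valid DNA string"
)
res
```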