how to use rtracklayer to retrieve sequence of list of coordinates

0

Entering edit mode

sabrina.shao ▴ 220

@sabrinashao-1661

Last seen 10.2 years ago

Hi,all: I have a long list of coordinates, which was created by the following code: testIranges<- IRanges(start=currCoorList $starts, end=currCoorLis$ends) testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, strand=currCoorList$strand,genome="mm9") I could then export the testTrack into .bed file and upload in the UCSC genome browser and get the sequences of the listed coordinate But I was thinking if there is a function from rtracklayer to retrieve sequence directly. I checked the rtracklayer reference manual, there was a function getSeq. But I tried to use the following code: track(session,"test")<-testTrack seq<-getSeq(session,range(session),track="testTrack") it told me Error: could not find function "getSeq" Can anyone give me a hand? Thanks! -- Sabrina [[alternative HTML version deleted]]

rtracklayer rtracklayer • 3.1k views

ADD COMMENT • link updated 14.8 years ago by Steve Lianoglou ★ 13k • written 14.8 years ago by sabrina.shao ▴ 220

0

Entering edit mode

Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 21 months ago

United States

Hi Sabrina, On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at="" gmail.com=""> wrote: > Hi,all: > I have a long list of coordinates, which was created by the following code: > testIranges<- IRanges(start=currCoorList $starts, > ? ? ? ?end=currCoorLis$ends) > > ? testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, > ? ? ? ?strand=currCoorList$strand,genome="mm9") > > I could then export the testTrack into .bed file and upload in the UCSC > genome browser and get the sequences of the listed coordinate > But I was thinking if there is a function from rtracklayer to retrieve > sequence directly. If you're trying to get coordinates from some reference genome, you can use the BSgenome.*.* packages and skip rtracklayer + internet queries entirely. For instance, say I have an IRanges object `r1` which is a set of intervals on chromosome 1 from hg18 that I want sequence information for, I could do it like this: R> r1 IRanges of length 589 start end width [1] 18016124 18016155 32 [2] 18020749 18020780 32 [3] 18024599 18024630 32 [4] 18024830 18024861 32 [5] 18051985 18052016 32 [6] 18064760 18064791 32 [7] 18088145 18088176 32 [8] 18088200 18088231 32 [9] 18110085 18110116 32 ... R> library(BSgenome.Hsapiens.UCSC.hg18) R> chr1 <- unmasked(HSapiens$chr1) R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) R> myseqs ?Views on a 247249719-letter DNAString subject subject: TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT AACCCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNNNNN views: ???????start ?????end width [1] 18016124 18016155 ???32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] [2] 18020749 18020780 ???32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] [3] 18024599 18024630 ???32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] [4] 18024830 18024861 ???32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] [5] 18051985 18052016 ???32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] [6] 18064760 18064791 ???32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] [7] 18088145 18088176 ???32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] [8] 18088200 18088231 ???32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] [9] 18110085 18110116 ???32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] [10] 18118454 18118485 ???32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] ... calling "as.character" on the myseqs object will give you the sequence as a character vector. Does that get you what you're after? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD COMMENT • link 14.8 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi, Steve: Thanks! I never used BSgenome before. My list of ranges (testIranges) actually is a mixture of different chromosomes. So in that case, if I use BSgenome, do I have to separate them by chromosomes? Thanks for your help! Sabrina On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi Sabrina, > > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> wrote: > > Hi,all: > > I have a long list of coordinates, which was created by the following > code: > > testIranges<- IRanges(start=currCoorList $starts, > > end=currCoorLis$ends) > > > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, > > strand=currCoorList$strand,genome="mm9") > > > > I could then export the testTrack into .bed file and upload in the UCSC > > genome browser and get the sequences of the listed coordinate > > But I was thinking if there is a function from rtracklayer to retrieve > > sequence directly. > > If you're trying to get coordinates from some reference genome, you > can use the BSgenome.*.* packages and skip rtracklayer + internet > queries entirely. > > For instance, say I have an IRanges object `r1` which is a set of > intervals on chromosome 1 from hg18 that I want sequence information > for, I could do it like this: > > R> r1 > IRanges of length 589 > start end width > [1] 18016124 18016155 32 > [2] 18020749 18020780 32 > [3] 18024599 18024630 32 > [4] 18024830 18024861 32 > [5] 18051985 18052016 32 > [6] 18064760 18064791 32 > [7] 18088145 18088176 32 > [8] 18088200 18088231 32 > [9] 18110085 18110116 32 > ... > > R> library(BSgenome.Hsapiens.UCSC.hg18) > R> chr1 <- unmasked(HSapiens$chr1) > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) > R> myseqs > Views on a 247249719-letter DNAString subject > subject: > TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA A...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNN > views: > start end width > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] > ... > > calling "as.character" on the myseqs object will give you the sequence > as a character vector. > > Does that get you what you're after? > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.mskc="" c.org="" %7elianos="" contact=""> > -- Sabrina [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago sabrina.shao ▴ 220

0

Entering edit mode

Hi, On Mon, Mar 1, 2010 at 3:12 PM, sabrina s <sabrina.shao at="" gmail.com=""> wrote: > Hi, Steve: > Thanks! I never used BSgenome before. My list of ranges (testIranges) > actually is a mixture of different chromosomes. So in that case, if I use > BSgenome, do I have to separate them by chromosomes? Thanks for your help! I don't know if you *have* to, but I would ... I think it will make your life much easier to do so for any downstream analysis anyway, right? -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact

ADD REPLY • link 14.8 years ago Steve Lianoglou ★ 13k

0

Entering edit mode

Hi, Steve: I am trying to get it worked without separating the chromosomes. I used the following code: seqQuery<-data.frame( chr=currCoorList$chr, start=currCoorList$starts, end=currCoorList$ends, strand=currCoorList$strand) mySeq<- getSeq(Mmusculus, seqQuery$chr, start=seqQuery$start, end=seqQuery$end, strand=seqQuery$strand ) But then I got an error: Error in reverseComplement(DNAStringSet(seqs[ii])) : could not find function "copy" Did I miss some depending package? Sabrina On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < mailinglist.honeypot@gmail.com> wrote: > Hi Sabrina, > > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> wrote: > > Hi,all: > > I have a long list of coordinates, which was created by the following > code: > > testIranges<- IRanges(start=currCoorList $starts, > > end=currCoorLis$ends) > > > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, > > strand=currCoorList$strand,genome="mm9") > > > > I could then export the testTrack into .bed file and upload in the UCSC > > genome browser and get the sequences of the listed coordinate > > But I was thinking if there is a function from rtracklayer to retrieve > > sequence directly. > > If you're trying to get coordinates from some reference genome, you > can use the BSgenome.*.* packages and skip rtracklayer + internet > queries entirely. > > For instance, say I have an IRanges object `r1` which is a set of > intervals on chromosome 1 from hg18 that I want sequence information > for, I could do it like this: > > R> r1 > IRanges of length 589 > start end width > [1] 18016124 18016155 32 > [2] 18020749 18020780 32 > [3] 18024599 18024630 32 > [4] 18024830 18024861 32 > [5] 18051985 18052016 32 > [6] 18064760 18064791 32 > [7] 18088145 18088176 32 > [8] 18088200 18088231 32 > [9] 18110085 18110116 32 > ... > > R> library(BSgenome.Hsapiens.UCSC.hg18) > R> chr1 <- unmasked(HSapiens$chr1) > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) > R> myseqs > Views on a 247249719-letter DNAString subject > subject: > TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA A...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNN > views: > start end width > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] > ... > > calling "as.character" on the myseqs object will give you the sequence > as a character vector. > > Does that get you what you're after? > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.mskc="" c.org="" %7elianos="" contact=""> > -- Sabrina [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago sabrina.shao ▴ 220

0

Entering edit mode

On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com> wrote: > Hi, Steve: > I am trying to get it worked without separating the chromosomes. I used the > following code: > seqQuery<-data.frame( > chr=currCoorList$chr, > start=currCoorList$starts, > end=currCoorList$ends, > strand=currCoorList$strand) > mySeq<- getSeq(Mmusculus, seqQuery$chr, > start=seqQuery$start, end=seqQuery$end, > strand=seqQuery$strand ) > > > But then I got an error: > Error in reverseComplement(DNAStringSet(seqs[ii])) : > could not find function "copy" > Did I miss some depending package? > > Don't know what's up with that, but you don't need to use a data.frame there. You already had a RangedData above, just pass that to getSeq, i.e. getSeq(Mmusculus, testTrack) Sabrina > > On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < > mailinglist.honeypot@gmail.com> wrote: > > > Hi Sabrina, > > > > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> > wrote: > > > Hi,all: > > > I have a long list of coordinates, which was created by the following > > code: > > > testIranges<- IRanges(start=currCoorList $starts, > > > end=currCoorLis$ends) > > > > > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, > > > strand=currCoorList$strand,genome="mm9") > > > > > > I could then export the testTrack into .bed file and upload in the UCSC > > > genome browser and get the sequences of the listed coordinate > > > But I was thinking if there is a function from rtracklayer to retrieve > > > sequence directly. > > > > If you're trying to get coordinates from some reference genome, you > > can use the BSgenome.*.* packages and skip rtracklayer + internet > > queries entirely. > > > > For instance, say I have an IRanges object `r1` which is a set of > > intervals on chromosome 1 from hg18 that I want sequence information > > for, I could do it like this: > > > > R> r1 > > IRanges of length 589 > > start end width > > [1] 18016124 18016155 32 > > [2] 18020749 18020780 32 > > [3] 18024599 18024630 32 > > [4] 18024830 18024861 32 > > [5] 18051985 18052016 32 > > [6] 18064760 18064791 32 > > [7] 18088145 18088176 32 > > [8] 18088200 18088231 32 > > [9] 18110085 18110116 32 > > ... > > > > R> library(BSgenome.Hsapiens.UCSC.hg18) > > R> chr1 <- unmasked(HSapiens$chr1) > > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) > > R> myseqs > > Views on a 247249719-letter DNAString subject > > subject: > > > TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA A...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNN > > views: > > start end width > > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] > > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] > > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] > > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] > > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] > > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] > > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] > > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] > > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] > > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] > > ... > > > > calling "as.character" on the myseqs object will give you the sequence > > as a character vector. > > > > Does that get you what you're after? > > -steve > > > > -- > > Steve Lianoglou > > Graduate Student: Computational Systems Biology > > | Memorial Sloan-Kettering Cancer Center > > | Weill Medical College of Cornell University > > Contact Info: http://cbio.mskcc.org/~lianos/contact< > http://cbio.mskcc.org/%7Elianos/contact> > > > > > > -- > Sabrina > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi, Michael: I tried as you suggested, but I had the following error: Error in IRanges:::.normargSEW(start, "start") : 'start' must be a vector of integers my testTrack looks like this: RangedData with 76 rows and... space ranges | strand <character> <iranges> | <factor> 1 chr1 [137159091, 137159290] | - 2 chr1 [155302742, 155302941] | - etc Any suggestions? Thanks! Sabrina On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence <lawrence.michael@gene.com>wrote: > > > On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com> wrote: > >> Hi, Steve: >> I am trying to get it worked without separating the chromosomes. I used >> the >> following code: >> seqQuery<-data.frame( >> chr=currCoorList$chr, >> start=currCoorList$starts, >> end=currCoorList$ends, >> strand=currCoorList$strand) >> mySeq<- getSeq(Mmusculus, seqQuery$chr, >> start=seqQuery$start, end=seqQuery$end, >> strand=seqQuery$strand ) >> >> >> But then I got an error: >> Error in reverseComplement(DNAStringSet(seqs[ii])) : >> could not find function "copy" >> > > Did I miss some depending package? >> >> > Don't know what's up with that, but you don't need to use a data.frame > there. You already had a RangedData above, just pass that to getSeq, i.e. > > getSeq(Mmusculus, testTrack) > > Sabrina >> >> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >> mailinglist.honeypot@gmail.com> wrote: >> >> > Hi Sabrina, >> > >> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> >> wrote: >> > > Hi,all: >> > > I have a long list of coordinates, which was created by the following >> > code: >> > > testIranges<- IRanges(start=currCoorList $starts, >> > > end=currCoorLis$ends) >> > > >> > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >> > > strand=currCoorList$strand,genome="mm9") >> > > >> > > I could then export the testTrack into .bed file and upload in the >> UCSC >> > > genome browser and get the sequences of the listed coordinate >> > > But I was thinking if there is a function from rtracklayer to retrieve >> > > sequence directly. >> > >> > If you're trying to get coordinates from some reference genome, you >> > can use the BSgenome.*.* packages and skip rtracklayer + internet >> > queries entirely. >> > >> > For instance, say I have an IRanges object `r1` which is a set of >> > intervals on chromosome 1 from hg18 that I want sequence information >> > for, I could do it like this: >> > >> > R> r1 >> > IRanges of length 589 >> > start end width >> > [1] 18016124 18016155 32 >> > [2] 18020749 18020780 32 >> > [3] 18024599 18024630 32 >> > [4] 18024830 18024861 32 >> > [5] 18051985 18052016 32 >> > [6] 18064760 18064791 32 >> > [7] 18088145 18088176 32 >> > [8] 18088200 18088231 32 >> > [9] 18110085 18110116 32 >> > ... >> > >> > R> library(BSgenome.Hsapiens.UCSC.hg18) >> > R> chr1 <- unmasked(HSapiens$chr1) >> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >> > R> myseqs >> > Views on a 247249719-letter DNAString subject >> > subject: >> > >> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT AA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNN >> > views: >> > start end width >> > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >> > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >> > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >> > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >> > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >> > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >> > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >> > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >> > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >> > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >> > ... >> > >> > calling "as.character" on the myseqs object will give you the sequence >> > as a character vector. >> > >> > Does that get you what you're after? >> > -steve >> > >> > -- >> > Steve Lianoglou >> > Graduate Student: Computational Systems Biology >> > | Memorial Sloan-Kettering Cancer Center >> > | Weill Medical College of Cornell University >> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.m="" skcc.org="" %7elianos="" contact=""> >> <http: cbio.mskcc.org="" %7elianos="" contact=""> >> > >> >> >> >> -- >> Sabrina >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@stat.math.ethz.ch >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > -- Sabrina [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago sabrina.shao ▴ 220

0

Entering edit mode

Could you paste your sessionInfo() output please? On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao@gmail.com> wrote: > Hi, Michael: > > I tried as you suggested, but I had the following error: > > Error in IRanges:::.normargSEW(start, "start") : > 'start' must be a vector of integers > > my testTrack looks like this: > > RangedData with 76 rows and... > space ranges | strand > <character> <iranges> | <factor> > 1 chr1 [137159091, 137159290] | - > 2 chr1 [155302742, 155302941] | - > etc > > Any suggestions? Thanks! > > Sabrina > > > On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > >> >> >> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com> wrote: >> >>> Hi, Steve: >>> I am trying to get it worked without separating the chromosomes. I used >>> the >>> following code: >>> seqQuery<-data.frame( >>> chr=currCoorList$chr, >>> start=currCoorList$starts, >>> end=currCoorList$ends, >>> strand=currCoorList$strand) >>> mySeq<- getSeq(Mmusculus, seqQuery$chr, >>> start=seqQuery$start, end=seqQuery$end, >>> strand=seqQuery$strand ) >>> >>> >>> But then I got an error: >>> Error in reverseComplement(DNAStringSet(seqs[ii])) : >>> could not find function "copy" >>> >> >> Did I miss some depending package? >>> >>> >> Don't know what's up with that, but you don't need to use a data.frame >> there. You already had a RangedData above, just pass that to getSeq, i.e. >> >> getSeq(Mmusculus, testTrack) >> >> Sabrina >>> >>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >>> mailinglist.honeypot@gmail.com> wrote: >>> >>> > Hi Sabrina, >>> > >>> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> >>> wrote: >>> > > Hi,all: >>> > > I have a long list of coordinates, which was created by the following >>> > code: >>> > > testIranges<- IRanges(start=currCoorList $starts, >>> > > end=currCoorLis$ends) >>> > > >>> > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >>> > > strand=currCoorList$strand,genome="mm9") >>> > > >>> > > I could then export the testTrack into .bed file and upload in the >>> UCSC >>> > > genome browser and get the sequences of the listed coordinate >>> > > But I was thinking if there is a function from rtracklayer to >>> retrieve >>> > > sequence directly. >>> > >>> > If you're trying to get coordinates from some reference genome, you >>> > can use the BSgenome.*.* packages and skip rtracklayer + internet >>> > queries entirely. >>> > >>> > For instance, say I have an IRanges object `r1` which is a set of >>> > intervals on chromosome 1 from hg18 that I want sequence information >>> > for, I could do it like this: >>> > >>> > R> r1 >>> > IRanges of length 589 >>> > start end width >>> > [1] 18016124 18016155 32 >>> > [2] 18020749 18020780 32 >>> > [3] 18024599 18024630 32 >>> > [4] 18024830 18024861 32 >>> > [5] 18051985 18052016 32 >>> > [6] 18064760 18064791 32 >>> > [7] 18088145 18088176 32 >>> > [8] 18088200 18088231 32 >>> > [9] 18110085 18110116 32 >>> > ... >>> > >>> > R> library(BSgenome.Hsapiens.UCSC.hg18) >>> > R> chr1 <- unmasked(HSapiens$chr1) >>> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >>> > R> myseqs >>> > Views on a 247249719-letter DNAString subject >>> > subject: >>> > >>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC TAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNN >>> > views: >>> > start end width >>> > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >>> > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >>> > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >>> > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >>> > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >>> > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >>> > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >>> > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >>> > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >>> > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >>> > ... >>> > >>> > calling "as.character" on the myseqs object will give you the sequence >>> > as a character vector. >>> > >>> > Does that get you what you're after? >>> > -steve >>> > >>> > -- >>> > Steve Lianoglou >>> > Graduate Student: Computational Systems Biology >>> > | Memorial Sloan-Kettering Cancer Center >>> > | Weill Medical College of Cornell University >>> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio.="" mskcc.org="" %7elianos="" contact=""> >>> <http: cbio.mskcc.org="" %7elianos="" contact=""> >>> > >>> >>> >>> >>> -- >>> Sabrina >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor@stat.math.ethz.ch >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> > > > -- > Sabrina > [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi, Michael: See bellow: Error in IRanges:::.normargSEW(start, "start") : 'start' must be a vector of integers > sessionInfo() R version 2.10.0 (2009-10-26) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2 [3] Biostrings_2.14.5 rtracklayer_1.6.0 [5] RCurl_1.3-1 bitops_1.0-4.1 [7] IRanges_1.4.10 loaded via a namespace (and not attached): [1] Biobase_2.6.0 XML_2.6-0 Thanks! Sabrina On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence <lawrence.michael@gene.com>wrote: > Could you paste your sessionInfo() output please? > > > On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao@gmail.com> wrote: > >> Hi, Michael: >> >> I tried as you suggested, but I had the following error: >> >> Error in IRanges:::.normargSEW(start, "start") : >> 'start' must be a vector of integers >> >> my testTrack looks like this: >> >> RangedData with 76 rows and... >> space ranges | strand >> <character> <iranges> | <factor> >> 1 chr1 [137159091, 137159290] | - >> 2 chr1 [155302742, 155302941] | - >> etc >> >> Any suggestions? Thanks! >> >> Sabrina >> >> >> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < >> lawrence.michael@gene.com> wrote: >> >>> >>> >>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com>wrote: >>> >>>> Hi, Steve: >>>> I am trying to get it worked without separating the chromosomes. I used >>>> the >>>> following code: >>>> seqQuery<-data.frame( >>>> chr=currCoorList$chr, >>>> start=currCoorList$starts, >>>> end=currCoorList$ends, >>>> strand=currCoorList$strand) >>>> mySeq<- getSeq(Mmusculus, seqQuery$chr, >>>> start=seqQuery$start, end=seqQuery$end, >>>> strand=seqQuery$strand ) >>>> >>>> >>>> But then I got an error: >>>> Error in reverseComplement(DNAStringSet(seqs[ii])) : >>>> could not find function "copy" >>>> >>> >>> Did I miss some depending package? >>>> >>>> >>> Don't know what's up with that, but you don't need to use a data.frame >>> there. You already had a RangedData above, just pass that to getSeq, i.e. >>> >>> getSeq(Mmusculus, testTrack) >>> >>> Sabrina >>>> >>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >>>> mailinglist.honeypot@gmail.com> wrote: >>>> >>>> > Hi Sabrina, >>>> > >>>> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> >>>> wrote: >>>> > > Hi,all: >>>> > > I have a long list of coordinates, which was created by the >>>> following >>>> > code: >>>> > > testIranges<- IRanges(start=currCoorList $starts, >>>> > > end=currCoorLis$ends) >>>> > > >>>> > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >>>> > > strand=currCoorList$strand,genome="mm9") >>>> > > >>>> > > I could then export the testTrack into .bed file and upload in the >>>> UCSC >>>> > > genome browser and get the sequences of the listed coordinate >>>> > > But I was thinking if there is a function from rtracklayer to >>>> retrieve >>>> > > sequence directly. >>>> > >>>> > If you're trying to get coordinates from some reference genome, you >>>> > can use the BSgenome.*.* packages and skip rtracklayer + internet >>>> > queries entirely. >>>> > >>>> > For instance, say I have an IRanges object `r1` which is a set of >>>> > intervals on chromosome 1 from hg18 that I want sequence information >>>> > for, I could do it like this: >>>> > >>>> > R> r1 >>>> > IRanges of length 589 >>>> > start end width >>>> > [1] 18016124 18016155 32 >>>> > [2] 18020749 18020780 32 >>>> > [3] 18024599 18024630 32 >>>> > [4] 18024830 18024861 32 >>>> > [5] 18051985 18052016 32 >>>> > [6] 18064760 18064791 32 >>>> > [7] 18088145 18088176 32 >>>> > [8] 18088200 18088231 32 >>>> > [9] 18110085 18110116 32 >>>> > ... >>>> > >>>> > R> library(BSgenome.Hsapiens.UCSC.hg18) >>>> > R> chr1 <- unmasked(HSapiens$chr1) >>>> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >>>> > R> myseqs >>>> > Views on a 247249719-letter DNAString subject >>>> > subject: >>>> > >>>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC CTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNN >>>> > views: >>>> > start end width >>>> > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >>>> > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >>>> > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >>>> > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >>>> > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >>>> > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >>>> > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >>>> > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >>>> > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >>>> > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >>>> > ... >>>> > >>>> > calling "as.character" on the myseqs object will give you the sequence >>>> > as a character vector. >>>> > >>>> > Does that get you what you're after? >>>> > -steve >>>> > >>>> > -- >>>> > Steve Lianoglou >>>> > Graduate Student: Computational Systems Biology >>>> > | Memorial Sloan-Kettering Cancer Center >>>> > | Weill Medical College of Cornell University >>>> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio="" .mskcc.org="" %7elianos="" contact=""> >>>> <http: cbio.mskcc.org="" %7elianos="" contact=""> >>>> > >>>> >>>> >>>> >>>> -- >>>> Sabrina >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor@stat.math.ethz.ch >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> >> >> >> -- >> Sabrina >> > > -- Sabrina [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago sabrina.shao ▴ 220

0

Entering edit mode

One thing you want to be careful with when you do stuff like this, is that the annotation matches up _exactly_ with the genome you are using. This really bit me in S. cerevisiae. Essentially even a small indel will make everything out of phase. And you really need to watch out for different versions of the genome. Sometimes extra stuff gets resolved at the end of the chromosomes. In this case it seems you are using UCSC annotation and BSgenome so it should work. But I urge you to double check. This is something that can be very hard to track down. Kasper On Tue, Mar 2, 2010 at 9:21 AM, sabrina s <sabrina.shao at="" gmail.com=""> wrote: > Hi, Michael: > > See bellow: > > > Error in IRanges:::.normargSEW(start, "start") : > ?'start' must be a vector of integers >> sessionInfo() > R version 2.10.0 (2009-10-26) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > > other attached packages: > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2 > [3] Biostrings_2.14.5 ? ? ? ? ? ? ? ? ?rtracklayer_1.6.0 > [5] RCurl_1.3-1 ? ? ? ? ? ? ? ? ? ? ? ?bitops_1.0-4.1 > [7] IRanges_1.4.10 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.0 XML_2.6-0 > > > Thanks! > > Sabrina > > > On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence > <lawrence.michael at="" gene.com="">wrote: > >> Could you paste your sessionInfo() output please? >> >> >> On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao at="" gmail.com=""> wrote: >> >>> Hi, Michael: >>> >>> I tried as you suggested, but I had the following error: >>> >>> Error in IRanges:::.normargSEW(start, "start") : >>> ? 'start' must be a vector of integers >>> >>> my testTrack looks like this: >>> >>> RangedData with 76 rows and... >>> ? ? ? ? ?space ? ? ? ? ? ? ? ? ranges | ? ? ? ?strand >>> ? ?<character> ? ? ? ? ? ? ?<iranges> | ?<factor> >>> 1 ? ? ? ? chr1 [137159091, 137159290] | ? ? ? ?- >>> 2 ? ? ? ? chr1 [155302742, 155302941] | ? ? ? - >>> etc >>> >>> Any suggestions? Thanks! >>> >>> Sabrina >>> >>> >>> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < >>> lawrence.michael at gene.com> wrote: >>> >>>> >>>> >>>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao at="" gmail.com="">wrote: >>>> >>>>> Hi, Steve: >>>>> I am trying to get it worked without separating the chromosomes. I used >>>>> the >>>>> following code: >>>>> seqQuery<-data.frame( >>>>> ? ? ? ?chr=currCoorList$chr, >>>>> ? ?start=currCoorList$starts, >>>>> ? ?end=currCoorList$ends, >>>>> ? ?strand=currCoorList$strand) >>>>> ? ?mySeq<- getSeq(Mmusculus, seqQuery$chr, >>>>> ? ? ? ? start=seqQuery$start, end=seqQuery$end, >>>>> ? ? ? ? ? ?strand=seqQuery$strand ) >>>>> >>>>> >>>>> But then I got an error: >>>>> Error in reverseComplement(DNAStringSet(seqs[ii])) : >>>>> ?could not find function "copy" >>>>> >>>> >>>> Did I miss some depending package? >>>>> >>>>> >>>> Don't know what's up with that, but you don't need to use a data.frame >>>> there. You already had a RangedData above, just pass that to getSeq, i.e. >>>> >>>> getSeq(Mmusculus, testTrack) >>>> >>>> ?Sabrina >>>>> >>>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >>>>> mailinglist.honeypot at gmail.com> wrote: >>>>> >>>>> > Hi Sabrina, >>>>> > >>>>> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at="" gmail.com=""> >>>>> wrote: >>>>> > > Hi,all: >>>>> > > I have a long list of coordinates, which was created by the >>>>> following >>>>> > code: >>>>> > > testIranges<- IRanges(start=currCoorList $starts, >>>>> > > ? ? ? ?end=currCoorLis$ends) >>>>> > > >>>>> > > ? testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >>>>> > > ? ? ? ?strand=currCoorList$strand,genome="mm9") >>>>> > > >>>>> > > I could then export the testTrack into .bed file and upload in the >>>>> UCSC >>>>> > > genome browser and get the sequences of the listed coordinate >>>>> > > But I was thinking if there is a function from rtracklayer to >>>>> retrieve >>>>> > > sequence directly. >>>>> > >>>>> > If you're trying to get coordinates from some reference genome, you >>>>> > can use the BSgenome.*.* packages and skip rtracklayer + internet >>>>> > queries entirely. >>>>> > >>>>> > For instance, say I have an IRanges object `r1` which is a set of >>>>> > intervals on chromosome 1 from hg18 that I want sequence information >>>>> > for, I could do it like this: >>>>> > >>>>> > R> r1 >>>>> > IRanges of length 589 >>>>> > ? ? ? ?start ? ? ?end width >>>>> > [1] ? 18016124 18016155 ? ?32 >>>>> > [2] ? 18020749 18020780 ? ?32 >>>>> > [3] ? 18024599 18024630 ? ?32 >>>>> > [4] ? 18024830 18024861 ? ?32 >>>>> > [5] ? 18051985 18052016 ? ?32 >>>>> > [6] ? 18064760 18064791 ? ?32 >>>>> > [7] ? 18088145 18088176 ? ?32 >>>>> > [8] ? 18088200 18088231 ? ?32 >>>>> > [9] ? 18110085 18110116 ? ?32 >>>>> > ... >>>>> > >>>>> > R> library(BSgenome.Hsapiens.UCSC.hg18) >>>>> > R> chr1 <- unmasked(HSapiens$chr1) >>>>> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >>>>> > R> myseqs >>>>> > ?Views on a 247249719-letter DNAString subject >>>>> > subject: >>>>> > >>>>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC CCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNN >>>>> > views: >>>>> > ? ? ? ?start ? ? ?end width >>>>> > [1] 18016124 18016155 ? ?32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >>>>> > [2] 18020749 18020780 ? ?32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >>>>> > [3] 18024599 18024630 ? ?32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >>>>> > [4] 18024830 18024861 ? ?32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >>>>> > [5] 18051985 18052016 ? ?32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >>>>> > [6] 18064760 18064791 ? ?32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >>>>> > [7] 18088145 18088176 ? ?32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >>>>> > [8] 18088200 18088231 ? ?32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >>>>> > [9] 18110085 18110116 ? ?32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >>>>> > [10] 18118454 18118485 ? ?32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >>>>> > ... >>>>> > >>>>> > calling "as.character" on the myseqs object will give you the sequence >>>>> > as a character vector. >>>>> > >>>>> > Does that get you what you're after? >>>>> > -steve >>>>> > >>>>> > -- >>>>> > Steve Lianoglou >>>>> > Graduate Student: Computational Systems Biology >>>>> > ?| Memorial Sloan-Kettering Cancer Center >>>>> > ?| Weill Medical College of Cornell University >>>>> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbi="" o.mskcc.org="" %7elianos="" contact=""> >>>>> <http: cbio.mskcc.org="" %7elianos="" contact=""> >>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> Sabrina >>>>> >>>>> ? ? ? ?[[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>> >>> >>> -- >>> Sabrina >>> >> >> > > > -- > Sabrina > > ? ? ? ?[[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD REPLY • link 14.8 years ago Kasper Daniel Hansen ★ 6.5k

0

Entering edit mode

Thanks! Kasper, it is good to know that. I am using UCSC annotation :) Sabrina On Tue, Mar 2, 2010 at 9:27 AM, Kasper Daniel Hansen < kasperdanielhansen@gmail.com> wrote: > One thing you want to be careful with when you do stuff like this, is > that the annotation matches up _exactly_ with the genome you are > using. This really bit me in S. cerevisiae. Essentially even a small > indel will make everything out of phase. And you really need to watch > out for different versions of the genome. Sometimes extra stuff gets > resolved at the end of the chromosomes. > > In this case it seems you are using UCSC annotation and BSgenome so it > should work. But I urge you to double check. This is something that > can be very hard to track down. > > Kasper > > On Tue, Mar 2, 2010 at 9:21 AM, sabrina s <sabrina.shao@gmail.com> wrote: > > Hi, Michael: > > > > See bellow: > > > > > > Error in IRanges:::.normargSEW(start, "start") : > > 'start' must be a vector of integers > >> sessionInfo() > > R version 2.10.0 (2009-10-26) > > i386-pc-mingw32 > > > > locale: > > [1] LC_COLLATE=English_United States.1252 > > [2] LC_CTYPE=English_United States.1252 > > [3] LC_MONETARY=English_United States.1252 > > [4] LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > > > attached base packages: > > [1] stats graphics grDevices utils datasets methods base > > > > other attached packages: > > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2 > > [3] Biostrings_2.14.5 rtracklayer_1.6.0 > > [5] RCurl_1.3-1 bitops_1.0-4.1 > > [7] IRanges_1.4.10 > > > > loaded via a namespace (and not attached): > > [1] Biobase_2.6.0 XML_2.6-0 > > > > > > Thanks! > > > > Sabrina > > > > > > On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence > > <lawrence.michael@gene.com>wrote: > > > >> Could you paste your sessionInfo() output please? > >> > >> > >> On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao@gmail.com> > wrote: > >> > >>> Hi, Michael: > >>> > >>> I tried as you suggested, but I had the following error: > >>> > >>> Error in IRanges:::.normargSEW(start, "start") : > >>> 'start' must be a vector of integers > >>> > >>> my testTrack looks like this: > >>> > >>> RangedData with 76 rows and... > >>> space ranges | strand > >>> <character> <iranges> | <factor> > >>> 1 chr1 [137159091, 137159290] | - > >>> 2 chr1 [155302742, 155302941] | - > >>> etc > >>> > >>> Any suggestions? Thanks! > >>> > >>> Sabrina > >>> > >>> > >>> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < > >>> lawrence.michael@gene.com> wrote: > >>> > >>>> > >>>> > >>>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com> >wrote: > >>>> > >>>>> Hi, Steve: > >>>>> I am trying to get it worked without separating the chromosomes. I > used > >>>>> the > >>>>> following code: > >>>>> seqQuery<-data.frame( > >>>>> chr=currCoorList$chr, > >>>>> start=currCoorList$starts, > >>>>> end=currCoorList$ends, > >>>>> strand=currCoorList$strand) > >>>>> mySeq<- getSeq(Mmusculus, seqQuery$chr, > >>>>> start=seqQuery$start, end=seqQuery$end, > >>>>> strand=seqQuery$strand ) > >>>>> > >>>>> > >>>>> But then I got an error: > >>>>> Error in reverseComplement(DNAStringSet(seqs[ii])) : > >>>>> could not find function "copy" > >>>>> > >>>> > >>>> Did I miss some depending package? > >>>>> > >>>>> > >>>> Don't know what's up with that, but you don't need to use a data.frame > >>>> there. You already had a RangedData above, just pass that to getSeq, > i.e. > >>>> > >>>> getSeq(Mmusculus, testTrack) > >>>> > >>>> Sabrina > >>>>> > >>>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < > >>>>> mailinglist.honeypot@gmail.com> wrote: > >>>>> > >>>>> > Hi Sabrina, > >>>>> > > >>>>> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> > >>>>> wrote: > >>>>> > > Hi,all: > >>>>> > > I have a long list of coordinates, which was created by the > >>>>> following > >>>>> > code: > >>>>> > > testIranges<- IRanges(start=currCoorList $starts, > >>>>> > > end=currCoorLis$ends) > >>>>> > > > >>>>> > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, > >>>>> > > strand=currCoorList$strand,genome="mm9") > >>>>> > > > >>>>> > > I could then export the testTrack into .bed file and upload in > the > >>>>> UCSC > >>>>> > > genome browser and get the sequences of the listed coordinate > >>>>> > > But I was thinking if there is a function from rtracklayer to > >>>>> retrieve > >>>>> > > sequence directly. > >>>>> > > >>>>> > If you're trying to get coordinates from some reference genome, you > >>>>> > can use the BSgenome.*.* packages and skip rtracklayer + internet > >>>>> > queries entirely. > >>>>> > > >>>>> > For instance, say I have an IRanges object `r1` which is a set of > >>>>> > intervals on chromosome 1 from hg18 that I want sequence > information > >>>>> > for, I could do it like this: > >>>>> > > >>>>> > R> r1 > >>>>> > IRanges of length 589 > >>>>> > start end width > >>>>> > [1] 18016124 18016155 32 > >>>>> > [2] 18020749 18020780 32 > >>>>> > [3] 18024599 18024630 32 > >>>>> > [4] 18024830 18024861 32 > >>>>> > [5] 18051985 18052016 32 > >>>>> > [6] 18064760 18064791 32 > >>>>> > [7] 18088145 18088176 32 > >>>>> > [8] 18088200 18088231 32 > >>>>> > [9] 18110085 18110116 32 > >>>>> > ... > >>>>> > > >>>>> > R> library(BSgenome.Hsapiens.UCSC.hg18) > >>>>> > R> chr1 <- unmasked(HSapiens$chr1) > >>>>> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) > >>>>> > R> myseqs > >>>>> > Views on a 247249719-letter DNAString subject > >>>>> > subject: > >>>>> > > >>>>> > TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA A...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNN > >>>>> > views: > >>>>> > start end width > >>>>> > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] > >>>>> > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] > >>>>> > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] > >>>>> > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] > >>>>> > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] > >>>>> > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] > >>>>> > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] > >>>>> > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] > >>>>> > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] > >>>>> > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] > >>>>> > ... > >>>>> > > >>>>> > calling "as.character" on the myseqs object will give you the > sequence > >>>>> > as a character vector. > >>>>> > > >>>>> > Does that get you what you're after? > >>>>> > -steve > >>>>> > > >>>>> > -- > >>>>> > Steve Lianoglou > >>>>> > Graduate Student: Computational Systems Biology > >>>>> > | Memorial Sloan-Kettering Cancer Center > >>>>> > | Weill Medical College of Cornell University > >>>>> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: c="" bio.mskcc.org="" %7elianos="" contact=""> > <http: cbio.mskcc.org="" %7elianos="" contact=""> > >>>>> <http: cbio.mskcc.org="" %7elianos="" contact=""> > >>>>> > > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> Sabrina > >>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> _______________________________________________ > >>>>> Bioconductor mailing list > >>>>> Bioconductor@stat.math.ethz.ch > >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor > >>>>> Search the archives: > >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor > >>>>> > >>>> > >>>> > >>> > >>> > >>> -- > >>> Sabrina > >>> > >> > >> > > > > > > -- > > Sabrina > > > > [[alternative HTML version deleted]] > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@stat.math.ethz.ch > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- Sabrina [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago sabrina.shao ▴ 220

0

Entering edit mode

On Tue, Mar 2, 2010 at 6:21 AM, sabrina s <sabrina.shao@gmail.com> wrote: > Hi, Michael: > > See bellow: > > > > Error in IRanges:::.normargSEW(start, "start") : > 'start' must be a vector of integers > > sessionInfo() > R version 2.10.0 (2009-10-26) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2 > [3] Biostrings_2.14.5 rtracklayer_1.6.0 > [5] RCurl_1.3-1 bitops_1.0-4.1 > [7] IRanges_1.4.10 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.0 XML_2.6-0 > > > Thanks! > > Ok. It seems likely that the syntax I mentioned is only possible with R 2.11 and the corresponding Bioc package versions. Thus, I would recommend sticking with your original syntax. Not sure why that is broken. > Sabrina > > > > On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence < > lawrence.michael@gene.com> wrote: > >> Could you paste your sessionInfo() output please? >> >> >> On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao@gmail.com> wrote: >> >>> Hi, Michael: >>> >>> I tried as you suggested, but I had the following error: >>> >>> Error in IRanges:::.normargSEW(start, "start") : >>> 'start' must be a vector of integers >>> >>> my testTrack looks like this: >>> >>> RangedData with 76 rows and... >>> space ranges | strand >>> <character> <iranges> | <factor> >>> 1 chr1 [137159091, 137159290] | - >>> 2 chr1 [155302742, 155302941] | - >>> etc >>> >>> Any suggestions? Thanks! >>> >>> Sabrina >>> >>> >>> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < >>> lawrence.michael@gene.com> wrote: >>> >>>> >>>> >>>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao@gmail.com>wrote: >>>> >>>>> Hi, Steve: >>>>> I am trying to get it worked without separating the chromosomes. I used >>>>> the >>>>> following code: >>>>> seqQuery<-data.frame( >>>>> chr=currCoorList$chr, >>>>> start=currCoorList$starts, >>>>> end=currCoorList$ends, >>>>> strand=currCoorList$strand) >>>>> mySeq<- getSeq(Mmusculus, seqQuery$chr, >>>>> start=seqQuery$start, end=seqQuery$end, >>>>> strand=seqQuery$strand ) >>>>> >>>>> >>>>> But then I got an error: >>>>> Error in reverseComplement(DNAStringSet(seqs[ii])) : >>>>> could not find function "copy" >>>>> >>>> >>>> Did I miss some depending package? >>>>> >>>>> >>>> Don't know what's up with that, but you don't need to use a data.frame >>>> there. You already had a RangedData above, just pass that to getSeq, i.e. >>>> >>>> getSeq(Mmusculus, testTrack) >>>> >>>> Sabrina >>>>> >>>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >>>>> mailinglist.honeypot@gmail.com> wrote: >>>>> >>>>> > Hi Sabrina, >>>>> > >>>>> > On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao@gmail.com> >>>>> wrote: >>>>> > > Hi,all: >>>>> > > I have a long list of coordinates, which was created by the >>>>> following >>>>> > code: >>>>> > > testIranges<- IRanges(start=currCoorList $starts, >>>>> > > end=currCoorLis$ends) >>>>> > > >>>>> > > testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >>>>> > > strand=currCoorList$strand,genome="mm9") >>>>> > > >>>>> > > I could then export the testTrack into .bed file and upload in the >>>>> UCSC >>>>> > > genome browser and get the sequences of the listed coordinate >>>>> > > But I was thinking if there is a function from rtracklayer to >>>>> retrieve >>>>> > > sequence directly. >>>>> > >>>>> > If you're trying to get coordinates from some reference genome, you >>>>> > can use the BSgenome.*.* packages and skip rtracklayer + internet >>>>> > queries entirely. >>>>> > >>>>> > For instance, say I have an IRanges object `r1` which is a set of >>>>> > intervals on chromosome 1 from hg18 that I want sequence information >>>>> > for, I could do it like this: >>>>> > >>>>> > R> r1 >>>>> > IRanges of length 589 >>>>> > start end width >>>>> > [1] 18016124 18016155 32 >>>>> > [2] 18020749 18020780 32 >>>>> > [3] 18024599 18024630 32 >>>>> > [4] 18024830 18024861 32 >>>>> > [5] 18051985 18052016 32 >>>>> > [6] 18064760 18064791 32 >>>>> > [7] 18088145 18088176 32 >>>>> > [8] 18088200 18088231 32 >>>>> > [9] 18110085 18110116 32 >>>>> > ... >>>>> > >>>>> > R> library(BSgenome.Hsapiens.UCSC.hg18) >>>>> > R> chr1 <- unmasked(HSapiens$chr1) >>>>> > R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >>>>> > R> myseqs >>>>> > Views on a 247249719-letter DNAString subject >>>>> > subject: >>>>> > >>>>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC CCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNN >>>>> > views: >>>>> > start end width >>>>> > [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >>>>> > [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >>>>> > [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >>>>> > [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >>>>> > [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >>>>> > [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >>>>> > [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >>>>> > [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >>>>> > [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >>>>> > [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >>>>> > ... >>>>> > >>>>> > calling "as.character" on the myseqs object will give you the >>>>> sequence >>>>> > as a character vector. >>>>> > >>>>> > Does that get you what you're after? >>>>> > -steve >>>>> > >>>>> > -- >>>>> > Steve Lianoglou >>>>> > Graduate Student: Computational Systems Biology >>>>> > | Memorial Sloan-Kettering Cancer Center >>>>> > | Weill Medical College of Cornell University >>>>> > Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbi="" o.mskcc.org="" %7elianos="" contact=""> >>>>> <http: cbio.mskcc.org="" %7elianos="" contact=""> >>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> Sabrina >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor@stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>>> >>> >>> >>> -- >>> Sabrina >>> >> >> > > > -- > Sabrina > [[alternative HTML version deleted]]

ADD REPLY • link 14.8 years ago Michael Lawrence ★ 11k

0

Entering edit mode

Hi Sabrina, Michael, It doesn't work for me either with a RangedData: > testTrack <- RangedData( space=c("chr1", "chrY"), ranges=IRanges(start=1:1000, width=50), strand=c("+", "+", "-", "-")) > library(BSgenome.Mmusculus.UCSC.mm9) > getSeq(Mmusculus, testTrack) Error in IRanges:::.normargSEW(start, "start") : 'start' must be a vector of integers But you can work around this by using: seq <- getSeq(Mmusculus, space(testTrack), start=start(testTrack), end=end(testTrack), strand=strand(testTrack)) See ?getSeq for more information. BTW it seems that your installation is not up-to-date (for example you have Biostrings 2.14.5, but the current version is 2.14.12). I strongly recommend that you update your installation first with: > source("http://bioconductor.org/biocLite.R") > update.packages(repos=biocinstallRepos(), ask=FALSE) Cheers, H. sabrina s wrote: > Hi, Michael: > > See bellow: > > > Error in IRanges:::.normargSEW(start, "start") : > 'start' must be a vector of integers >> sessionInfo() > R version 2.10.0 (2009-10-26) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] BSgenome.Mmusculus.UCSC.mm9_1.3.16 BSgenome_1.14.2 > [3] Biostrings_2.14.5 rtracklayer_1.6.0 > [5] RCurl_1.3-1 bitops_1.0-4.1 > [7] IRanges_1.4.10 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.0 XML_2.6-0 > > > Thanks! > > Sabrina > > > On Tue, Mar 2, 2010 at 8:49 AM, Michael Lawrence > <lawrence.michael at="" gene.com="">wrote: > >> Could you paste your sessionInfo() output please? >> >> >> On Mon, Mar 1, 2010 at 7:37 PM, sabrina s <sabrina.shao at="" gmail.com=""> wrote: >> >>> Hi, Michael: >>> >>> I tried as you suggested, but I had the following error: >>> >>> Error in IRanges:::.normargSEW(start, "start") : >>> 'start' must be a vector of integers >>> >>> my testTrack looks like this: >>> >>> RangedData with 76 rows and... >>> space ranges | strand >>> <character> <iranges> | <factor> >>> 1 chr1 [137159091, 137159290] | - >>> 2 chr1 [155302742, 155302941] | - >>> etc >>> >>> Any suggestions? Thanks! >>> >>> Sabrina >>> >>> >>> On Mon, Mar 1, 2010 at 6:02 PM, Michael Lawrence < >>> lawrence.michael at gene.com> wrote: >>> >>>> >>>> On Mon, Mar 1, 2010 at 1:02 PM, sabrina s <sabrina.shao at="" gmail.com="">wrote: >>>> >>>>> Hi, Steve: >>>>> I am trying to get it worked without separating the chromosomes. I used >>>>> the >>>>> following code: >>>>> seqQuery<-data.frame( >>>>> chr=currCoorList$chr, >>>>> start=currCoorList$starts, >>>>> end=currCoorList$ends, >>>>> strand=currCoorList$strand) >>>>> mySeq<- getSeq(Mmusculus, seqQuery$chr, >>>>> start=seqQuery$start, end=seqQuery$end, >>>>> strand=seqQuery$strand ) >>>>> >>>>> >>>>> But then I got an error: >>>>> Error in reverseComplement(DNAStringSet(seqs[ii])) : >>>>> could not find function "copy" >>>>> >>>> Did I miss some depending package? >>>>> >>>> Don't know what's up with that, but you don't need to use a data.frame >>>> there. You already had a RangedData above, just pass that to getSeq, i.e. >>>> >>>> getSeq(Mmusculus, testTrack) >>>> >>>> Sabrina >>>>> On Mon, Mar 1, 2010 at 2:31 PM, Steve Lianoglou < >>>>> mailinglist.honeypot at gmail.com> wrote: >>>>> >>>>>> Hi Sabrina, >>>>>> >>>>>> On Mon, Mar 1, 2010 at 2:04 PM, sabrina s <sabrina.shao at="" gmail.com=""> >>>>> wrote: >>>>>>> Hi,all: >>>>>>> I have a long list of coordinates, which was created by the >>>>> following >>>>>> code: >>>>>>> testIranges<- IRanges(start=currCoorList $starts, >>>>>>> end=currCoorLis$ends) >>>>>>> >>>>>>> testTrack<-GenomicData(testIranges,chrom=currCoorList$chr, >>>>>>> strand=currCoorList$strand,genome="mm9") >>>>>>> >>>>>>> I could then export the testTrack into .bed file and upload in the >>>>> UCSC >>>>>>> genome browser and get the sequences of the listed coordinate >>>>>>> But I was thinking if there is a function from rtracklayer to >>>>> retrieve >>>>>>> sequence directly. >>>>>> If you're trying to get coordinates from some reference genome, you >>>>>> can use the BSgenome.*.* packages and skip rtracklayer + internet >>>>>> queries entirely. >>>>>> >>>>>> For instance, say I have an IRanges object `r1` which is a set of >>>>>> intervals on chromosome 1 from hg18 that I want sequence information >>>>>> for, I could do it like this: >>>>>> >>>>>> R> r1 >>>>>> IRanges of length 589 >>>>>> start end width >>>>>> [1] 18016124 18016155 32 >>>>>> [2] 18020749 18020780 32 >>>>>> [3] 18024599 18024630 32 >>>>>> [4] 18024830 18024861 32 >>>>>> [5] 18051985 18052016 32 >>>>>> [6] 18064760 18064791 32 >>>>>> [7] 18088145 18088176 32 >>>>>> [8] 18088200 18088231 32 >>>>>> [9] 18110085 18110116 32 >>>>>> ... >>>>>> >>>>>> R> library(BSgenome.Hsapiens.UCSC.hg18) >>>>>> R> chr1 <- unmasked(HSapiens$chr1) >>>>>> R> myseqs <- Views(chr1, start=start(r1), end=end(r1)) >>>>>> R> myseqs >>>>>> Views on a 247249719-letter DNAString subject >>>>>> subject: >>>>>> >>>>> TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC CCTAA...NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN NNNNNNN >>>>>> views: >>>>>> start end width >>>>>> [1] 18016124 18016155 32 [GTTTTTTGTTTTGTTTTGTTTTTTGTTTTTTA] >>>>>> [2] 18020749 18020780 32 [CGCCTAGTCTGGAGTGCAGTGGCACAATCTTG] >>>>>> [3] 18024599 18024630 32 [TCTCACCTCTTCACCGAAGTGGTCTTCTGAAC] >>>>>> [4] 18024830 18024861 32 [GAATTCTCCCTCGTTGCAGATCCTGTTTGAGT] >>>>>> [5] 18051985 18052016 32 [TTATTTTTTTTGAGGAAGAGTCTTGCTCTGTC] >>>>>> [6] 18064760 18064791 32 [GTGGGAGGATCGCTTGAGCCCTGGAGGTTGGG] >>>>>> [7] 18088145 18088176 32 [ATCATGGTGGGAGATGAAAGGCACTTCTTACA] >>>>>> [8] 18088200 18088231 32 [GAAGAAGCAAAAGCGGAAACCCCTGCTAAACC] >>>>>> [9] 18110085 18110116 32 [GGATCATGAGTTCAAGAGTTCGAGACCAGCCT] >>>>>> [10] 18118454 18118485 32 [TGTTGCCTAGGCTGGTCTCAAACTCCTGCCCT] >>>>>> ... >>>>>> >>>>>> calling "as.character" on the myseqs object will give you the sequence >>>>>> as a character vector. >>>>>> >>>>>> Does that get you what you're after? >>>>>> -steve >>>>>> >>>>>> -- >>>>>> Steve Lianoglou >>>>>> Graduate Student: Computational Systems Biology >>>>>> | Memorial Sloan-Kettering Cancer Center >>>>>> | Weill Medical College of Cornell University >>>>>> Contact Info: http://cbio.mskcc.org/~lianos/contact<http: cbio="" .mskcc.org="" %7elianos="" contact=""> >>>>> <http: cbio.mskcc.org="" %7elianos="" contact=""> >>>>> >>>>> >>>>> -- >>>>> Sabrina >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at stat.math.ethz.ch >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>> >>>> >>> >>> -- >>> Sabrina >>> >> > > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

ADD REPLY • link 14.8 years ago Hervé Pagès 16k

Login before adding your answer.