Question

question about how to get ranges in Biostrings

0

Entering edit mode

Hervé Pagès 16k

@herve-pages-1542

Last seen 9 hours ago

Seattle, WA, United States

Hi Andress, [cc'ing the list so others can benefit from or participate to the thread] On 02/21/2013 06:43 AM, Andres Weinfeld Avalos Figueroa wrote: > Dear Dr Pag?s > Greetings from Guatemala. > I am somewhat new in using BioC, and an intermediate R user. > > Last couple of days I have been working with the matchProbePair function > on Sorghum genome, testing around 3,500 SSR primer pairs generated in > 2011 for sugarcane for the ICSB. I am using Sorghum as a framework due > to its genome stability and ploidy (besides genetic closeness) I have > succeeded in evaluating the 10 chromosomes, and obtained a interesting > amount of stringviews. > > Now I want to plot those theoretical amplicons along S. bicolor genome > (chromosome by crhromosome), but I have not found the way to substract > the ranges from this list. Being "result" the matchprobepair outputI > know that >range(result[[1]]) does it, but I want to use lapply() > function... To extract the ranges (as an IRanges object) from the result (a Views object), just use ranges(): > result <- matchProbePair(Fprobe, Rprobe, subject) > rg <- ranges(result) > rg IRanges of length 9 start end width [1] 6523541 6524340 800 [2] 7268925 8284316 1015392 [3] 9019256 9295206 275951 [4] 13503602 15492126 1988525 [5] 17612494 19210625 1598132 [6] 21840840 21862022 21183 [7] 22281529 22329032 47504 [8] 23045531 23096640 51110 [9] 23759210 24263538 504329 If you want to plot those ranges, for example with the Gviz package: library(Gviz) plotTracks(list(GenomeAxisTrack(), AnnotationTrack(GRanges("chr3R", rg)))) If you want to use lapply(), then: lapply(seq_along(rg), function(i) { ... do something with start(rg)[i] and end(rg)[i] ... }) Note that the plotting code and lapply code above would work directly on the Views object (result) so there is no need to extract the ranges in 'rg'. If you want to extract the sequences of the amplicons, just pass 'result' to the DNAStringSet constructor: > amplicons <- DNAStringSet(result) > amplicons A DNAStringSet instance of length 9 width seq [1] 800 AGCTCCGAGTTCCTGCAATATTTGAACGAGGAC...CTTCCAAGCCATCCGCATATTTGTGAACAACG [2] 1015392 AGCTCCGAGTTGAGATGAGGCTACCCACTTAAT...TATTTGGCCTCGGTCAAATCGAACTCGGAGCT [3] 275951 AGCTCCGAGTTACGTGTATCGATTACCCCGATT...ACTAGGAAATAAATAAAGTAGAACTCGGAGCT [4] 1988525 CGTTGTTCACATCCCTCTGCTGTTCCCGCTGAC...ACCGATGATAGAAAGAGTGTTAACTCGGAGCT [5] 1598132 CGTTGTTCACAATCGTCAGATCGAAGAAGTGAC...CACTTGGAACTTGACACTTGGAACTCGGAGCT [6] 21183 AGCTCCGAGTTCGTCGAGAATCTGCACGAGAAG...GCCATGTGCATAAGGGCTCGGAACTCGGAGCT [7] 47504 AGCTCCGAGTTGCGCTTCCTGATGGGCACGGAT...CGGGTGTAAAAAGCTGAGAGCAACTCGGAGCT [8] 51110 CGTTGTTCACATTTTGCTCAGAGCATTTGCATT...AAAACTAAACCCGCAGACTGATGTGAACAACG [9] 504329 AGCTCCGAGTTCTAAAAAGAGCAGTTTTATCAG...GCTGCCAATAAAGGAAAAGGAAACTCGGAGCT If you want to look at them individually, use [[ either on 'result' or 'amplicons': > result[[3]] 275951-letter "DNAString" instance seq: AGCTCCGAGTTACGTGTATCGATTACCCCGATTTTA...AAAAACTAGGAAATAAATAAAGTAGAACTCG GAGCT > amplicons[[3]] 275951-letter "DNAString" instance seq: AGCTCCGAGTTACGTGTATCGATTACCCCGATTTTA...AAAAACTAGGAAATAAATAAAGTAGAACTCG GAGCT If you want to loop on the sequences of the amplicons, you can use lapply() directly on 'result' or 'amplicons': lapply(amplicons, ...) But if you have a lot of amplicons (e.g. more than 10-20k), that's probably going to take a long time. Note that a lot of basic operations are serialized and fast even on big DNAStringSet objects: > alphabetFrequency(amplicons) A C G T M R W S Y K V H D B N - + [1,] 216 196 218 170 0 0 0 0 0 0 0 0 0 0 0 0 0 [2,] 287220 220536 221447 286189 0 0 0 0 0 0 0 0 0 0 0 0 0 [3,] 79566 58099 58205 80081 0 0 0 0 0 0 0 0 0 0 0 0 0 [4,] 573547 421203 423538 570237 0 0 0 0 0 0 0 0 0 0 0 0 0 [5,] 453126 345037 344723 455246 0 0 0 0 0 0 0 0 0 0 0 0 0 [6,] 5902 4635 4783 5863 0 0 0 0 0 0 0 0 0 0 0 0 0 [7,] 13708 10336 10342 13118 0 0 0 0 0 0 0 0 0 0 0 0 0 [8,] 12589 12586 12343 13592 0 0 0 0 0 0 0 0 0 0 0 0 0 [9,] 142837 108612 110988 141892 0 0 0 0 0 0 0 0 0 0 0 0 0 Hope this helps, H. > > This is a really basic question, I hope you can find a second to push me > throug... I would really appreciate your help. I am not in my computer > right now... don't have my work handy but in a couple of hours. > > Kind regards > Andres Avalos > Biotecnologist -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319

Cancer IRanges Gviz Cancer IRanges Gviz • 839 views

ADD COMMENT • link 11.2 years ago Hervé Pagès 16k