Counting number of intervals with or without GRange
1
0
Entering edit mode
@hermann-norpois-5483
Last seen 9.6 years ago
I have data as GRange object test.gr GRanges with 20 ranges and 0 metadata columns: seqnames ranges strand <rle> <iranges> <rle> [1] chr1 [ 10246, 10283] * [2] chr1 [237718, 237887] * [3] chr1 [521553, 521613] * [4] chr1 [564413, 564686] * or as classical dataframe head (test) seqnames start end width strand 1 chr1 10246 10283 38 * 2 chr1 237718 237887 170 * 3 chr1 521553 521613 61 * 4 chr1 564413 564686 274 * and I wish to count how often do intervals of a certain range exist. I would be happy to get something like this: width occurence of the width sum of the occurence 0-50 2 2 50-100 4 6 100-150 5 11 I would prefer to do this using GRange objects directly but I would be happy to know a way how to do it via test (classical dataframe). Could you please give me a hint. Thanks Hermann P.S.: My data: > dput (test) structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor"), start = c(10246L, 237718L, 521553L, 564413L, 565252L, 566538L, 567450L, 568777L, 569457L, 669190L, 713884L, 740270L, 752164L, 753266L, 754421L, 762051L, 785811L, 786878L, 791087L, 793485L), end = c(10283L, 237887L, 521613L, 564686L, 566081L, 567275L, 568583L, 569302L, 570125L, 669259L, 714603L, 740423L, 752782L, 753697L, 754711L, 763151L, 786025L, 787053L, 791138L, 793582L), width = c(38L, 170L, 61L, 274L, 830L, 738L, 1134L, 526L, 669L, 70L, 720L, 154L, 619L, 432L, 291L, 1101L, 215L, 176L, 52L, 98L), strand = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("+", "-", "*"), class = "factor")), .Names = c("seqnames", "start", "end", "width", "strand"), row.names = c(NA, 20L), class = "data.frame") > dput test.gr) new("GRanges" , seqnames = new("Rle" , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY"), class = "factor") , lengths = 20L , elementMetadata = NULL , metadata = list() ) , ranges = new("IRanges" , start = c(10246L, 237718L, 521553L, 564413L, 565252L, 566538L, 567450L, 568777L, 569457L, 669190L, 713884L, 740270L, 752164L, 753266L, 754421L, 762051L, 785811L, 786878L, 791087L, 793485L) , width = c(38L, 170L, 61L, 274L, 830L, 738L, 1134L, 526L, 669L, 70L, 720L, 154L, 619L, 432L, 291L, 1101L, 215L, 176L, 52L, 98L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") , lengths = 20L , elementMetadata = NULL , metadata = list() ) , elementMetadata = new("DataFrame" , rownames = NULL , nrows = 20L , listData = structure(list(), .Names = character(0)) , elementType = "ANY" , elementMetadata = NULL , metadata = list() ) , seqinfo = new("Seqinfo" , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", "chrY") , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_) , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA) , genome = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ ) ) , metadata = list() ) > [[alternative HTML version deleted]]
• 1.0k views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 3 days ago
United States
On 12/23/2012 08:42 AM, Hermann Norpois wrote: > I have data as GRange object > > test.gr > GRanges with 20 ranges and 0 metadata columns: > seqnames ranges strand > <rle> <iranges> <rle> > [1] chr1 [ 10246, 10283] * > [2] chr1 [237718, 237887] * > [3] chr1 [521553, 521613] * > [4] chr1 [564413, 564686] * > > or as classical dataframe > > head (test) > seqnames start end width strand > 1 chr1 10246 10283 38 * > 2 chr1 237718 237887 170 * > 3 chr1 521553 521613 61 * > 4 chr1 564413 564686 274 * > > and I wish to count how often do intervals of a certain range exist. I > would be happy to get something like this: > > width occurence of the width sum of the occurence > 0-50 2 2 > 50-100 4 6 > 100-150 5 11 > > I would prefer to do this using GRange objects directly but I would be > happy to know a way how to do it via test (classical dataframe). > Could you please give me a hint. Hi Hermann -- Maybe > w = widthtest.gr) > as.data.frame(table(cut(w, seq(0, max(w), by=50)))) Var1 Freq 1 (0,50] 1 2 (50,100] 4 3 (100,150] 0 4 (150,200] 3 ... or a little more 'raw' > tabulate(cut(w, seq(0, max(w), by=50))) [1] 1 4 0 3 1 2 0 0 1 0 1 0 1 1 2 0 1 > cumsum(tabulate(cut(w, seq(0, max(w), by=50)))) [1] 1 5 5 8 9 11 11 11 12 12 13 13 14 15 17 17 18 ? Martin > > > Thanks Hermann > > > P.S.: My data: > >> dput (test) > structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("chr1", > "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", > "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", > "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", > "chrY"), class = "factor"), start = c(10246L, 237718L, 521553L, > 564413L, 565252L, 566538L, 567450L, 568777L, 569457L, 669190L, > 713884L, 740270L, 752164L, 753266L, 754421L, 762051L, 785811L, > 786878L, 791087L, 793485L), end = c(10283L, 237887L, 521613L, > 564686L, 566081L, 567275L, 568583L, 569302L, 570125L, 669259L, > 714603L, 740423L, 752782L, 753697L, 754711L, 763151L, 786025L, > 787053L, 791138L, 793582L), width = c(38L, 170L, 61L, 274L, 830L, > 738L, 1134L, 526L, 669L, 70L, 720L, 154L, 619L, 432L, 291L, 1101L, > 215L, 176L, 52L, 98L), strand = structure(c(3L, 3L, 3L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = > c("+", > "-", "*"), class = "factor")), .Names = c("seqnames", "start", > "end", "width", "strand"), row.names = c(NA, 20L), class = "data.frame") >> dput test.gr) > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 20L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(10246L, 237718L, 521553L, 564413L, 565252L, 566538L, > 567450L, > 568777L, 569457L, 669190L, 713884L, 740270L, 752164L, 753266L, > 754421L, 762051L, 785811L, 786878L, 791087L, 793485L) > , width = c(38L, 170L, 61L, 274L, 830L, 738L, 1134L, 526L, 669L, 70L, > 720L, > 154L, 619L, 432L, 291L, 1101L, 215L, 176L, 52L, 98L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 20L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 20L > , listData = structure(list(), .Names = character(0)) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_ > ) > ) > , metadata = list() > ) >> > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT
0
Entering edit mode
Woops, didn't see this, sorry .... this is the ticket, though :-) On Sunday, December 23, 2012, Martin Morgan wrote: > On 12/23/2012 08:42 AM, Hermann Norpois wrote: > >> I have data as GRange object >> >> test.gr >> GRanges with 20 ranges and 0 metadata columns: >> seqnames ranges strand >> <rle> <iranges> <rle> >> [1] chr1 [ 10246, 10283] * >> [2] chr1 [237718, 237887] * >> [3] chr1 [521553, 521613] * >> [4] chr1 [564413, 564686] * >> >> or as classical dataframe >> >> head (test) >> seqnames start end width strand >> 1 chr1 10246 10283 38 * >> 2 chr1 237718 237887 170 * >> 3 chr1 521553 521613 61 * >> 4 chr1 564413 564686 274 * >> >> and I wish to count how often do intervals of a certain range exist. I >> would be happy to get something like this: >> >> width occurence of the width sum of the occurence >> 0-50 2 2 >> 50-100 4 6 >> 100-150 5 11 >> >> I would prefer to do this using GRange objects directly but I would be >> happy to know a way how to do it via test (classical dataframe). >> Could you please give me a hint. >> > > Hi Hermann -- > > Maybe > > > w = widthtest.gr) > > as.data.frame(table(cut(w, seq(0, max(w), by=50)))) > Var1 Freq > 1 (0,50] 1 > 2 (50,100] 4 > 3 (100,150] 0 > 4 (150,200] 3 > ... > > or a little more 'raw' > > > tabulate(cut(w, seq(0, max(w), by=50))) > [1] 1 4 0 3 1 2 0 0 1 0 1 0 1 1 2 0 1 > > cumsum(tabulate(cut(w, seq(0, max(w), by=50)))) > [1] 1 5 5 8 9 11 11 11 12 12 13 13 14 15 17 17 18 > > ? Martin > > > > > > Thanks Hermann > > > P.S.: My data: > > dput (test) > > structure(list(seqnames = structure(c(1L, 1L, 1L, 1L, 1L, 1L, > 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("chr1", > "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16", > "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", "chr22", > "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", "chrX", > "chrY"), class = "factor"), start = c(10246L, 237718L, 521553L, > 564413L, 565252L, 566538L, 567450L, 568777L, 569457L, 669190L, > 713884L, 740270L, 752164L, 753266L, 754421L, 762051L, 785811L, > 786878L, 791087L, 793485L), end = c(10283L, 237887L, 521613L, > 564686L, 566081L, 567275L, 568583L, 569302L, 570125L, 669259L, > 714603L, 740423L, 752782L, 753697L, 754711L, 763151L, 786025L, > 787053L, 791138L, 793582L), width = c(38L, 170L, 61L, 274L, 830L, > 738L, 1134L, 526L, 669L, 70L, 720L, 154L, 619L, 432L, 291L, 1101L, > 215L, 176L, 52L, 98L), strand = structure(c(3L, 3L, 3L, 3L, 3L, > 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = > c("+", > "-", "*"), class = "factor")), .Names = c("seqnames", "start", > "end", "width", "strand"), row.names = c(NA, 20L), class = "data.frame") > > dput test.gr) > > new("GRanges" > , seqnames = new("Rle" > , values = structure(1L, .Label = c("chr1", "chr10", "chr11", "chr12", > "chr13", > "chr14", "chr15", "chr16", "chr17", "chr18", "chr19", "chr2", > "chr20", "chr21", "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", > "chr8", "chr9", "chrX", "chrY"), class = "factor") > , lengths = 20L > , elementMetadata = NULL > , metadata = list() > ) > , ranges = new("IRanges" > , start = c(10246L, 237718L, 521553L, 564413L, 565252L, 566538L, > 567450L, > 568777L, 569457L, 669190L, 713884L, 740270L, 752164L, 753266L, > 754421L, 762051L, 785811L, 786878L, 791087L, 793485L) > , width = c(38L, 170L, 61L, 274L, 830L, 738L, 1134L, 526L, 669L, 70L, > 720L, > 154L, 619L, 432L, 291L, 1101L, 215L, 176L, 52L, 98L) > , NAMES = NULL > , elementType = "integer" > , elementMetadata = NULL > , metadata = list() > ) > , strand = new("Rle" > , values = structure(3L, .Label = c("+", "-", "*"), class = "factor") > , lengths = 20L > , elementMetadata = NULL > , metadata = list() > ) > , elementMetadata = new("DataFrame" > , rownames = NULL > , nrows = 20L > , listData = structure(list(), .Names = character(0)) > , elementType = "ANY" > , elementMetadata = NULL > , metadata = list() > ) > , seqinfo = new("Seqinfo" > , seqnames = c("chr1", "chr10", "chr11", "chr12", "chr13", "chr14", > "chr15", > "chr16", "chr17", "chr18", "chr19", "chr2", "chr20", "chr21", > "chr22", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", "chr9", > "chrX", "chrY") > , seqlengths = c(NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_, > NA_integer_, NA_integer_, NA_integer_, NA_integer_) > , is_circular = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, > NA, NA, > NA, NA, NA, NA, NA, NA, NA, NA, NA) > , genome = c(NA_character_, NA_character_, NA_character_, > NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_character_, NA_character_, > NA_character_, NA_character_, NA_character_, NA_characte > > -- > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > > ______________________________**_________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/**listinfo/bioconductor<https: stat.et="" hz.ch="" mailman="" listinfo="" bioconductor=""> > Search the archives: http://news.gmane.org/gmane.** > science.biology.informatics.**conductor<http: news.gmane.org="" gmane.="" science.biology.informatics.conductor=""> > -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 961 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6