Search
Question: Using GenomicRanges with table data
0
gravatar for Tom Oates
4.7 years ago by
Tom Oates60
Tom Oates60 wrote:
Hi I am trying to use GenomicRanges as part of an anlalysis of sequencing data. I have a number of files which I wish to use to make GRanges objects. For example: chr1 579578 579804 CpG_12 chr1 630418 630623 CpG_11 chr1 804552 804763 CpG_9 chr1 1307051 1307362 CpG_16 chr1 1323599 1323808 CpG_9 chr1 1350549 1350758 CpG_12 chr1 1403287 1403637 CpG_20 chr1 1418906 1419488 CpG_28 This file is sorted such that chr1 is followed chr2, 3, 4, 5 etc to 20 (as opposed to chr10, 11...19, 2, 3 etc) I use to make the GRanges object cpgi_gr<-GRanges(seqnames=Rle(cpgi$V1), ranges=IRanges(start=cpgi$V2,end=cpgi$V3), UCSC_AL_ID=cpgi$V4) but then if I examine seqnames(cpgi_gr) I get factor-Rle of length 89611 with 21 runs Lengths: 8952 5602 6133 4973 5840 4260 ... 3132 3607 2793 3175 3842 1419 Values : chr1 chr2 chr3 chr4 chr5 chr6 ... chr16 chr17 chr18 chr19 chr20 chrX Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 ... chr5 chr6 chr7 chr8 chr9 chrX So the Values & Levels are not matching. I hope to give the GRanges object seqlengths of the chr lengths in the genome so I can then perform flank etc tasks on the data so it is crucial that the values & lengths match. I imagine that this problem is based around my not understanding either IRanges or Rle sufficiently but I have read help on Rle objects & IRanges and can't work out how to ensure that the formation of the GRanges object leads to the chr values matching Thanks
ADD COMMENTlink modified 4.7 years ago by Steve Lianoglou12k • written 4.7 years ago by Tom Oates60
0
gravatar for Steve Lianoglou
4.7 years ago by
Genentech
Steve Lianoglou12k wrote:
Hi On Friday, January 11, 2013, Tom Oates wrote: > Hi > I am trying to use GenomicRanges as part of an anlalysis of sequencing > data. > I have a number of files which I wish to use to make GRanges objects. > For example: > > chr1 579578 579804 CpG_12 > chr1 630418 630623 CpG_11 > chr1 804552 804763 CpG_9 > chr1 1307051 1307362 CpG_16 > chr1 1323599 1323808 CpG_9 > chr1 1350549 1350758 CpG_12 > chr1 1403287 1403637 CpG_20 > chr1 1418906 1419488 CpG_28 > > This file is sorted such that chr1 is followed chr2, 3, 4, 5 etc to 20 > (as opposed to chr10, 11...19, 2, 3 etc) > > I use to make the GRanges object > cpgi_gr<-GRanges(seqnames=Rle(cpgi$V1), > ranges=IRanges(start=cpgi$V2,end=cpgi$V3), > UCSC_AL_ID=cpgi$V4) > > but then if I examine > > seqnames(cpgi_gr) > > > I get > factor-Rle of length 89611 with 21 runs > Lengths: 8952 5602 6133 4973 5840 4260 ... 3132 3607 2793 > 3175 3842 1419 > Values : chr1 chr2 chr3 chr4 chr5 chr6 ... chr16 chr17 chr18 > chr19 chr20 chrX > Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 ... chr5 chr6 > chr7 chr8 chr9 chrX > > > So the Values & Levels are not matching. I'm not sure what you mean by this, but your object looks more or less fine to me. If by the "levels not matching" you mean that chr10 comes (instead of chr2) after chr1 in a call to "sea levels", then that should really affect your analysis ... HTH, Steve I hope to give the GRanges > object seqlengths of the chr lengths in the genome so I can then > perform flank etc tasks on the data so it is crucial that the values & > lengths match. I imagine that this problem is based around my not > understanding either IRanges or Rle sufficiently but I have read help > on Rle objects & IRanges and can't work out how to ensure that the > formation of the GRanges object leads to the chr values matching > > Thanks > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org <javascript:;> > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact [[alternative HTML version deleted]]
ADD COMMENTlink written 4.7 years ago by Steve Lianoglou12k
Sorry for not being clear That is indeed what I mean by "levels not matching" So is it true that the chr1..chr2 etc as it appears in Levels will have no effect on further analysis? even though Values & Levels for seqnames do not match? On Fri, Jan 11, 2013 at 1:54 PM, Steve Lianoglou <mailinglist.honeypot at="" gmail.com=""> wrote: > Hi > > On Friday, January 11, 2013, Tom Oates wrote: >> >> Hi >> I am trying to use GenomicRanges as part of an anlalysis of sequencing >> data. >> I have a number of files which I wish to use to make GRanges objects. >> For example: >> >> chr1 579578 579804 CpG_12 >> chr1 630418 630623 CpG_11 >> chr1 804552 804763 CpG_9 >> chr1 1307051 1307362 CpG_16 >> chr1 1323599 1323808 CpG_9 >> chr1 1350549 1350758 CpG_12 >> chr1 1403287 1403637 CpG_20 >> chr1 1418906 1419488 CpG_28 >> >> This file is sorted such that chr1 is followed chr2, 3, 4, 5 etc to 20 >> (as opposed to chr10, 11...19, 2, 3 etc) >> >> I use to make the GRanges object >> cpgi_gr<-GRanges(seqnames=Rle(cpgi$V1), >> ranges=IRanges(start=cpgi$V2,end=cpgi$V3), >> UCSC_AL_ID=cpgi$V4) >> >> but then if I examine >> >> seqnames(cpgi_gr) >> >> >> I get >> factor-Rle of length 89611 with 21 runs >> Lengths: 8952 5602 6133 4973 5840 4260 ... 3132 3607 2793 >> 3175 3842 1419 >> Values : chr1 chr2 chr3 chr4 chr5 chr6 ... chr16 chr17 chr18 >> chr19 chr20 chrX >> Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 ... chr5 chr6 >> chr7 chr8 chr9 chrX >> >> >> So the Values & Levels are not matching. > > > I'm not sure what you mean by this, but your object looks more or less fine > to me. > > If by the "levels not matching" you mean that chr10 comes (instead of chr2) > after chr1 in a call to "sea levels", then that should really affect your > analysis ... > > > HTH, > Steve > >> I hope to give the GRanges >> object seqlengths of the chr lengths in the genome so I can then >> perform flank etc tasks on the data so it is crucial that the values & >> lengths match. I imagine that this problem is based around my not >> understanding either IRanges or Rle sufficiently but I have read help >> on Rle objects & IRanges and can't work out how to ensure that the >> formation of the GRanges object leads to the chr values matching >> >> Thanks >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact
ADD REPLYlink written 4.7 years ago by Tom Oates60
Ideally your seqlevels would be in their natural ordering, i.e., chr1, chr2, etc. You could for example get a nicely ordered Seqinfo object from a BSgenome package or similar. For example seqinfo(gr) <- seqinfo(Hsapiens) But it doesn't matter as far as the integrity of your data. Michael On Fri, Jan 11, 2013 at 6:37 AM, Tom Oates <toates19@gmail.com> wrote: > Sorry for not being clear > That is indeed what I mean by "levels not matching" > So is it true that the chr1..chr2 etc as it appears in Levels will > have no effect on further analysis? even though Values & Levels for > seqnames do not match? > > On Fri, Jan 11, 2013 at 1:54 PM, Steve Lianoglou > <mailinglist.honeypot@gmail.com> wrote: > > Hi > > > > On Friday, January 11, 2013, Tom Oates wrote: > >> > >> Hi > >> I am trying to use GenomicRanges as part of an anlalysis of sequencing > >> data. > >> I have a number of files which I wish to use to make GRanges objects. > >> For example: > >> > >> chr1 579578 579804 CpG_12 > >> chr1 630418 630623 CpG_11 > >> chr1 804552 804763 CpG_9 > >> chr1 1307051 1307362 CpG_16 > >> chr1 1323599 1323808 CpG_9 > >> chr1 1350549 1350758 CpG_12 > >> chr1 1403287 1403637 CpG_20 > >> chr1 1418906 1419488 CpG_28 > >> > >> This file is sorted such that chr1 is followed chr2, 3, 4, 5 etc to 20 > >> (as opposed to chr10, 11...19, 2, 3 etc) > >> > >> I use to make the GRanges object > >> cpgi_gr<-GRanges(seqnames=Rle(cpgi$V1), > >> ranges=IRanges(start=cpgi$V2,end=cpgi$V3), > >> UCSC_AL_ID=cpgi$V4) > >> > >> but then if I examine > >> > >> seqnames(cpgi_gr) > >> > >> > >> I get > >> factor-Rle of length 89611 with 21 runs > >> Lengths: 8952 5602 6133 4973 5840 4260 ... 3132 3607 2793 > >> 3175 3842 1419 > >> Values : chr1 chr2 chr3 chr4 chr5 chr6 ... chr16 chr17 chr18 > >> chr19 chr20 chrX > >> Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 ... chr5 chr6 > >> chr7 chr8 chr9 chrX > >> > >> > >> So the Values & Levels are not matching. > > > > > > I'm not sure what you mean by this, but your object looks more or less > fine > > to me. > > > > If by the "levels not matching" you mean that chr10 comes (instead of > chr2) > > after chr1 in a call to "sea levels", then that should really affect your > > analysis ... > > > > > > HTH, > > Steve > > > >> I hope to give the GRanges > >> object seqlengths of the chr lengths in the genome so I can then > >> perform flank etc tasks on the data so it is crucial that the values & > >> lengths match. I imagine that this problem is based around my not > >> understanding either IRanges or Rle sufficiently but I have read help > >> on Rle objects & IRanges and can't work out how to ensure that the > >> formation of the GRanges object leads to the chr values matching > >> > >> Thanks > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > -- > > Steve Lianoglou > > Graduate Student: Computational Systems Biology > > | Memorial Sloan-Kettering Cancer Center > > | Weill Medical College of Cornell University > > Contact Info: http://cbio.mskcc.org/~lianos/contact > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLYlink written 4.7 years ago by Michael Lawrence9.5k
one day I found myself in this situation and wrote the Stupid Man's version of the above... i.e. "just do what I mean" (yes that's bad) addSeqinfo <- function (x) { require(rtracklayer) ## and assumes that genome(x) has been set stopifnot(is(x, "SummarizedExperiment") || is(x, "GenomicRanges")) gen <- ifelse(is(x, "SummarizedExperiment"), genome(rowData(x)), genome(x)) gen <- unique(gen) if is.na(gen)) stop("You must assign genome(x) first") if (is(x, "SummarizedExperiment")) { seqinfo(rowData(x)) <- SeqinfoForBSGenome(gen)[seqlevels(rowData(x))] } else { seqinfo(x) <- SeqinfoForBSGenome(gen)[seqlevels(x)] } return(x) } Granted it is not beautiful but when I automatically pull some HuEx, methylation array, eRRBS or summarized RNAseq data from GEO it can be nice for the whole chain to just "know what to do" based on the genome assembly of the annotations used to create the GRanges or SE: R> unique(genome(BcellALL)) ## [1] "hg19" R> head(as.data.frame(seqinfo(BcellALL))) ## data frame with 6 rows and 3 columns ## seqlengths isCircular genome ## <integer> <logical> <character> ## chr1 NA NA hg19 ## ... R> BcellALL <- addSeqinfo(BcellALL) R> head(as.data.frame(seqinfo(BcellALL))) ## data frame with 6 rows and 3 columns ## seqlengths isCircular genome ## <integer> <logical> <character> ## chr1 249250621 FALSE hg19 ## ... Needless to say I am a big fan of BSgenome and rtracklayer -- thank you Herve and Michael for putting them together (and yes I will get around to testing import.bigBed() when I can... I cheated and used bigBedToBed for the task originally motivating the patch, however). On Fri, Jan 11, 2013 at 7:14 AM, Michael Lawrence <lawrence.michael@gene.com> wrote: > Ideally your seqlevels would be in their natural ordering, i.e., chr1, > chr2, etc. You could for example get a nicely ordered Seqinfo object from a > BSgenome package or similar. > > For example > seqinfo(gr) <- seqinfo(Hsapiens) > > But it doesn't matter as far as the integrity of your data. > > Michael > > > On Fri, Jan 11, 2013 at 6:37 AM, Tom Oates <toates19@gmail.com> wrote: > > > Sorry for not being clear > > That is indeed what I mean by "levels not matching" > > So is it true that the chr1..chr2 etc as it appears in Levels will > > have no effect on further analysis? even though Values & Levels for > > seqnames do not match? > > > > On Fri, Jan 11, 2013 at 1:54 PM, Steve Lianoglou > > <mailinglist.honeypot@gmail.com> wrote: > > > Hi > > > > > > On Friday, January 11, 2013, Tom Oates wrote: > > >> > > >> Hi > > >> I am trying to use GenomicRanges as part of an anlalysis of sequencing > > >> data. > > >> I have a number of files which I wish to use to make GRanges objects. > > >> For example: > > >> > > >> chr1 579578 579804 CpG_12 > > >> chr1 630418 630623 CpG_11 > > >> chr1 804552 804763 CpG_9 > > >> chr1 1307051 1307362 CpG_16 > > >> chr1 1323599 1323808 CpG_9 > > >> chr1 1350549 1350758 CpG_12 > > >> chr1 1403287 1403637 CpG_20 > > >> chr1 1418906 1419488 CpG_28 > > >> > > >> This file is sorted such that chr1 is followed chr2, 3, 4, 5 etc to 20 > > >> (as opposed to chr10, 11...19, 2, 3 etc) > > >> > > >> I use to make the GRanges object > > >> cpgi_gr<-GRanges(seqnames=Rle(cpgi$V1), > > >> ranges=IRanges(start=cpgi$V2,end=cpgi$V3), > > >> UCSC_AL_ID=cpgi$V4) > > >> > > >> but then if I examine > > >> > > >> seqnames(cpgi_gr) > > >> > > >> > > >> I get > > >> factor-Rle of length 89611 with 21 runs > > >> Lengths: 8952 5602 6133 4973 5840 4260 ... 3132 3607 2793 > > >> 3175 3842 1419 > > >> Values : chr1 chr2 chr3 chr4 chr5 chr6 ... chr16 chr17 chr18 > > >> chr19 chr20 chrX > > >> Levels(21): chr1 chr10 chr11 chr12 chr13 chr14 chr15 ... chr5 chr6 > > >> chr7 chr8 chr9 chrX > > >> > > >> > > >> So the Values & Levels are not matching. > > > > > > > > > I'm not sure what you mean by this, but your object looks more or less > > fine > > > to me. > > > > > > If by the "levels not matching" you mean that chr10 comes (instead of > > chr2) > > > after chr1 in a call to "sea levels", then that should really affect > your > > > analysis ... > > > > > > > > > HTH, > > > Steve > > > > > >> I hope to give the GRanges > > >> object seqlengths of the chr lengths in the genome so I can then > > >> perform flank etc tasks on the data so it is crucial that the values & > > >> lengths match. I imagine that this problem is based around my not > > >> understanding either IRanges or Rle sufficiently but I have read help > > >> on Rle objects & IRanges and can't work out how to ensure that the > > >> formation of the GRanges object leads to the chr values matching > > >> > > >> Thanks > > >> > > >> _______________________________________________ > > >> Bioconductor mailing list > > >> Bioconductor@r-project.org > > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > > >> Search the archives: > > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > > > > > > > > > -- > > > Steve Lianoglou > > > Graduate Student: Computational Systems Biology > > > | Memorial Sloan-Kettering Cancer Center > > > | Weill Medical College of Cornell University > > > Contact Info: http://cbio.mskcc.org/~lianos/contact > > > > _______________________________________________ > > Bioconductor mailing list > > Bioconductor@r-project.org > > https://stat.ethz.ch/mailman/listinfo/bioconductor > > Search the archives: > > http://news.gmane.org/gmane.science.biology.informatics.conductor > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > -- *A model is a lie that helps you see the truth.* * * Howard Skipper<http: cancerres.aacrjournals.org="" content="" 31="" 9="" 1173.full.pdf=""> [[alternative HTML version deleted]]
ADD REPLYlink written 4.7 years ago by Tim Triche4.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 122 users visited in the last hour