iranges

carol white ▴ 680

@carol-white-2174

Last seen 8.9 years ago

European Union

Hi,

How does width work together with start and end in IRanges? I thought that if I use end with a width, then the sequence of length width counted back from the end is taken. However, in my case, when I use widths of, for example, 20 and 10, the corresponding sequences of length 20 and 10 are taken not from the end but from the beginning. Did I misunderstand something?

Regards,
Carol

IRanges • 1.3k views

updated 9.7 years ago by Valerie Obenchain ★ 6.8k • written 9.7 years ago by carol white ▴ 680

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi Carol,

The 'end' is the end of the range. When you specify ranges with 'end' and 'width', the range will always end at the 'end' value.

    > IRanges(end = 10, width = c(5, 10))
    IRanges of length 2
        start end width
    [1]     6  10     5
    [2]     1  10    10

Similar reasoning applies for 'start' and 'width':

    > IRanges(start = 10, width = c(5, 10))
    IRanges of length 2
        start end width
    [1]    10  14     5
    [2]    10  19    10

Valerie

--
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109
Email: vobencha at fhcrc.org
Phone: (206) 667-3158
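The rule behind the two outputs above is that any two of 'start', 'end' and 'width' determine the third, with width = end - start + 1. A minimal sketch, assuming the IRanges package is installed; the object names `by_end` and `by_start` are illustrative only:

```r
library(IRanges)

# Anchored at the end: the start is computed as end - width + 1.
by_end <- IRanges(end = 10, width = c(5, 10))

# Anchored at the start: the end is computed as start + width - 1.
by_start <- IRanges(start = 10, width = c(5, 10))

start(by_end)    # 6 1
end(by_start)    # 14 19

# The identity width = end - start + 1 holds for every range.
stopifnot(all(width(by_end)   == end(by_end)   - start(by_end)   + 1L))
stopifnot(all(width(by_start) == end(by_start) - start(by_start) + 1L))
```

Nothing here depends on strand: an IRanges object is a plain set of integer intervals.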

written 9.7 years ago by Valerie Obenchain ★ 6.8k

Did you provide 'start', 'end' and 'width' and get a confusing answer? If yes, please show your example.

Thanks.
Valerie

reply written 9.7 years ago by Valerie Obenchain ★ 6.8k

Valerie Obenchain ★ 6.8k

@valerie-obenchain-4275

Last seen 2.3 years ago

United States

Hi,

Please 'reply all' when responding so communication stays on the list.

If you are working with stranded ranges you should use the GRanges container. IRanges is not strand-aware and does not have a strand argument. You can see the function signature on the man page by typing

    ?IRanges

    Usage:

    ## IRanges constructor:
    IRanges(start=NULL, end=NULL, width=NULL, names=NULL)

Load a TxDb (transcript database) object and extract transcripts by gene:

    library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
    tx <- transcriptsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")

Select a gene with transcripts on the negative strand:

    gene <- tx[[3]]

    > gene
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12632936, 12655767]      - |     21863 FBtr0306337
      [2]    chr3R [12633349, 12653845]      - |     21864 FBtr0083388
      [3]    chr3R [12633349, 12655300]      - |     21865 FBtr0083387
      [4]    chr3R [12633349, 12655474]      - |     21866 FBtr0300485

GRanges can be manipulated with resize(), trim(), shift(), flank(), narrow() and several other methods. To see them type (including the backticks)

    ?`intra-range-methods`

and select the page for GRanges. It sounds like resize() is what you're looking for. By default resize() anchors at the strand-aware start (fix = "start"), which for negative-strand ranges is the largest coordinate:

    > resize(gene, width = 10)
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12655758, 12655767]      - |     21863 FBtr0306337
      [2]    chr3R [12653836, 12653845]      - |     21864 FBtr0083388
      [3]    chr3R [12655291, 12655300]      - |     21865 FBtr0083387
      [4]    chr3R [12655465, 12655474]      - |     21866 FBtr0300485

If you have sequence data instead of range data, the XStringSet family is more appropriate. For examples of manipulating sequences see Section E on the XStringSet man page. The functions you want are narrow() or subseq().

    library(Biostrings)
    ?XStringSet

Valerie

On 08/08/2014 08:38 AM, carol white wrote:
> I have a problem when I want to take the width from the end of a
> sequence on the reverse strand.
> If I take the nucleotide sequence of a gene that is on the reverse
> strand from the NCBI web site and extract, for example, 10 or 20 bp
> from the end, I don't get the same result as I do with IRanges. As I
> have already given the strand as a parameter to the IRanges function,
> I assume that the sequence has already been reverse-complemented by
> IRanges. I don't have this problem with genes that are on the forward
> strand, nor when I take the subsequence from the beginning of the
> sequence.
>
> Regards,
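The strand-aware behaviour of resize() described above can be checked on toy data without downloading a TxDb package. This is a minimal sketch, assuming the GenomicRanges package is installed; the coordinates and the names `gr`, `five_prime` and `three_prime` are made up for illustration:

```r
library(GenomicRanges)

# The same interval [101, 200] placed once on each strand (hypothetical data).
gr <- GRanges(seqnames = "chr1",
              ranges   = IRanges(start = c(101, 101), end = c(200, 200)),
              strand   = c("+", "-"))

# fix = "start" (the default) keeps the strand-aware start of each feature:
# the smallest coordinate on "+", the largest coordinate on "-".
five_prime <- resize(gr, width = 10, fix = "start")
start(five_prime)    # 101 191
end(five_prime)      # 110 200

# fix = "end" keeps the strand-aware end instead:
# the largest coordinate on "+", the smallest coordinate on "-".
three_prime <- resize(gr, width = 10, fix = "end")
start(three_prime)   # 191 101
end(three_prime)     # 200 110
```

The stored start is always the smaller coordinate; only the interpretation of fix = "start" and fix = "end" changes with the strand.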

written 9.7 years ago by Valerie Obenchain ★ 6.8k

1- Suppose I have sequences with start and end coordinates, where for some sequences start > end and for others start < end. Can I create a GRanges with the strand defined as - if start > end and + otherwise, and keep start and end as they are (on the reverse strand start > end, on the forward strand start < end), or should I swap start and end?

2- If I use resize with your example, why is the start of the output of resize the same as the gene's start, although I take 10 bp from the end?

    > resize(gene, 10, fix="end")
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12632936, 12632945]      - |     21863 FBtr0306337
      [2]    chr3R [12633349, 12633358]      - |     21864 FBtr0083388
      [3]    chr3R [12633349, 12633358]      - |     21865 FBtr0083387
      [4]    chr3R [12633349, 12633358]      - |     21866 FBtr0300485
      ---
      seqlengths:
           chr2L     chr2R     chr3L     chr3R ...   chrXHet   chrYHet chrUextra
        23011544  21146708  24543557  27905053 ...    204112    347038  29004656

    > gene
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12632936, 12655767]      - |     21863 FBtr0306337
      [2]    chr3R [12633349, 12653845]      - |     21864 FBtr0083388
      [3]    chr3R [12633349, 12655300]      - |     21865 FBtr0083387
      [4]    chr3R [12633349, 12655474]      - |     21866 FBtr0300485
      ---
      seqlengths:
           chr2L     chr2R     chr3L     chr3R ...   chrXHet   chrYHet chrUextra
        23011544  21146708  24543557  27905053 ...    204112    347038  29004656

3- Why do I get an error message when using narrow?

    > narrow(gene, width=10, end= end(ranges(gene)))
    Error in .Call2("solve_user_SEW", refwidths, start, end, width, translate.negative.coord,  :
      solving row 1: 'allow.nonnarrowing' is FALSE and the supplied end (12655767) is > refwidth

Regards,

reply written 9.7 years ago by carol white ▴ 680

Hi,

On 08/09/14 04:37, carol white wrote:
> 1- Suppose I have sequences with start and end coordinates, where for
> some sequences start > end and for others start < end. Can I create a
> GRanges with the strand defined as - if start > end and + otherwise,
> and keep start and end as they are, or should I swap start and end?

The convention for storing negative-strand ranges in a GRanges object is the same as for positive-strand ranges: the smaller coordinate is the start and the larger coordinate is the end. For example, the second and third elements of 'tx':

    library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
    tx <- transcriptsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")

    > tx[2:3]
    GRangesList of length 2:
    $FBgn0000008
    GRanges with 3 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr2R [18024494, 18060339]      + |      7681 FBtr0100521
      [2]    chr2R [18024496, 18060346]      + |      7682 FBtr0071763
      [3]    chr2R [18024938, 18060346]      + |      7683 FBtr0071764

    $FBgn0000014
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand | tx_id     tx_name
      [1]    chr3R [12632936, 12655767]      - | 21863 FBtr0306337
      [2]    chr3R [12633349, 12653845]      - | 21864 FBtr0083388
      [3]    chr3R [12633349, 12655300]      - | 21865 FBtr0083387
      [4]    chr3R [12633349, 12655474]      - | 21866 FBtr0300485

IRanges / GRanges don't hold negative-width ranges, so you can't create a range with start > end. For reverse-strand features you would therefore swap the two values and record the orientation in the strand.

    > GRanges("chr1", IRanges(10, 5))
    Error in .Call2("solve_user_SEW0", start, end, width, PACKAGE = "IRanges") :
      solving row 1: negative widths are not allowed

The one exception is a zero-width range (start = end + 1), but that's not applicable here:

    > width(GRanges("chr1", IRanges(4,3)))
    [1] 0

> 2- If I use resize with your example, why is the start of the output
> of resize the same as the gene's start, although I take 10 bp from the
> end?

resize() is strand-aware. In the GRanges the coordinates of each range are stored left to right (smallest to largest) regardless of strand. A negative-strand feature is read from its largest coordinate down to its smallest, so its 5' end sits at the largest coordinate, and these end values

    > end(gene)
    [1] 12655767 12653845 12655300 12655474

are considered the start values by resize(). With fix="end" on the negative strand, resize() therefore anchors at the smallest coordinate, which is why your output begins at the gene's start. You can see the difference when we change the strand:

    strand(gene) <- "+"

    > resize(gene, 10, fix="end")
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12655758, 12655767]      + |     21863 FBtr0306337
      [2]    chr3R [12653836, 12653845]      + |     21864 FBtr0083388
      [3]    chr3R [12655291, 12655300]      + |     21865 FBtr0083387
      [4]    chr3R [12655465, 12655474]      + |     21866 FBtr0300485

> 3- Why do I get an error message when using narrow?
>
> narrow(gene, width=10, end= end(ranges(gene)))
> Error in .Call2("solve_user_SEW", refwidths, start, end, width,
> translate.negative.coord, :
> solving row 1: 'allow.nonnarrowing' is FALSE and the supplied end
> (12655767) is > refwidth

In narrow(), 'start' and 'end' are integer positions relative to each range, not genomic coordinates. That is why the genomic end 12655767 is rejected as larger than the width of the range (refwidth). Positive values count from the current start, with position 1 being the current start, so 'start = 5' moves the start forward by 4. Negative values count back from the current end, with -1 being the current end, so 'end = -3' moves the end back by 2.

It's tricky to get ranges of length 10 anchored at the end with narrow():

    newStart <- (end(gene) - 9) - (start(gene) - 1)
    narrowGene <- narrow(gene, start = newStart)

    > width(narrowGene)
    [1] 10 10 10 10

Several functions on the intra-range-methods man page are similar and you can often get the same result with different functions. In this case, it may be more straightforward to use restrict():

    > restrictGene <- restrict(gene, start=end(gene) - 9, end=end(gene))
    > width(restrictGene)
    [1] 10 10 10 10
    > restrictGene
    GRanges with 4 ranges and 2 metadata columns:
          seqnames               ranges strand |     tx_id     tx_name
             <Rle>            <IRanges>  <Rle> | <integer> <character>
      [1]    chr3R [12655758, 12655767]      - |     21863 FBtr0306337
      [2]    chr3R [12653836, 12653845]      - |     21864 FBtr0083388
      [3]    chr3R [12655291, 12655300]      - |     21865 FBtr0083387
      [4]    chr3R [12655465, 12655474]      - |     21866 FBtr0300485

Valerie
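The relative-position semantics of narrow() are easier to see on small numbers. The following is a minimal sketch on a plain IRanges object, assuming the IRanges package is installed; the ranges and the names `x`, `last10_a`, `last10_b` and `last10_c` are made up for illustration:

```r
library(IRanges)

# Two hypothetical ranges with widths 100 and 50.
x <- IRanges(start = c(1001, 5001), end = c(1100, 5050))

# A positive 'start' is a 1-based position inside each range,
# so start = 5 moves the start forward by 4.
start(narrow(x, start = 5))    # 1005 5005

# Negative values count back from the end (-1 is the last position),
# so end = -3 moves the end back by 2.
end(narrow(x, end = -3))       # 1098 5048

# Three ways to keep the last 10 positions of each range.
last10_a <- narrow(x, start = -10)
last10_b <- narrow(x, start = (end(x) - 9) - (start(x) - 1))
last10_c <- resize(x, width = 10, fix = "end")  # IRanges has no strand

start(last10_a)                # 1091 5041
stopifnot(all(last10_a == last10_b), all(last10_a == last10_c))
```

Passing a genomic coordinate such as end(x) as the 'end' of narrow() fails for the same reason as in the error above: the value is read as a position inside the range and exceeds its width.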

reply written 9.7 years ago by Valerie Obenchain ★ 6.8k
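On the first question in this thread (coordinates recorded with start > end for reverse-strand features), one possible way to build a valid GRanges is to swap the coordinates with pmin()/pmax() and derive the strand from their original order. This is only a sketch under assumptions: the data frame `coords` and its column names `chrom`, `from` and `to` are hypothetical, and it presumes that from > to really does mean "reverse strand" in the input.

```r
library(GenomicRanges)

# Hypothetical input: 'from' > 'to' marks a reverse-strand feature.
coords <- data.frame(chrom = c("chr1", "chr1", "chr2"),
                     from  = c(100L, 900L, 50L),
                     to    = c(200L, 750L, 80L))

# Store the smaller coordinate as start and the larger as end,
# and keep the original orientation in the strand.
gr <- GRanges(seqnames = coords$chrom,
              ranges   = IRanges(start = pmin(coords$from, coords$to),
                                 end   = pmax(coords$from, coords$to)),
              strand   = ifelse(coords$from > coords$to, "-", "+"))

as.character(strand(gr))   # "+" "-" "+"
start(gr)                  # 100 750 50
end(gr)                    # 200 900 80

# Strand-aware operations now behave as expected, e.g. the first
# 10 positions of each feature in its own reading direction.
first10 <- resize(gr, width = 10, fix = "start")
start(first10)             # 100 891 50
```

A feature with from equal to to would be assigned "+" by this rule, so single-base features need strand information from another source.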

Dear Valerie,

It works now. Thank you for all your advice and help.

Best,
Carol

reply written 9.7 years ago by carol white ▴ 680
