flip strand of GappedAlignmentPairs Object
1
0
Entering edit mode
Stefanie ▴ 360
@stefanie-5192
Last seen 10.2 years ago
Dear list, Let's say I have single-end strand-specific RNA-Seq data. Then I could read in the data with: library(GenomicRanges) > reads = readGappedAlignments("data.bam") As I typically don't know which strand has been sequenced, it might be necessary to flip the strand of the reads before doing any kind of summarizeOverlap: > strand(reads) = ifelse(strand(reads) == "+", "-", "+") Now I could do a 'summarizeOverlaps' between the reads and any kind of transcript database. Now, if I have paired-end strand-specific RNA-Seq data. I read the data: > reads = readGappedAlignmentPairs("data.bam") How can I flip the strand of the reads? As it's now paired-end data, it does not work as above. I could imagine several work-arounds, I just wonder if there is any straight approach which I miss at the moment.. best regards, Stefanie [[alternative HTML version deleted]]
• 1.2k views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Stephanie, A GappedAlignmentPairs object is made up of a 'left' and a 'right' GappedAlignments. > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) You can extract the left/right parts with an accessor then look at strand, cigars, gaps etc. > left <- left(galp) > class(left) [1] "GappedAlignments" attr(,"package") [1] "GenomicRanges" > strand(left) factor-Rle of length 1572 with 665 runs Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 1 51 Values : - + - + - + - + - + - ... + - + - + - + - + - Levels(3): + - * > head(cigar(left)) [1] "35M" "36M" "35M" "36M" "35M" "35M" > head(ngap(left)) [1] 0 0 0 0 0 0 It's on these left/right pairs that you can reset the strand. > strand(left)[1:3] factor-Rle of length 3 with 1 run Lengths: 3 Values : + Levels(3): + - * > strand(left)[1:3] <- "-" > strand(left)[1:3] factor-Rle of length 3 with 1 run Lengths: 3 Values : - Levels(3): + - * Please be sure to read the details on the man page regarding how the strands were assigned. ?GappedAlignmentPairs Valerie On 02/27/13 05:55, Stefanie Tauber wrote: > Dear list, > > Let's say I have single-end strand-specific RNA-Seq data. > Then I could read in the data with: > > library(GenomicRanges) > >> reads = readGappedAlignments("data.bam") > As I typically don't know which strand has been sequenced, it might be necessary to flip the strand of the reads before doing any kind > of summarizeOverlap: > >> strand(reads) = ifelse(strand(reads) == "+", "-", "+") > Now I could do a 'summarizeOverlaps' between the reads and any kind of transcript database. > > Now, if I have paired-end strand-specific RNA-Seq data. > I read the data: >> reads = readGappedAlignmentPairs("data.bam") > How can I flip the strand of the reads? As it's now paired-end data, it does not work as above. > I could imagine several work-arounds, I just wonder if there is any straight approach which I miss at the moment.. > > > best regards, > Stefanie > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD COMMENT
0
Entering edit mode
Hi Valerie, thanks for the quick answer. I have already checked how the strand is assigned for paired-end data. After setting the strand for the left and right parts, can I somehow rebuild the gappedalignmentPairs object? best, Stefanie Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha@fhcrc.org>: > Hi Stephanie, > > A GappedAlignmentPairs object is made up of a 'left' and a 'right' GappedAlignments. > > > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") > > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) > > You can extract the left/right parts with an accessor then look at strand, cigars, gaps etc. > > > left <- left(galp) > > class(left) > [1] "GappedAlignments" > attr(,"package") > [1] "GenomicRanges" > > > strand(left) > factor-Rle of length 1572 with 665 runs > Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 1 51 > Values : - + - + - + - + - + - ... + - + - + - + - + - > Levels(3): + - * > > > head(cigar(left)) > [1] "35M" "36M" "35M" "36M" "35M" "35M" > > > head(ngap(left)) > [1] 0 0 0 0 0 0 > > It's on these left/right pairs that you can reset the strand. > > > strand(left)[1:3] > factor-Rle of length 3 with 1 run > Lengths: 3 > Values : + > Levels(3): + - * > > strand(left)[1:3] <- "-" > > strand(left)[1:3] > factor-Rle of length 3 with 1 run > Lengths: 3 > Values : - > Levels(3): + - * > > Please be sure to read the details on the man page regarding how the strands were assigned. > > ?GappedAlignmentPairs > > > Valerie > > On 02/27/13 05:55, Stefanie Tauber wrote: >> Dear list, >> >> Let's say I have single-end strand-specific RNA-Seq data. >> Then I could read in the data with: >> >> library(GenomicRanges) >> >>> reads = readGappedAlignments("data.bam") >> As I typically don't know which strand has been sequenced, it might be necessary to flip the strand of the reads before doing any kind >> of summarizeOverlap: >> >>> strand(reads) = ifelse(strand(reads) == "+", "-", "+") >> Now I could do a 'summarizeOverlaps' between the reads and any kind of transcript database. >> >> Now, if I have paired-end strand-specific RNA-Seq data. >> I read the data: >>> reads = readGappedAlignmentPairs("data.bam") >> How can I flip the strand of the reads? As it's now paired-end data, it does not work as above. >> I could imagine several work-arounds, I just wonder if there is any straight approach which I miss at the moment.. >> >> >> best regards, >> Stefanie >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > DI Stefanie Tauber Center for Integrative Bioinformatics Vienna (CIBIV) (CIBIV is a joint institute of Vienna University and Medical University) Max F. Perutz Laboratories (MFPL) Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 Dr. Bohr Gasse 9 A-1030 Wien, Austria Phone: ++43 +1 / 42772-4030 Fax: ++43 +1 / 42772-4098 email: stefanie.tauber@univie.ac.at www.cibiv.at [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
You could use the low-level constructor GappedAlignmentPairs(first, last, isProperPair, names=NULL) 'isProperPair' is a logical vector the same length as 'first' and 'last'. For example, GappedAlignmentPairs(left, right, !logical(length(left))) Valerie On 02/27/13 07:12, Stefanie Tauber wrote: > Hi Valerie, > > thanks for the quick answer. > I have already checked how the strand is assigned for paired-end data. > After setting the strand for the left and right parts, > can I somehow rebuild the gappedalignmentPairs object? > > best, > Stefanie > > Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at="" fhcrc.org=""> <mailto:vobencha at="" fhcrc.org="">>: > >> Hi Stephanie, >> >> A GappedAlignmentPairs object is made up of a 'left' and a 'right' >> GappedAlignments. >> >> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") >> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) >> >> You can extract the left/right parts with an accessor then look at >> strand, cigars, gaps etc. >> >> > left <- left(galp) >> > class(left) >> [1] "GappedAlignments" >> attr(,"package") >> [1] "GenomicRanges" >> >> > strand(left) >> factor-Rle of length 1572 with 665 runs >> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 >> 1 51 >> Values : - + - + - + - + - + - ... + - + - + - + - >> + - >> Levels(3): + - * >> >> > head(cigar(left)) >> [1] "35M" "36M" "35M" "36M" "35M" "35M" >> >> > head(ngap(left)) >> [1] 0 0 0 0 0 0 >> >> It's on these left/right pairs that you can reset the strand. >> >> > strand(left)[1:3] >> factor-Rle of length 3 with 1 run >> Lengths: 3 >> Values : + >> Levels(3): + - * >> > strand(left)[1:3] <- "-" >> > strand(left)[1:3] >> factor-Rle of length 3 with 1 run >> Lengths: 3 >> Values : - >> Levels(3): + - * >> >> Please be sure to read the details on the man page regarding how the >> strands were assigned. >> >> ?GappedAlignmentPairs >> >> >> Valerie >> >> On 02/27/13 05:55, Stefanie Tauber wrote: >>> Dear list, >>> >>> Let's say I have single-end strand-specific RNA-Seq data. >>> Then I could read in the data with: >>> >>> library(GenomicRanges) >>> >>>> reads = readGappedAlignments("data.bam") >>> As I typically don't know which strand has been sequenced, it might >>> be necessary to flip the strand of the reads before doing any kind >>> of summarizeOverlap: >>> >>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+") >>> Now I could do a 'summarizeOverlaps' between the reads and any kind >>> of transcript database. >>> >>> Now, if I have paired-end strand-specific RNA-Seq data. >>> I read the data: >>>> reads = readGappedAlignmentPairs("data.bam") >>> How can I flip the strand of the reads? As it's now paired-end data, >>> it does not work as above. >>> I could imagine several work-arounds, I just wonder if there is any >>> straight approach which I miss at the moment.. >>> >>> >>> best regards, >>> Stefanie >>> >>> >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: >>> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > DI Stefanie Tauber > > Center for Integrative Bioinformatics Vienna (CIBIV) > (CIBIV is a joint institute of Vienna University and Medical University) > Max F. Perutz Laboratories (MFPL) > Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 > Dr. Bohr Gasse 9 > A-1030 Wien, Austria > Phone: ++43 +1 / 42772-4030 > Fax: ++43 +1 / 42772-4098 > email: stefanie.tauber at univie.ac.at <mailto:stefanie.tauber at="" univie.ac.at=""> > www.cibiv.at <http: www.cibiv.at=""> >
ADD REPLY
0
Entering edit mode
Hi Stefanie, Val, Makes sense to have a strand setter for GappedAlignmentPairs objects, like we have for GappedAlignments objects. I've added it to GenomicRanges devel: > galp GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: seqnames strand : ranges -- ranges <rle> <rle> : <iranges> -- <iranges> EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537] EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239] EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563] EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323] EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95] --- seqlengths: chr1 chr2 1575 1584 > strand(first(galp)) factor-Rle of length 5 with 3 runs Lengths: 2 2 1 Values : - + - Levels(3): + - * > strand(last(galp)) factor-Rle of length 5 with 3 runs Lengths: 2 2 1 Values : + - + Levels(3): + - * Flip the strand: > strand(galp) <- ifelse(strand(galp) == "+", "-", "+") > galp GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: seqnames strand : ranges -- ranges <rle> <rle> : <iranges> -- <iranges> EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537] EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239] EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563] EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323] EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95] --- seqlengths: chr1 chr2 1575 1584 > strand(first(galp)) factor-Rle of length 5 with 3 runs Lengths: 2 2 1 Values : + - + Levels(3): + - * > strand(last(galp)) factor-Rle of length 5 with 3 runs Lengths: 2 2 1 Values : - + - Levels(3): + - * Note that a slightly more efficient way to compute the flipped strand is: runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+", "-", "+") This will only make a difference on big GappedAlignmentPairs objects of course. > sessionInfo() R Under development (unstable) (2013-02-19 r62008) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] parallel stats graphics grDevices utils datasets methods [8] base other attached packages: [1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33 [4] IRanges_1.17.35 BiocGenerics_0.5.6 loaded via a namespace (and not attached): [1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0 GenomicRanges 1.11.33, will become available via biocLite(0 in the next 24 hours or so. Cheers, H. On 02/27/2013 07:48 AM, Valerie Obenchain wrote: > You could use the low-level constructor > > GappedAlignmentPairs(first, last, isProperPair, names=NULL) > > 'isProperPair' is a logical vector the same length as 'first' and > 'last'. For example, > > GappedAlignmentPairs(left, right, !logical(length(left))) > > > Valerie > > On 02/27/13 07:12, Stefanie Tauber wrote: >> Hi Valerie, >> >> thanks for the quick answer. >> I have already checked how the strand is assigned for paired-end data. >> After setting the strand for the left and right parts, >> can I somehow rebuild the gappedalignmentPairs object? >> >> best, >> Stefanie >> >> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at="" fhcrc.org="">> <mailto:vobencha at="" fhcrc.org="">>: >> >>> Hi Stephanie, >>> >>> A GappedAlignmentPairs object is made up of a 'left' and a 'right' >>> GappedAlignments. >>> >>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") >>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) >>> >>> You can extract the left/right parts with an accessor then look at >>> strand, cigars, gaps etc. >>> >>> > left <- left(galp) >>> > class(left) >>> [1] "GappedAlignments" >>> attr(,"package") >>> [1] "GenomicRanges" >>> >>> > strand(left) >>> factor-Rle of length 1572 with 665 runs >>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 >>> 1 51 >>> Values : - + - + - + - + - + - ... + - + - + - + - >>> + - >>> Levels(3): + - * >>> >>> > head(cigar(left)) >>> [1] "35M" "36M" "35M" "36M" "35M" "35M" >>> >>> > head(ngap(left)) >>> [1] 0 0 0 0 0 0 >>> >>> It's on these left/right pairs that you can reset the strand. >>> >>> > strand(left)[1:3] >>> factor-Rle of length 3 with 1 run >>> Lengths: 3 >>> Values : + >>> Levels(3): + - * >>> > strand(left)[1:3] <- "-" >>> > strand(left)[1:3] >>> factor-Rle of length 3 with 1 run >>> Lengths: 3 >>> Values : - >>> Levels(3): + - * >>> >>> Please be sure to read the details on the man page regarding how the >>> strands were assigned. >>> >>> ?GappedAlignmentPairs >>> >>> >>> Valerie >>> >>> On 02/27/13 05:55, Stefanie Tauber wrote: >>>> Dear list, >>>> >>>> Let's say I have single-end strand-specific RNA-Seq data. >>>> Then I could read in the data with: >>>> >>>> library(GenomicRanges) >>>> >>>>> reads = readGappedAlignments("data.bam") >>>> As I typically don't know which strand has been sequenced, it might >>>> be necessary to flip the strand of the reads before doing any kind >>>> of summarizeOverlap: >>>> >>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+") >>>> Now I could do a 'summarizeOverlaps' between the reads and any kind >>>> of transcript database. >>>> >>>> Now, if I have paired-end strand-specific RNA-Seq data. >>>> I read the data: >>>>> reads = readGappedAlignmentPairs("data.bam") >>>> How can I flip the strand of the reads? As it's now paired-end data, >>>> it does not work as above. >>>> I could imagine several work-arounds, I just wonder if there is any >>>> straight approach which I miss at the moment.. >>>> >>>> >>>> best regards, >>>> Stefanie >>>> >>>> >>>> >>>> >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> _______________________________________________ >>>> Bioconductor mailing list >>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>> Search the archives: >>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> DI Stefanie Tauber >> >> Center for Integrative Bioinformatics Vienna (CIBIV) >> (CIBIV is a joint institute of Vienna University and Medical University) >> Max F. Perutz Laboratories (MFPL) >> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 >> Dr. Bohr Gasse 9 >> A-1030 Wien, Austria >> Phone: ++43 +1 / 42772-4030 >> Fax: ++43 +1 / 42772-4098 >> email: stefanie.tauber at univie.ac.at <mailto:stefanie.tauber at="" univie.ac.at=""> >> www.cibiv.at <http: www.cibiv.at=""> >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Thanks Herve. On 02/27/13 11:23, Hervé Pagès wrote: > Hi Stefanie, Val, > > Makes sense to have a strand setter for GappedAlignmentPairs > objects, like we have for GappedAlignments objects. I've added > it to GenomicRanges devel: > > > galp > GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: > seqnames strand : ranges -- ranges > <rle> <rle> : <iranges> -- <iranges> > EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537] > EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239] > EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563] > EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323] > EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95] > --- > seqlengths: > chr1 chr2 > 1575 1584 > > > strand(first(galp)) > factor-Rle of length 5 with 3 runs > Lengths: 2 2 1 > Values : - + - > Levels(3): + - * > > > strand(last(galp)) > factor-Rle of length 5 with 3 runs > Lengths: 2 2 1 > Values : + - + > Levels(3): + - * > > Flip the strand: > > > strand(galp) <- ifelse(strand(galp) == "+", "-", "+") > > galp > GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: > seqnames strand : ranges -- ranges > <rle> <rle> : <iranges> -- <iranges> > EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537] > EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239] > EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563] > EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323] > EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95] > --- > seqlengths: > chr1 chr2 > 1575 1584 > > > strand(first(galp)) > factor-Rle of length 5 with 3 runs > Lengths: 2 2 1 > Values : + - + > Levels(3): + - * > > > strand(last(galp)) > factor-Rle of length 5 with 3 runs > Lengths: 2 2 1 > Values : - + - > Levels(3): + - * > > Note that a slightly more efficient way to compute the flipped strand > is: > > runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+", > "-", "+") > > This will only make a difference on big GappedAlignmentPairs objects of > course. > > > sessionInfo() > R Under development (unstable) (2013-02-19 r62008) > Platform: x86_64-unknown-linux-gnu (64-bit) > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=C LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] parallel stats graphics grDevices utils datasets methods > [8] base > > other attached packages: > [1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33 > [4] IRanges_1.17.35 BiocGenerics_0.5.6 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0 > > > GenomicRanges 1.11.33, will become available via biocLite(0 in the next > 24 hours or so. > > Cheers, > H. > > > On 02/27/2013 07:48 AM, Valerie Obenchain wrote: >> You could use the low-level constructor >> >> GappedAlignmentPairs(first, last, isProperPair, names=NULL) >> >> 'isProperPair' is a logical vector the same length as 'first' and >> 'last'. For example, >> >> GappedAlignmentPairs(left, right, !logical(length(left))) >> >> >> Valerie >> >> On 02/27/13 07:12, Stefanie Tauber wrote: >>> Hi Valerie, >>> >>> thanks for the quick answer. >>> I have already checked how the strand is assigned for paired-end data. >>> After setting the strand for the left and right parts, >>> can I somehow rebuild the gappedalignmentPairs object? >>> >>> best, >>> Stefanie >>> >>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at="" fhcrc.org="">>> <mailto:vobencha at="" fhcrc.org="">>: >>> >>>> Hi Stephanie, >>>> >>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right' >>>> GappedAlignments. >>>> >>>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") >>>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) >>>> >>>> You can extract the left/right parts with an accessor then look at >>>> strand, cigars, gaps etc. >>>> >>>> > left <- left(galp) >>>> > class(left) >>>> [1] "GappedAlignments" >>>> attr(,"package") >>>> [1] "GenomicRanges" >>>> >>>> > strand(left) >>>> factor-Rle of length 1572 with 665 runs >>>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 >>>> 1 51 >>>> Values : - + - + - + - + - + - ... + - + - + - + - >>>> + - >>>> Levels(3): + - * >>>> >>>> > head(cigar(left)) >>>> [1] "35M" "36M" "35M" "36M" "35M" "35M" >>>> >>>> > head(ngap(left)) >>>> [1] 0 0 0 0 0 0 >>>> >>>> It's on these left/right pairs that you can reset the strand. >>>> >>>> > strand(left)[1:3] >>>> factor-Rle of length 3 with 1 run >>>> Lengths: 3 >>>> Values : + >>>> Levels(3): + - * >>>> > strand(left)[1:3] <- "-" >>>> > strand(left)[1:3] >>>> factor-Rle of length 3 with 1 run >>>> Lengths: 3 >>>> Values : - >>>> Levels(3): + - * >>>> >>>> Please be sure to read the details on the man page regarding how the >>>> strands were assigned. >>>> >>>> ?GappedAlignmentPairs >>>> >>>> >>>> Valerie >>>> >>>> On 02/27/13 05:55, Stefanie Tauber wrote: >>>>> Dear list, >>>>> >>>>> Let's say I have single-end strand-specific RNA-Seq data. >>>>> Then I could read in the data with: >>>>> >>>>> library(GenomicRanges) >>>>> >>>>>> reads = readGappedAlignments("data.bam") >>>>> As I typically don't know which strand has been sequenced, it might >>>>> be necessary to flip the strand of the reads before doing any kind >>>>> of summarizeOverlap: >>>>> >>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+") >>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind >>>>> of transcript database. >>>>> >>>>> Now, if I have paired-end strand-specific RNA-Seq data. >>>>> I read the data: >>>>>> reads = readGappedAlignmentPairs("data.bam") >>>>> How can I flip the strand of the reads? As it's now paired-end data, >>>>> it does not work as above. >>>>> I could imagine several work-arounds, I just wonder if there is any >>>>> straight approach which I miss at the moment.. >>>>> >>>>> >>>>> best regards, >>>>> Stefanie >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> _______________________________________________ >>>>> Bioconductor mailing list >>>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>> Search the archives: >>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>> >>> >>> DI Stefanie Tauber >>> >>> Center for Integrative Bioinformatics Vienna (CIBIV) >>> (CIBIV is a joint institute of Vienna University and Medical >>> University) >>> Max F. Perutz Laboratories (MFPL) >>> Campus Vienna Biocenter 5 (VBC5), Ebene 1, Room 1812.2 >>> Dr. Bohr Gasse 9 >>> A-1030 Wien, Austria >>> Phone: ++43 +1 / 42772-4030 >>> Fax: ++43 +1 / 42772-4098 >>> email: stefanie.tauber at univie.ac.at >>> <mailto:stefanie.tauber at="" univie.ac.at=""> >>> www.cibiv.at <http: www.cibiv.at=""> >>> >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >
ADD REPLY
0
Entering edit mode
Thanks a lot to both of you! Best, Stefanie Am 27.02.2013 um 20:47 schrieb Valerie Obenchain <vobencha at="" fhcrc.org="">: > Thanks Herve. > > On 02/27/13 11:23, Hervé Pagès wrote: >> Hi Stefanie, Val, >> >> Makes sense to have a strand setter for GappedAlignmentPairs >> objects, like we have for GappedAlignments objects. I've added >> it to GenomicRanges devel: >> >> > galp >> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: >> seqnames strand : ranges -- ranges >> <rle> <rle> : <iranges> -- <iranges> >> EAS54_65:6:277:590:364 chr1 - : [ 681, 715] -- [ 503, 537] >> EAS139_11:5:61:38:1182 chr1 - : [1388, 1422] -- [1205, 1239] >> EAS188_7:4:21:443:404 chr1 + : [1345, 1379] -- [1529, 1563] >> EAS1_105:6:23:885:274 chr2 + : [1089, 1123] -- [1289, 1323] >> EAS1_108:1:65:787:74 chr1 - : [ 213, 247] -- [ 61, 95] >> --- >> seqlengths: >> chr1 chr2 >> 1575 1584 >> >> > strand(first(galp)) >> factor-Rle of length 5 with 3 runs >> Lengths: 2 2 1 >> Values : - + - >> Levels(3): + - * >> >> > strand(last(galp)) >> factor-Rle of length 5 with 3 runs >> Lengths: 2 2 1 >> Values : + - + >> Levels(3): + - * >> >> Flip the strand: >> >> > strand(galp) <- ifelse(strand(galp) == "+", "-", "+") >> > galp >> GappedAlignmentPairs with 5 alignment pairs and 0 metadata columns: >> seqnames strand : ranges -- ranges >> <rle> <rle> : <iranges> -- <iranges> >> EAS54_65:6:277:590:364 chr1 + : [ 681, 715] -- [ 503, 537] >> EAS139_11:5:61:38:1182 chr1 + : [1388, 1422] -- [1205, 1239] >> EAS188_7:4:21:443:404 chr1 - : [1345, 1379] -- [1529, 1563] >> EAS1_105:6:23:885:274 chr2 - : [1089, 1123] -- [1289, 1323] >> EAS1_108:1:65:787:74 chr1 + : [ 213, 247] -- [ 61, 95] >> --- >> seqlengths: >> chr1 chr2 >> 1575 1584 >> >> > strand(first(galp)) >> factor-Rle of length 5 with 3 runs >> Lengths: 2 2 1 >> Values : + - + >> Levels(3): + - * >> >> > strand(last(galp)) >> factor-Rle of length 5 with 3 runs >> Lengths: 2 2 1 >> Values : - + - >> Levels(3): + - * >> >> Note that a slightly more efficient way to compute the flipped strand >> is: >> >> runValue(strand(galp)) <- ifelse(runValue(strand(galp)) == "+", >> "-", "+") >> >> This will only make a difference on big GappedAlignmentPairs objects of >> course. >> >> > sessionInfo() >> R Under development (unstable) (2013-02-19 r62008) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> >> locale: >> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C >> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 >> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 >> [7] LC_PAPER=C LC_NAME=C >> [9] LC_ADDRESS=C LC_TELEPHONE=C >> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C >> >> attached base packages: >> [1] parallel stats graphics grDevices utils datasets methods >> [8] base >> >> other attached packages: >> [1] Rsamtools_1.11.18 Biostrings_2.27.11 GenomicRanges_1.11.33 >> [4] IRanges_1.17.35 BiocGenerics_0.5.6 >> >> loaded via a namespace (and not attached): >> [1] bitops_1.0-5 stats4_3.0.0 tools_3.0.0 zlibbioc_1.5.0 >> >> >> GenomicRanges 1.11.33, will become available via biocLite(0 in the next >> 24 hours or so. >> >> Cheers, >> H. >> >> >> On 02/27/2013 07:48 AM, Valerie Obenchain wrote: >>> You could use the low-level constructor >>> >>> GappedAlignmentPairs(first, last, isProperPair, names=NULL) >>> >>> 'isProperPair' is a logical vector the same length as 'first' and >>> 'last'. For example, >>> >>> GappedAlignmentPairs(left, right, !logical(length(left))) >>> >>> >>> Valerie >>> >>> On 02/27/13 07:12, Stefanie Tauber wrote: >>>> Hi Valerie, >>>> >>>> thanks for the quick answer. >>>> I have already checked how the strand is assigned for paired-end data. >>>> After setting the strand for the left and right parts, >>>> can I somehow rebuild the gappedalignmentPairs object? >>>> >>>> best, >>>> Stefanie >>>> >>>> Am 27.02.2013 um 16:06 schrieb Valerie Obenchain <vobencha at="" fhcrc.org="">>>> <mailto:vobencha at="" fhcrc.org="">>: >>>> >>>>> Hi Stephanie, >>>>> >>>>> A GappedAlignmentPairs object is made up of a 'left' and a 'right' >>>>> GappedAlignments. >>>>> >>>>> > ex1_file <- system.file("extdata", "ex1.bam", package="Rsamtools") >>>>> > galp <- readGappedAlignmentPairs(ex1_file, use.names=TRUE) >>>>> >>>>> You can extract the left/right parts with an accessor then look at >>>>> strand, cigars, gaps etc. >>>>> >>>>> > left <- left(galp) >>>>> > class(left) >>>>> [1] "GappedAlignments" >>>>> attr(,"package") >>>>> [1] "GenomicRanges" >>>>> >>>>> > strand(left) >>>>> factor-Rle of length 1572 with 665 runs >>>>> Lengths: 3 25 1 1 1 3 1 1 3 2 2 ... 1 6 1 2 3 2 1 5 >>>>> 1 51 >>>>> Values : - + - + - + - + - + - ... + - + - + - + - >>>>> + - >>>>> Levels(3): + - * >>>>> >>>>> > head(cigar(left)) >>>>> [1] "35M" "36M" "35M" "36M" "35M" "35M" >>>>> >>>>> > head(ngap(left)) >>>>> [1] 0 0 0 0 0 0 >>>>> >>>>> It's on these left/right pairs that you can reset the strand. >>>>> >>>>> > strand(left)[1:3] >>>>> factor-Rle of length 3 with 1 run >>>>> Lengths: 3 >>>>> Values : + >>>>> Levels(3): + - * >>>>> > strand(left)[1:3] <- "-" >>>>> > strand(left)[1:3] >>>>> factor-Rle of length 3 with 1 run >>>>> Lengths: 3 >>>>> Values : - >>>>> Levels(3): + - * >>>>> >>>>> Please be sure to read the details on the man page regarding how the >>>>> strands were assigned. >>>>> >>>>> ?GappedAlignmentPairs >>>>> >>>>> >>>>> Valerie >>>>> >>>>> On 02/27/13 05:55, Stefanie Tauber wrote: >>>>>> Dear list, >>>>>> >>>>>> Let's say I have single-end strand-specific RNA-Seq data. >>>>>> Then I could read in the data with: >>>>>> >>>>>> library(GenomicRanges) >>>>>> >>>>>>> reads = readGappedAlignments("data.bam") >>>>>> As I typically don't know which strand has been sequenced, it might >>>>>> be necessary to flip the strand of the reads before doing any kind >>>>>> of summarizeOverlap: >>>>>> >>>>>>> strand(reads) = ifelse(strand(reads) == "+", "-", "+") >>>>>> Now I could do a 'summarizeOverlaps' between the reads and any kind >>>>>> of transcript database. >>>>>> >>>>>> Now, if I have paired-end strand-specific RNA-Seq data. >>>>>> I read the data: >>>>>>> reads = readGappedAlignmentPairs("data.bam") >>>>>> How can I flip the strand of the reads? As it's now paired-end data, >>>>>> it does not work as above. >>>>>> I could imagine several work-arounds, I just wonder if there is any >>>>>> straight approach which I miss at the moment.. >>>>>> >>>>>> >>>>>> best regards, >>>>>> Stefanie >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> _______________________________________________ >>>>>> Bioconductor mailing list >>>>>> Bioconductor at r-project.org <mailto:bioconductor at="" r-project.org=""> >>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>>>>> Search the archives: >>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor >>>>>
ADD REPLY

Login before adding your answer.

Traffic: 487 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6