WIG coverage for Paired-end strand-specific RNA-Seq (PE ssRNA-Seq)
1
0
Entering edit mode
bach le ▴ 40
@bach-le-6148
Last seen 9.6 years ago
Hi, I would like to ask if any of you have experience with generating *WIG coverage files from PE ssRNA-Seq* data. I got some discussion around this ( http://seqanswers.com/forums/showthread.php?t=20545) but it seems not to be solved yet. Also, does *countOverlaps* function work with GappedAlignmentPairs object? In that case, will the first or last alignment of each pair be used for calculating the overlap? Thanks much in advance. Best, [[alternative HTML version deleted]]
Alignment Alignment • 1.7k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
On Thu, Nov 7, 2013 at 5:43 AM, bach le <le.bach.07@gmail.com> wrote: > Hi, > I would like to ask if any of you have experience with generating *WIG > coverage files from PE ssRNA-Seq* data. I got some discussion around this ( > http://seqanswers.com/forums/showthread.php?t=20545) but it seems not to > be > solved yet. > I've no experience with strand-specific protocols, but it sounds like the strand of the first read in the pair indicates the strand. If that is true, please confirm as it would inform our design. As suggested in the thread, just flip the strand of the second ends and use the code earlier in that thread. GAlignmentPairs makes this easy, the accessors first() and last() get you the #1 and #2 reads, respectively, as GAlignments. > Also, does *countOverlaps* function work with GappedAlignmentPairs object? > In that case, will the first or last alignment of each pair be used for > calculating the overlap? > GappedAlignmentPairs is now called GAlignmentPairs. Whenever you want to know whether there is a method for a certain signature, use showMethods() to see the list or selectMethod() to retrieve the method object for the signature. In this case: > showMethods(countOverlaps) [snip] query="GAlignmentPairs", subject="GAlignmentPairs" query="GAlignmentPairs", subject="Vector" [snip] This means we can call countOverlaps with a GAlignmentPairs and any Vector (of ranges), including a GRangesList of transcripts. The method works by first converting GAlignmentPairs into a GRangesList, where each element corresponds to a pair and contains the exonic ranges from both ends. The problem for you is that the strand will not be correct for the second end. Thus, you will need to call grglist() to get the GRangesList manually and modify the strand before passing it to countOverlaps. The strand modification would look like: strand(grl) <- relist(rep(strand(first(reads)), elementLengths(grl)), grl) There is some experimental functionality in GenomicRanges for computing isoform-specific overlaps: findSpliceOverlaps. These functions look for the BAM XS tag and if found use it to determine the strand in a way similar to above. If you were to set the XS mcol on your GAlignmentPairs to strand(first(reads)), then these functions should work the way you want. Michael Thanks much in advance. > Best, > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT
0
Entering edit mode
Hi Michael, Thank you very much for prompt reply. With our ssRNA-Seq protocol, strand info of the second read represents the whole fragment. Also, even the code can be applied for calculating coverage, it may not work effectively with large BAM file because of requiring to load the whole file at once. Is there any trick to avoid this? Regarding countOverlaps function, as you suggested to convert GappedAlignmentPairs into a GRangeList corresponding to two reads, it happens that one fragment may be counted twice. Does it cause any bias on the output, assuming that the read counts are used for DEG analysis? Looking forward to your reply. Best regards, On Thu, Nov 7, 2013 at 11:32 PM, Michael Lawrence <lawrence.michael@gene.com> wrote: > > > > On Thu, Nov 7, 2013 at 5:43 AM, bach le <le.bach.07@gmail.com> wrote: > >> Hi, >> I would like to ask if any of you have experience with generating *WIG >> coverage files from PE ssRNA-Seq* data. I got some discussion around this >> ( >> >> http://seqanswers.com/forums/showthread.php?t=20545) but it seems not to >> be >> solved yet. >> > > I've no experience with strand-specific protocols, but it sounds like the > strand of the first read in the pair indicates the strand. If that is true, > please confirm as it would inform our design. As suggested in the thread, > just flip the strand of the second ends and use the code earlier in that > thread. GAlignmentPairs makes this easy, the accessors first() and last() > get you the #1 and #2 reads, respectively, as GAlignments. > > > >> Also, does *countOverlaps* function work with GappedAlignmentPairs object? >> >> In that case, will the first or last alignment of each pair be used for >> calculating the overlap? >> > > GappedAlignmentPairs is now called GAlignmentPairs. Whenever you want to > know whether there is a method for a certain signature, use showMethods() > to see the list or selectMethod() to retrieve the method object for the > signature. In this case: > > > showMethods(countOverlaps) > [snip] > query="GAlignmentPairs", subject="GAlignmentPairs" > query="GAlignmentPairs", subject="Vector" > [snip] > > This means we can call countOverlaps with a GAlignmentPairs and any Vector > (of ranges), including a GRangesList of transcripts. The method works by > first converting GAlignmentPairs into a GRangesList, where each element > corresponds to a pair and contains the exonic ranges from both ends. The > problem for you is that the strand will not be correct for the second end. > Thus, you will need to call grglist() to get the GRangesList manually and > modify the strand before passing it to countOverlaps. The strand > modification would look like: > > strand(grl) <- relist(rep(strand(first(reads)), elementLengths(grl)), grl) > > There is some experimental functionality in GenomicRanges for computing > isoform-specific overlaps: findSpliceOverlaps. These functions look for the > BAM XS tag and if found use it to determine the strand in a way similar to > above. If you were to set the XS mcol on your GAlignmentPairs to > strand(first(reads)), then these functions should work the way you want. > > Michael > > Thanks much in advance. >> Best, >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
On Thu, Nov 7, 2013 at 7:05 AM, bach le <le.bach.07@gmail.com> wrote: > Hi Michael, > Thank you very much for prompt reply. > > With our ssRNA-Seq protocol, strand info of the second read represents the > whole fragment. Also, even the code can be applied for calculating > coverage, it may not work effectively with large BAM file because of > requiring to load the whole file at once. Is there any trick to avoid this? > > There is a way to tell Rsamtools to yield data in chunks. # read 10 million reads at once bf <- BamFile(file, yieldSize = 10e6) ga <- readGAlignments(ga) cov <- coverage(ga) # you would want to split this by strand Do something like that in a loop until "ga" comes back empty. Regarding countOverlaps function, as you suggested to convert > GappedAlignmentPairs into a GRangeList corresponding to two reads, it > happens that one fragment may be counted twice. Does it cause any bias on > the output, assuming that the read counts are used for DEG analysis? > > With GRangesList, one fragment is only counted once for one transcript (the two reads are in the same element). There are features in summarizeOverlaps for dealing with cases where one fragment overlaps multiple transcripts, if you care about that. > Looking forward to your reply. > Best regards, > > > On Thu, Nov 7, 2013 at 11:32 PM, Michael Lawrence < > lawrence.michael@gene.com > > wrote: > > > > > > > > > On Thu, Nov 7, 2013 at 5:43 AM, bach le <le.bach.07@gmail.com> wrote: > > > >> Hi, > >> I would like to ask if any of you have experience with generating *WIG > >> coverage files from PE ssRNA-Seq* data. I got some discussion around > this > >> ( > >> > >> http://seqanswers.com/forums/showthread.php?t=20545) but it seems not > to > >> be > >> solved yet. > >> > > > > I've no experience with strand-specific protocols, but it sounds like the > > strand of the first read in the pair indicates the strand. If that is > true, > > please confirm as it would inform our design. As suggested in the thread, > > just flip the strand of the second ends and use the code earlier in that > > thread. GAlignmentPairs makes this easy, the accessors first() and last() > > get you the #1 and #2 reads, respectively, as GAlignments. > > > > > > > >> Also, does *countOverlaps* function work with GappedAlignmentPairs > object? > >> > >> In that case, will the first or last alignment of each pair be used for > >> calculating the overlap? > >> > > > > GappedAlignmentPairs is now called GAlignmentPairs. Whenever you want to > > know whether there is a method for a certain signature, use showMethods() > > to see the list or selectMethod() to retrieve the method object for the > > signature. In this case: > > > > > showMethods(countOverlaps) > > [snip] > > query="GAlignmentPairs", subject="GAlignmentPairs" > > query="GAlignmentPairs", subject="Vector" > > [snip] > > > > This means we can call countOverlaps with a GAlignmentPairs and any > Vector > > (of ranges), including a GRangesList of transcripts. The method works by > > first converting GAlignmentPairs into a GRangesList, where each element > > corresponds to a pair and contains the exonic ranges from both ends. The > > problem for you is that the strand will not be correct for the second > end. > > Thus, you will need to call grglist() to get the GRangesList manually and > > modify the strand before passing it to countOverlaps. The strand > > modification would look like: > > > > strand(grl) <- relist(rep(strand(first(reads)), elementLengths(grl)), > grl) > > > > There is some experimental functionality in GenomicRanges for computing > > isoform-specific overlaps: findSpliceOverlaps. These functions look for > the > > BAM XS tag and if found use it to determine the strand in a way similar > to > > above. If you were to set the XS mcol on your GAlignmentPairs to > > strand(first(reads)), then these functions should work the way you want. > > > > Michael > > > > Thanks much in advance. > >> Best, > >> > >> [[alternative HTML version deleted]] > >> > >> _______________________________________________ > >> Bioconductor mailing list > >> Bioconductor@r-project.org > >> https://stat.ethz.ch/mailman/listinfo/bioconductor > >> Search the archives: > >> http://news.gmane.org/gmane.science.biology.informatics.conductor > >> > > > > > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD REPLY
0
Entering edit mode
Thank Michael, it works:). On Fri, Nov 8, 2013 at 3:49 AM, Michael Lawrence <lawrence.michael@gene.com>wrote: > > > > On Thu, Nov 7, 2013 at 7:05 AM, bach le <le.bach.07@gmail.com> wrote: > >> Hi Michael, >> Thank you very much for prompt reply. >> >> With our ssRNA-Seq protocol, strand info of the second read represents the >> whole fragment. Also, even the code can be applied for calculating >> coverage, it may not work effectively with large BAM file because of >> requiring to load the whole file at once. Is there any trick to avoid >> this? >> >> > There is a way to tell Rsamtools to yield data in chunks. > > # read 10 million reads at once > bf <- BamFile(file, yieldSize = 10e6) > ga <- readGAlignments(ga) > cov <- coverage(ga) # you would want to split this by strand > > Do something like that in a loop until "ga" comes back empty. > > Regarding countOverlaps function, as you suggested to convert >> GappedAlignmentPairs into a GRangeList corresponding to two reads, it >> happens that one fragment may be counted twice. Does it cause any bias on >> the output, assuming that the read counts are used for DEG analysis? >> >> > With GRangesList, one fragment is only counted once for one transcript > (the two reads are in the same element). There are features in > summarizeOverlaps for dealing with cases where one fragment overlaps > multiple transcripts, if you care about that. > > > >> Looking forward to your reply. >> Best regards, >> >> >> On Thu, Nov 7, 2013 at 11:32 PM, Michael Lawrence < >> lawrence.michael@gene.com >> > wrote: >> >> > >> > >> > >> > On Thu, Nov 7, 2013 at 5:43 AM, bach le <le.bach.07@gmail.com> wrote: >> > >> >> Hi, >> >> I would like to ask if any of you have experience with generating *WIG >> >> coverage files from PE ssRNA-Seq* data. I got some discussion around >> this >> >> ( >> >> >> >> http://seqanswers.com/forums/showthread.php?t=20545) but it seems not >> to >> >> be >> >> solved yet. >> >> >> > >> > I've no experience with strand-specific protocols, but it sounds like >> the >> > strand of the first read in the pair indicates the strand. If that is >> true, >> > please confirm as it would inform our design. As suggested in the >> thread, >> > just flip the strand of the second ends and use the code earlier in that >> > thread. GAlignmentPairs makes this easy, the accessors first() and >> last() >> > get you the #1 and #2 reads, respectively, as GAlignments. >> > >> > >> > >> >> Also, does *countOverlaps* function work with GappedAlignmentPairs >> object? >> >> >> >> In that case, will the first or last alignment of each pair be used for >> >> calculating the overlap? >> >> >> > >> > GappedAlignmentPairs is now called GAlignmentPairs. Whenever you want to >> > know whether there is a method for a certain signature, use >> showMethods() >> > to see the list or selectMethod() to retrieve the method object for the >> > signature. In this case: >> > >> > > showMethods(countOverlaps) >> > [snip] >> > query="GAlignmentPairs", subject="GAlignmentPairs" >> > query="GAlignmentPairs", subject="Vector" >> > [snip] >> > >> > This means we can call countOverlaps with a GAlignmentPairs and any >> Vector >> > (of ranges), including a GRangesList of transcripts. The method works by >> > first converting GAlignmentPairs into a GRangesList, where each element >> > corresponds to a pair and contains the exonic ranges from both ends. The >> > problem for you is that the strand will not be correct for the second >> end. >> > Thus, you will need to call grglist() to get the GRangesList manually >> and >> > modify the strand before passing it to countOverlaps. The strand >> > modification would look like: >> > >> > strand(grl) <- relist(rep(strand(first(reads)), elementLengths(grl)), >> grl) >> > >> > There is some experimental functionality in GenomicRanges for computing >> > isoform-specific overlaps: findSpliceOverlaps. These functions look for >> the >> > BAM XS tag and if found use it to determine the strand in a way similar >> to >> > above. If you were to set the XS mcol on your GAlignmentPairs to >> > strand(first(reads)), then these functions should work the way you want. >> > >> > Michael >> > >> > Thanks much in advance. >> >> Best, >> >> >> >> [[alternative HTML version deleted]] >> >> >> >> _______________________________________________ >> >> Bioconductor mailing list >> >> Bioconductor@r-project.org >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> >> Search the archives: >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> >> >> > >> > >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: >> http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > [[alternative HTML version deleted]]
ADD REPLY

Login before adding your answer.

Traffic: 592 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6