A question about the function readGAlignmentPairs in GenomicRnages package
3
0
Entering edit mode
@niu-liang-nihniehs-e-6477
Last seen 9.6 years ago
Dear Herve, I found the reason for the mate_status error: it is because that the read 1 in each pair were reverse complemented before the alignment. Now I have a question: when I use readGAlignmentsList() to read in the pairs, how can I preserve the order of the two reads, i.e., keep read 1 as the first alignment and read 2 as the second alignment in a element (with two alignments) of the alignments list? is it the default order of each element of the list? Also, if I use grglist() to convert the above GAlignmentsList into a GRangesList, do I need to set the order.as.in.query=TRUE for the above purpose? Best, Liang ________________________________________ From: Hervé Pagès [hpages@fhcrc.org] Sent: Sunday, March 30, 2014 4:45 PM To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: > Dear Herve, > > Here is the code and output: > >> bam.path<-"test.bam" >> bam.file<-BamFile(bam.path,asMates=TRUE) >> >> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate= FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplic ate=FALSE,isNotPrimaryRead=FALSE)) >> alignment<-readGAlignmentsList(bam.file,param=param) >> length(alignment) > [1] 18041910 >> table(mcols(unlist(alignment, use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) > > FALSE > 18041910 > > What is the problem? Whaoo. No idea :-/ I'm on week-end right now and have to turn off the computer due to other obligations, sorry. I'll try to get to this later. In the mean time it would be great if you could install R-3.1.0 + BioC 2.14 (you will need the GenomicAlignments package) and try this again. Thanks, H. > > Thanks! > > Liang > > ________________________________________ > From: Hervé Pagès [hpages at fhcrc.org] > Sent: Sunday, March 30, 2014 4:13 PM > To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org > Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package > > Dear Liang, > > Please keep the communication on the mailing list (by using > the "Reply All" button). > > On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> I use the following command to read in the alignments: >> >> bam.path<-"test.bam" >> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >> >> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate= FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplic ate=FALSE,isNotPrimaryRead=FALSE)) >> alignment<-readGAlignmentsList(bam.file,param=param) >> >> However, each element ((a pair of alignments)) of the GAlignmentsList "alignment" has meta field Mates as "FALSE" for both reads. Is it a problem? > > I'm surprised that *each* list element in your GAlignmentsList object > 'alignment' looks like this. How do you know? Why not show us the > object? > > My understanding is that even though your BAM file contains paired- end > reads, when you read it with readGAlignmentsList(), you end up with a > GAlignmentsList object where some fraction of the list elements have > the "mates" field set to FALSE. The reads in such list element have the > same QNAME but are not mates. Note that all list elements with the > "mates" field set to TRUE are guaranteed to contain exactly 2 reads. > But there is no such expectation for list elements with the "mates" > field set to FALSE: they can contain 1, 2, 3, or more reads. > > So if *each* list element in your GAlignmentsList object > 'alignment' really has the "mates" field set to FALSE, that > means none of the reads in your file could be mated. That could > actually happen if your data was not paired-end and if you didn't > use isPaired=TRUE but that doesn't seem to be the case here. > > Here is how you can summarize how many list elements have the > "mates" field set to FALSE and how many have it set to TRUE: > > table(mcols(unlist(alignment, > use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) > > Finally note that in BioC devel, the "mates" field has been replaced > by the "mate_status" field and that this field is now an attribute of > the list elements rather than of the individual reads. This means that > it's a top-level metadata column (i.e. directly accessible with > mcols(alignment)$mate_status) instead of an inner metadata column > (which are more complicated to access). So it's much easier now to get > the above count: > > table(mcols(alignment)$mate_status) > > This will give you something like: > > > table(mcols(galist1)$mate_status) > > mated ambiguous unmated > 75346 0 21286 > > As you can see, another change is that this field is now a 3-level > factor (levels are explained in ?readGAlignmentsList). > > Hope this helps, > H. > > >> >> Liang >> >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 2:54 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >> >> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Sorry for the typo in the previous email. What I mean is "I do NEED (not NOT) the pairing information". >> >> I see. So yes, readGAlignmentsList() should do it. >> >> Cheers, >> H. >> >>> >>> Liang >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 2:51 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>> >>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Dear Herve, >>>> >>>> Thanks for your suggestion. I do not the paring information in the downstream analysis, therefore, readGAlignmentsList() is good. >>> >>> Maybe I was not clear enough but if you don't need the pairing, then >>> you can just use readGAlignments(). readGAlignmentsList() will pair >>> everything that can be paired and this has a cost (in terms of >>> performance and memory usage). By using readGAlignments() you avoid >>> that cost and you also end up with an object that is a little bit >>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>> object). >>> >>> Hope that makes sense, >>> H. >>> >>>> >>>> Liang >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:12 PM >>>> To: Niu, Liang (NIH/NIEHS) [E] >>>> Cc: bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>> >>>> Hi Liang, >>>> >>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>> other can give suggestions. >>>> >>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Dear Mr. Pages, >>>>> >>>>> This is Liang Niu, a research fellow at the National Institute of Environmental Health Sciences. >>>>> >>>>> I am using R to read .bam files for chromatin interaction data sets. Such data sets contains alignments for paired-end reads from ChIA-PET experiment, thus it has pairs in which two reads on different chromosomes and/or on same strand, and those pairs are valid pairs. I want to use your function readGAlignmentPairs in GenomicRnages package to read the pairs, but the manual says that it will discard those pairs. Do you have any suggestion? >>>> >>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>> readGAlignmentsList() will keep these "discordant pairs". It will also >>>> keep the reads that cannot be paired. >>>> Note that, depending on what you will do downstream, you don't >>>> necessarily need to pair the reads (e.g. if you're going to compute >>>> the read coverage). In that case you can just load the reads with >>>> readGAlignments(). >>>> >>>> Cheers, >>>> H. >>>> >>>>> >>>>> Best, >>>>> Liang >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
Alignment Cancer convert GenomicAlignments Alignment Cancer convert GenomicAlignments • 1.7k views
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 13 hours ago
Seattle, WA, United States
Hi Liang, On 04/01/2014 12:12 PM, Niu, Liang (NIH/NIEHS) [E] wrote: > Dear Herve, > > I found the reason for the mate_status error: it is because that the read 1 in each pair were reverse complemented before the alignment. > > Now I have a question: when I use readGAlignmentsList() to read in the pairs, how can I preserve the order of the two reads, i.e., keep read 1 as the first alignment and read 2 as the second alignment in a element (with two alignments) of the alignments list? is it the default order of each element of the list? Yes. Even though I cannot find this information in the man page for readGAlignmentsList(), my understanding is that the 1st mate will always be 1st in the GAlignmentsList list elements that have mate_status set to "mated", and that the 2nd mate will always be 2nd. Val please correct me if I'm wrong. > > Also, if I use grglist() to convert the above GAlignmentsList into a GRangesList, do I need to set the order.as.in.query=TRUE for the above purpose? I didn't know the "grglist" method for GAlignmentsList objects supported this argument (and it doesn't seem to be documented). But yes, if you care about the order of the ranges within each list element of the returned GRangesList, you should use 'order.as.in.query=TRUE'. At least that's what you should do on a GAlignmentPairs object. I'm just assuming it's the same for a GAlignmentsList object but I don't know how this translates for list elements with mate_status other than "mated" (you will need to check if that matters for your case). However, note that if you're going to find or count overlaps (with findOverlaps(), countOverlaps(), or summarizeOverlaps()), how you set 'order.as.in.query' is irrelevant. The only situation I know where this matters is when you want to compute the "overlap encodings" (with encodeOverlaps()), which is actually what motivated the addition of the 'order.as.in.query' arg. HTH, H. > > Best, > Liang > > > ________________________________________ > From: Hervé Pagès [hpages at fhcrc.org] > Sent: Sunday, March 30, 2014 4:45 PM > To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org > Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package > > On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> Here is the code and output: >> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path,asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> length(alignment) >> [1] 18041910 >>> table(mcols(unlist(alignment, use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> FALSE >> 18041910 >> >> What is the problem? > > Whaoo. No idea :-/ I'm on week-end right now and have to turn off > the computer due to other obligations, sorry. I'll try to get to this > later. In the mean time it would be great if you could install R-3.1.0 > + BioC 2.14 (you will need the GenomicAlignments package) and try > this again. > > Thanks, > H. > >> >> Thanks! >> >> Liang >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 4:13 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >> >> Dear Liang, >> >> Please keep the communication on the mailing list (by using >> the "Reply All" button). >> >> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Dear Herve, >>> >>> I use the following command to read in the alignments: >>> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> >>> However, each element ((a pair of alignments)) of the GAlignmentsList "alignment" has meta field Mates as "FALSE" for both reads. Is it a problem? >> >> I'm surprised that *each* list element in your GAlignmentsList object >> 'alignment' looks like this. How do you know? Why not show us the >> object? >> >> My understanding is that even though your BAM file contains paired- end >> reads, when you read it with readGAlignmentsList(), you end up with a >> GAlignmentsList object where some fraction of the list elements have >> the "mates" field set to FALSE. The reads in such list element have the >> same QNAME but are not mates. Note that all list elements with the >> "mates" field set to TRUE are guaranteed to contain exactly 2 reads. >> But there is no such expectation for list elements with the "mates" >> field set to FALSE: they can contain 1, 2, 3, or more reads. >> >> So if *each* list element in your GAlignmentsList object >> 'alignment' really has the "mates" field set to FALSE, that >> means none of the reads in your file could be mated. That could >> actually happen if your data was not paired-end and if you didn't >> use isPaired=TRUE but that doesn't seem to be the case here. >> >> Here is how you can summarize how many list elements have the >> "mates" field set to FALSE and how many have it set to TRUE: >> >> table(mcols(unlist(alignment, >> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> Finally note that in BioC devel, the "mates" field has been replaced >> by the "mate_status" field and that this field is now an attribute of >> the list elements rather than of the individual reads. This means that >> it's a top-level metadata column (i.e. directly accessible with >> mcols(alignment)$mate_status) instead of an inner metadata column >> (which are more complicated to access). So it's much easier now to get >> the above count: >> >> table(mcols(alignment)$mate_status) >> >> This will give you something like: >> >> > table(mcols(galist1)$mate_status) >> >> mated ambiguous unmated >> 75346 0 21286 >> >> As you can see, another change is that this field is now a 3-level >> factor (levels are explained in ?readGAlignmentsList). >> >> Hope this helps, >> H. >> >> >>> >>> Liang >>> >>> >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 2:54 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>> >>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Sorry for the typo in the previous email. What I mean is "I do NEED (not NOT) the pairing information". >>> >>> I see. So yes, readGAlignmentsList() should do it. >>> >>> Cheers, >>> H. >>> >>>> >>>> Liang >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:51 PM >>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>> >>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Dear Herve, >>>>> >>>>> Thanks for your suggestion. I do not the paring information in the downstream analysis, therefore, readGAlignmentsList() is good. >>>> >>>> Maybe I was not clear enough but if you don't need the pairing, then >>>> you can just use readGAlignments(). readGAlignmentsList() will pair >>>> everything that can be paired and this has a cost (in terms of >>>> performance and memory usage). By using readGAlignments() you avoid >>>> that cost and you also end up with an object that is a little bit >>>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>>> object). >>>> >>>> Hope that makes sense, >>>> H. >>>> >>>>> >>>>> Liang >>>>> ________________________________________ >>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>> Sent: Sunday, March 30, 2014 2:12 PM >>>>> To: Niu, Liang (NIH/NIEHS) [E] >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>>> >>>>> Hi Liang, >>>>> >>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>>> other can give suggestions. >>>>> >>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>> Dear Mr. Pages, >>>>>> >>>>>> This is Liang Niu, a research fellow at the National Institute of Environmental Health Sciences. >>>>>> >>>>>> I am using R to read .bam files for chromatin interaction data sets. Such data sets contains alignments for paired-end reads from ChIA-PET experiment, thus it has pairs in which two reads on different chromosomes and/or on same strand, and those pairs are valid pairs. I want to use your function readGAlignmentPairs in GenomicRnages package to read the pairs, but the manual says that it will discard those pairs. Do you have any suggestion? >>>>> >>>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>>> readGAlignmentsList() will keep these "discordant pairs". It will also >>>>> keep the reads that cannot be paired. >>>>> Note that, depending on what you will do downstream, you don't >>>>> necessarily need to pair the reads (e.g. if you're going to compute >>>>> the read coverage). In that case you can just load the reads with >>>>> readGAlignments(). >>>>> >>>>> Cheers, >>>>> H. >>>>> >>>>>> >>>>>> Best, >>>>>> Liang >>>>>> >>>>> >>>>> -- >>>>> Hervé Pagès >>>>> >>>>> Program in Computational Biology >>>>> Division of Public Health Sciences >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N, M1-B514 >>>>> P.O. Box 19024 >>>>> Seattle, WA 98109-1024 >>>>> >>>>> E-mail: hpages at fhcrc.org >>>>> Phone: (206) 667-5791 >>>>> Fax: (206) 667-1319 >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Thanks! ________________________________________ From: Hervé Pagès [hpages@fhcrc.org] Sent: Wednesday, April 02, 2014 1:27 AM To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package Hi Liang, On 04/01/2014 12:12 PM, Niu, Liang (NIH/NIEHS) [E] wrote: > Dear Herve, > > I found the reason for the mate_status error: it is because that the read 1 in each pair were reverse complemented before the alignment. > > Now I have a question: when I use readGAlignmentsList() to read in the pairs, how can I preserve the order of the two reads, i.e., keep read 1 as the first alignment and read 2 as the second alignment in a element (with two alignments) of the alignments list? is it the default order of each element of the list? Yes. Even though I cannot find this information in the man page for readGAlignmentsList(), my understanding is that the 1st mate will always be 1st in the GAlignmentsList list elements that have mate_status set to "mated", and that the 2nd mate will always be 2nd. Val please correct me if I'm wrong. > > Also, if I use grglist() to convert the above GAlignmentsList into a GRangesList, do I need to set the order.as.in.query=TRUE for the above purpose? I didn't know the "grglist" method for GAlignmentsList objects supported this argument (and it doesn't seem to be documented). But yes, if you care about the order of the ranges within each list element of the returned GRangesList, you should use 'order.as.in.query=TRUE'. At least that's what you should do on a GAlignmentPairs object. I'm just assuming it's the same for a GAlignmentsList object but I don't know how this translates for list elements with mate_status other than "mated" (you will need to check if that matters for your case). However, note that if you're going to find or count overlaps (with findOverlaps(), countOverlaps(), or summarizeOverlaps()), how you set 'order.as.in.query' is irrelevant. The only situation I know where this matters is when you want to compute the "overlap encodings" (with encodeOverlaps()), which is actually what motivated the addition of the 'order.as.in.query' arg. HTH, H. > > Best, > Liang > > > ________________________________________ > From: Hervé Pagès [hpages at fhcrc.org] > Sent: Sunday, March 30, 2014 4:45 PM > To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org > Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package > > On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> Here is the code and output: >> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path,asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> length(alignment) >> [1] 18041910 >>> table(mcols(unlist(alignment, use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> FALSE >> 18041910 >> >> What is the problem? > > Whaoo. No idea :-/ I'm on week-end right now and have to turn off > the computer due to other obligations, sorry. I'll try to get to this > later. In the mean time it would be great if you could install R-3.1.0 > + BioC 2.14 (you will need the GenomicAlignments package) and try > this again. > > Thanks, > H. > >> >> Thanks! >> >> Liang >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 4:13 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >> >> Dear Liang, >> >> Please keep the communication on the mailing list (by using >> the "Reply All" button). >> >> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Dear Herve, >>> >>> I use the following command to read in the alignments: >>> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> >>> However, each element ((a pair of alignments)) of the GAlignmentsList "alignment" has meta field Mates as "FALSE" for both reads. Is it a problem? >> >> I'm surprised that *each* list element in your GAlignmentsList object >> 'alignment' looks like this. How do you know? Why not show us the >> object? >> >> My understanding is that even though your BAM file contains paired- end >> reads, when you read it with readGAlignmentsList(), you end up with a >> GAlignmentsList object where some fraction of the list elements have >> the "mates" field set to FALSE. The reads in such list element have the >> same QNAME but are not mates. Note that all list elements with the >> "mates" field set to TRUE are guaranteed to contain exactly 2 reads. >> But there is no such expectation for list elements with the "mates" >> field set to FALSE: they can contain 1, 2, 3, or more reads. >> >> So if *each* list element in your GAlignmentsList object >> 'alignment' really has the "mates" field set to FALSE, that >> means none of the reads in your file could be mated. That could >> actually happen if your data was not paired-end and if you didn't >> use isPaired=TRUE but that doesn't seem to be the case here. >> >> Here is how you can summarize how many list elements have the >> "mates" field set to FALSE and how many have it set to TRUE: >> >> table(mcols(unlist(alignment, >> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> Finally note that in BioC devel, the "mates" field has been replaced >> by the "mate_status" field and that this field is now an attribute of >> the list elements rather than of the individual reads. This means that >> it's a top-level metadata column (i.e. directly accessible with >> mcols(alignment)$mate_status) instead of an inner metadata column >> (which are more complicated to access). So it's much easier now to get >> the above count: >> >> table(mcols(alignment)$mate_status) >> >> This will give you something like: >> >> > table(mcols(galist1)$mate_status) >> >> mated ambiguous unmated >> 75346 0 21286 >> >> As you can see, another change is that this field is now a 3-level >> factor (levels are explained in ?readGAlignmentsList). >> >> Hope this helps, >> H. >> >> >>> >>> Liang >>> >>> >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 2:54 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>> >>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Sorry for the typo in the previous email. What I mean is "I do NEED (not NOT) the pairing information". >>> >>> I see. So yes, readGAlignmentsList() should do it. >>> >>> Cheers, >>> H. >>> >>>> >>>> Liang >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:51 PM >>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>> >>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Dear Herve, >>>>> >>>>> Thanks for your suggestion. I do not the paring information in the downstream analysis, therefore, readGAlignmentsList() is good. >>>> >>>> Maybe I was not clear enough but if you don't need the pairing, then >>>> you can just use readGAlignments(). readGAlignmentsList() will pair >>>> everything that can be paired and this has a cost (in terms of >>>> performance and memory usage). By using readGAlignments() you avoid >>>> that cost and you also end up with an object that is a little bit >>>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>>> object). >>>> >>>> Hope that makes sense, >>>> H. >>>> >>>>> >>>>> Liang >>>>> ________________________________________ >>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>> Sent: Sunday, March 30, 2014 2:12 PM >>>>> To: Niu, Liang (NIH/NIEHS) [E] >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>>> >>>>> Hi Liang, >>>>> >>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>>> other can give suggestions. >>>>> >>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>> Dear Mr. Pages, >>>>>> >>>>>> This is Liang Niu, a research fellow at the National Institute of Environmental Health Sciences. >>>>>> >>>>>> I am using R to read .bam files for chromatin interaction data sets. Such data sets contains alignments for paired-end reads from ChIA-PET experiment, thus it has pairs in which two reads on different chromosomes and/or on same strand, and those pairs are valid pairs. I want to use your function readGAlignmentPairs in GenomicRnages package to read the pairs, but the manual says that it will discard those pairs. Do you have any suggestion? >>>>> >>>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>>> readGAlignmentsList() will keep these "discordant pairs". It will also >>>>> keep the reads that cannot be paired. >>>>> Note that, depending on what you will do downstream, you don't >>>>> necessarily need to pair the reads (e.g. if you're going to compute >>>>> the read coverage). In that case you can just load the reads with >>>>> readGAlignments(). >>>>> >>>>> Cheers, >>>>> H. >>>>> >>>>>> >>>>>> Best, >>>>>> Liang >>>>>> >>>>> >>>>> -- >>>>> Hervé Pagès >>>>> >>>>> Program in Computational Biology >>>>> Division of Public Health Sciences >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N, M1-B514 >>>>> P.O. Box 19024 >>>>> Seattle, WA 98109-1024 >>>>> >>>>> E-mail: hpages at fhcrc.org >>>>> Phone: (206) 667-5791 >>>>> Fax: (206) 667-1319 >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi Herve, I used readGAlignmentsList() to read in pairs with reads possibly on different chromosomes: > bam.path<-"test.bam" > bam.file<-BamFile(bam.path,asMates=TRUE) > param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate=F ALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplica te=FALSE,isNotPrimaryRead=FALSE)) > alignment<-readGAlignmentsList(bam.file,param=param) Then I also read in the pairs in the usual way: alignment.pair<-readGAlignmentPairs(bam.file,param=param) The two objects are different in length: > length(alignment) [1] 18041910 > length(alignment.pair) [1] 423127 However, I found the elements are not consistent. For example, the first element of alignment.pair is: > alignment.pair[[1]] GAlignments with 2 alignments and 0 metadata columns: seqnames strand cigar qwidth start end width <rle> <rle> <character> <integer> <integer> <integer> <integer> [1] chr2 - 20M 20 136479847 136479866 20 [2] chr2 + 20M 20 97483596 97483615 20 ngap <integer> [1] 0 [2] 0 And there are 4 corresponding elements in alignment (index is the index of the corresponding 4 elements): > alignment[index] GAlignmentsList of length 4: $8 GAlignments with 2 alignments and 1 metadata column: seqnames strand cigar qwidth start end width ngap | mates [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE $4481122 GAlignments with 2 alignments and 1 metadata column: seqnames strand cigar qwidth start end width ngap | mates [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE $11430011 GAlignments with 2 alignments and 1 metadata column: seqnames strand cigar qwidth start end width ngap | mates [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE ... <1 more element> --- seqlengths: chr10 chr11 chr12 chr13 ... chrM chrX chrY 135534747 135006516 133851895 115169878 ... 16571 155270560 59373566 Notice that we have 4 identical elements in alignment and the order of the two reads are reversed. Why? How to get unique records? Since the order of the reads in each element is important in the project, how can I keep the original order (read 1 followed by read 2)? Thanks! Liang ________________________________________ From: Hervé Pagès [hpages@fhcrc.org] Sent: Wednesday, April 02, 2014 1:27 AM To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package Hi Liang, On 04/01/2014 12:12 PM, Niu, Liang (NIH/NIEHS) [E] wrote: > Dear Herve, > > I found the reason for the mate_status error: it is because that the read 1 in each pair were reverse complemented before the alignment. > > Now I have a question: when I use readGAlignmentsList() to read in the pairs, how can I preserve the order of the two reads, i.e., keep read 1 as the first alignment and read 2 as the second alignment in a element (with two alignments) of the alignments list? is it the default order of each element of the list? Yes. Even though I cannot find this information in the man page for readGAlignmentsList(), my understanding is that the 1st mate will always be 1st in the GAlignmentsList list elements that have mate_status set to "mated", and that the 2nd mate will always be 2nd. Val please correct me if I'm wrong. > > Also, if I use grglist() to convert the above GAlignmentsList into a GRangesList, do I need to set the order.as.in.query=TRUE for the above purpose? I didn't know the "grglist" method for GAlignmentsList objects supported this argument (and it doesn't seem to be documented). But yes, if you care about the order of the ranges within each list element of the returned GRangesList, you should use 'order.as.in.query=TRUE'. At least that's what you should do on a GAlignmentPairs object. I'm just assuming it's the same for a GAlignmentsList object but I don't know how this translates for list elements with mate_status other than "mated" (you will need to check if that matters for your case). However, note that if you're going to find or count overlaps (with findOverlaps(), countOverlaps(), or summarizeOverlaps()), how you set 'order.as.in.query' is irrelevant. The only situation I know where this matters is when you want to compute the "overlap encodings" (with encodeOverlaps()), which is actually what motivated the addition of the 'order.as.in.query' arg. HTH, H. > > Best, > Liang > > > ________________________________________ > From: Hervé Pagès [hpages at fhcrc.org] > Sent: Sunday, March 30, 2014 4:45 PM > To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org > Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package > > On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> Here is the code and output: >> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path,asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> length(alignment) >> [1] 18041910 >>> table(mcols(unlist(alignment, use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> FALSE >> 18041910 >> >> What is the problem? > > Whaoo. No idea :-/ I'm on week-end right now and have to turn off > the computer due to other obligations, sorry. I'll try to get to this > later. In the mean time it would be great if you could install R-3.1.0 > + BioC 2.14 (you will need the GenomicAlignments package) and try > this again. > > Thanks, > H. > >> >> Thanks! >> >> Liang >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 4:13 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >> >> Dear Liang, >> >> Please keep the communication on the mailing list (by using >> the "Reply All" button). >> >> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Dear Herve, >>> >>> I use the following command to read in the alignments: >>> >>> bam.path<-"test.bam" >>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >>> >>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate =FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupli cate=FALSE,isNotPrimaryRead=FALSE)) >>> alignment<-readGAlignmentsList(bam.file,param=param) >>> >>> However, each element ((a pair of alignments)) of the GAlignmentsList "alignment" has meta field Mates as "FALSE" for both reads. Is it a problem? >> >> I'm surprised that *each* list element in your GAlignmentsList object >> 'alignment' looks like this. How do you know? Why not show us the >> object? >> >> My understanding is that even though your BAM file contains paired- end >> reads, when you read it with readGAlignmentsList(), you end up with a >> GAlignmentsList object where some fraction of the list elements have >> the "mates" field set to FALSE. The reads in such list element have the >> same QNAME but are not mates. Note that all list elements with the >> "mates" field set to TRUE are guaranteed to contain exactly 2 reads. >> But there is no such expectation for list elements with the "mates" >> field set to FALSE: they can contain 1, 2, 3, or more reads. >> >> So if *each* list element in your GAlignmentsList object >> 'alignment' really has the "mates" field set to FALSE, that >> means none of the reads in your file could be mated. That could >> actually happen if your data was not paired-end and if you didn't >> use isPaired=TRUE but that doesn't seem to be the case here. >> >> Here is how you can summarize how many list elements have the >> "mates" field set to FALSE and how many have it set to TRUE: >> >> table(mcols(unlist(alignment, >> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >> >> Finally note that in BioC devel, the "mates" field has been replaced >> by the "mate_status" field and that this field is now an attribute of >> the list elements rather than of the individual reads. This means that >> it's a top-level metadata column (i.e. directly accessible with >> mcols(alignment)$mate_status) instead of an inner metadata column >> (which are more complicated to access). So it's much easier now to get >> the above count: >> >> table(mcols(alignment)$mate_status) >> >> This will give you something like: >> >> > table(mcols(galist1)$mate_status) >> >> mated ambiguous unmated >> 75346 0 21286 >> >> As you can see, another change is that this field is now a 3-level >> factor (levels are explained in ?readGAlignmentsList). >> >> Hope this helps, >> H. >> >> >>> >>> Liang >>> >>> >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 2:54 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>> >>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Sorry for the typo in the previous email. What I mean is "I do NEED (not NOT) the pairing information". >>> >>> I see. So yes, readGAlignmentsList() should do it. >>> >>> Cheers, >>> H. >>> >>>> >>>> Liang >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:51 PM >>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>> >>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Dear Herve, >>>>> >>>>> Thanks for your suggestion. I do not the paring information in the downstream analysis, therefore, readGAlignmentsList() is good. >>>> >>>> Maybe I was not clear enough but if you don't need the pairing, then >>>> you can just use readGAlignments(). readGAlignmentsList() will pair >>>> everything that can be paired and this has a cost (in terms of >>>> performance and memory usage). By using readGAlignments() you avoid >>>> that cost and you also end up with an object that is a little bit >>>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>>> object). >>>> >>>> Hope that makes sense, >>>> H. >>>> >>>>> >>>>> Liang >>>>> ________________________________________ >>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>> Sent: Sunday, March 30, 2014 2:12 PM >>>>> To: Niu, Liang (NIH/NIEHS) [E] >>>>> Cc: bioconductor at r-project.org >>>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>>> >>>>> Hi Liang, >>>>> >>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>>> other can give suggestions. >>>>> >>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>> Dear Mr. Pages, >>>>>> >>>>>> This is Liang Niu, a research fellow at the National Institute of Environmental Health Sciences. >>>>>> >>>>>> I am using R to read .bam files for chromatin interaction data sets. Such data sets contains alignments for paired-end reads from ChIA-PET experiment, thus it has pairs in which two reads on different chromosomes and/or on same strand, and those pairs are valid pairs. I want to use your function readGAlignmentPairs in GenomicRnages package to read the pairs, but the manual says that it will discard those pairs. Do you have any suggestion? >>>>> >>>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>>> readGAlignmentsList() will keep these "discordant pairs". It will also >>>>> keep the reads that cannot be paired. >>>>> Note that, depending on what you will do downstream, you don't >>>>> necessarily need to pair the reads (e.g. if you're going to compute >>>>> the read coverage). In that case you can just load the reads with >>>>> readGAlignments(). >>>>> >>>>> Cheers, >>>>> H. >>>>> >>>>>> >>>>>> Best, >>>>>> Liang >>>>>> >>>>> >>>>> -- >>>>> Hervé Pagès >>>>> >>>>> Program in Computational Biology >>>>> Division of Public Health Sciences >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N, M1-B514 >>>>> P.O. Box 19024 >>>>> Seattle, WA 98109-1024 >>>>> >>>>> E-mail: hpages at fhcrc.org >>>>> Phone: (206) 667-5791 >>>>> Fax: (206) 667-1319 >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.3 years ago
United States
Hi, On 04/01/14 22:27, Hervé Pagès wrote: > Hi Liang, > > On 04/01/2014 12:12 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> I found the reason for the mate_status error: it is because that the >> read 1 in each pair were reverse complemented before the alignment. >> >> Now I have a question: when I use readGAlignmentsList() to read in the >> pairs, how can I preserve the order of the two reads, i.e., keep read >> 1 as the first alignment and read 2 as the second alignment in a >> element (with two alignments) of the alignments list? is it the >> default order of each element of the list? > > Yes. Even though I cannot find this information in the man page > for readGAlignmentsList(), my understanding is that the 1st mate > will always be 1st in the GAlignmentsList list elements that have > mate_status set to "mated", and that the 2nd mate will always be > 2nd. > > Val please correct me if I'm wrong. You are correct about the ordering of the pairs. (fyi, this occurs in add_to_complete() in Template.h) I've updated the man page in GenomicAlignments 0.99.37. Val > >> >> Also, if I use grglist() to convert the above GAlignmentsList into a >> GRangesList, do I need to set the order.as.in.query=TRUE for the above >> purpose? > > I didn't know the "grglist" method for GAlignmentsList objects supported > this argument (and it doesn't seem to be documented). But yes, if you > care about the order of the ranges within each list element of the > returned GRangesList, you should use 'order.as.in.query=TRUE'. At least > that's what you should do on a GAlignmentPairs object. I'm just assuming > it's the same for a GAlignmentsList object but I don't know how this > translates for list elements with mate_status other than "mated" (you > will need to check if that matters for your case). > However, note that if you're going to find or count overlaps (with > findOverlaps(), countOverlaps(), or summarizeOverlaps()), how you set > 'order.as.in.query' is irrelevant. The only situation I know where this > matters is when you want to compute the "overlap encodings" (with > encodeOverlaps()), which is actually what motivated the addition of the > 'order.as.in.query' arg. > > HTH, > H. > >> >> Best, >> Liang >> >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 4:45 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in >> GenomicRnages package >> >> On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Dear Herve, >>> >>> Here is the code and output: >>> >>>> bam.path<-"test.bam" >>>> bam.file<-BamFile(bam.path,asMates=TRUE) >>>> >>>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMat e=FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupl icate=FALSE,isNotPrimaryRead=FALSE)) >>>> >>>> alignment<-readGAlignmentsList(bam.file,param=param) >>>> length(alignment) >>> [1] 18041910 >>>> table(mcols(unlist(alignment, >>>> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >>> >>> FALSE >>> 18041910 >>> >>> What is the problem? >> >> Whaoo. No idea :-/ I'm on week-end right now and have to turn off >> the computer due to other obligations, sorry. I'll try to get to this >> later. In the mean time it would be great if you could install R-3.1.0 >> + BioC 2.14 (you will need the GenomicAlignments package) and try >> this again. >> >> Thanks, >> H. >> >>> >>> Thanks! >>> >>> Liang >>> >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 4:13 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in >>> GenomicRnages package >>> >>> Dear Liang, >>> >>> Please keep the communication on the mailing list (by using >>> the "Reply All" button). >>> >>> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Dear Herve, >>>> >>>> I use the following command to read in the alignments: >>>> >>>> bam.path<-"test.bam" >>>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >>>> >>>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMat e=FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupl icate=FALSE,isNotPrimaryRead=FALSE)) >>>> >>>> alignment<-readGAlignmentsList(bam.file,param=param) >>>> >>>> However, each element ((a pair of alignments)) of the >>>> GAlignmentsList "alignment" has meta field Mates as "FALSE" for >>>> both reads. Is it a problem? >>> >>> I'm surprised that *each* list element in your GAlignmentsList object >>> 'alignment' looks like this. How do you know? Why not show us the >>> object? >>> >>> My understanding is that even though your BAM file contains paired-end >>> reads, when you read it with readGAlignmentsList(), you end up with a >>> GAlignmentsList object where some fraction of the list elements have >>> the "mates" field set to FALSE. The reads in such list element have the >>> same QNAME but are not mates. Note that all list elements with the >>> "mates" field set to TRUE are guaranteed to contain exactly 2 reads. >>> But there is no such expectation for list elements with the "mates" >>> field set to FALSE: they can contain 1, 2, 3, or more reads. >>> >>> So if *each* list element in your GAlignmentsList object >>> 'alignment' really has the "mates" field set to FALSE, that >>> means none of the reads in your file could be mated. That could >>> actually happen if your data was not paired-end and if you didn't >>> use isPaired=TRUE but that doesn't seem to be the case here. >>> >>> Here is how you can summarize how many list elements have the >>> "mates" field set to FALSE and how many have it set to TRUE: >>> >>> table(mcols(unlist(alignment, >>> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >>> >>> Finally note that in BioC devel, the "mates" field has been replaced >>> by the "mate_status" field and that this field is now an attribute of >>> the list elements rather than of the individual reads. This means that >>> it's a top-level metadata column (i.e. directly accessible with >>> mcols(alignment)$mate_status) instead of an inner metadata column >>> (which are more complicated to access). So it's much easier now to get >>> the above count: >>> >>> table(mcols(alignment)$mate_status) >>> >>> This will give you something like: >>> >>> > table(mcols(galist1)$mate_status) >>> >>> mated ambiguous unmated >>> 75346 0 21286 >>> >>> As you can see, another change is that this field is now a 3-level >>> factor (levels are explained in ?readGAlignmentsList). >>> >>> Hope this helps, >>> H. >>> >>> >>>> >>>> Liang >>>> >>>> >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:54 PM >>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in >>>> GenomicRnages package >>>> >>>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Sorry for the typo in the previous email. What I mean is "I do NEED >>>>> (not NOT) the pairing information". >>>> >>>> I see. So yes, readGAlignmentsList() should do it. >>>> >>>> Cheers, >>>> H. >>>> >>>>> >>>>> Liang >>>>> ________________________________________ >>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>> Sent: Sunday, March 30, 2014 2:51 PM >>>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>>> Subject: Re: A question about the function readGAlignmentPairs in >>>>> GenomicRnages package >>>>> >>>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>> Dear Herve, >>>>>> >>>>>> Thanks for your suggestion. I do not the paring information in the >>>>>> downstream analysis, therefore, readGAlignmentsList() is good. >>>>> >>>>> Maybe I was not clear enough but if you don't need the pairing, then >>>>> you can just use readGAlignments(). readGAlignmentsList() will pair >>>>> everything that can be paired and this has a cost (in terms of >>>>> performance and memory usage). By using readGAlignments() you avoid >>>>> that cost and you also end up with an object that is a little bit >>>>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>>>> object). >>>>> >>>>> Hope that makes sense, >>>>> H. >>>>> >>>>>> >>>>>> Liang >>>>>> ________________________________________ >>>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>>> Sent: Sunday, March 30, 2014 2:12 PM >>>>>> To: Niu, Liang (NIH/NIEHS) [E] >>>>>> Cc: bioconductor at r-project.org >>>>>> Subject: Re: A question about the function readGAlignmentPairs in >>>>>> GenomicRnages package >>>>>> >>>>>> Hi Liang, >>>>>> >>>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>>>> other can give suggestions. >>>>>> >>>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>>> Dear Mr. Pages, >>>>>>> >>>>>>> This is Liang Niu, a research fellow at the National Institute of >>>>>>> Environmental Health Sciences. >>>>>>> >>>>>>> I am using R to read .bam files for chromatin interaction data >>>>>>> sets. Such data sets contains alignments for paired-end reads >>>>>>> from ChIA-PET experiment, thus it has pairs in which two reads on >>>>>>> different chromosomes and/or on same strand, and those pairs are >>>>>>> valid pairs. I want to use your function readGAlignmentPairs in >>>>>>> GenomicRnages package to read the pairs, but the manual says >>>>>>> that it will discard those pairs. Do you have any suggestion? >>>>>> >>>>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>>>> readGAlignmentsList() will keep these "discordant pairs". It will >>>>>> also >>>>>> keep the reads that cannot be paired. >>>>>> Note that, depending on what you will do downstream, you don't >>>>>> necessarily need to pair the reads (e.g. if you're going to compute >>>>>> the read coverage). In that case you can just load the reads with >>>>>> readGAlignments(). >>>>>> >>>>>> Cheers, >>>>>> H. >>>>>> >>>>>>> >>>>>>> Best, >>>>>>> Liang >>>>>>> >>>>>> >>>>>> -- >>>>>> Hervé Pagès >>>>>> >>>>>> Program in Computational Biology >>>>>> Division of Public Health Sciences >>>>>> Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>> P.O. Box 19024 >>>>>> Seattle, WA 98109-1024 >>>>>> >>>>>> E-mail: hpages at fhcrc.org >>>>>> Phone: (206) 667-5791 >>>>>> Fax: (206) 667-1319 >>>>>> >>>>> >>>>> -- >>>>> Hervé Pagès >>>>> >>>>> Program in Computational Biology >>>>> Division of Public Health Sciences >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N, M1-B514 >>>>> P.O. Box 19024 >>>>> Seattle, WA 98109-1024 >>>>> >>>>> E-mail: hpages at fhcrc.org >>>>> Phone: (206) 667-5791 >>>>> Fax: (206) 667-1319 >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > -- Valerie Obenchain Program in Computational Biology Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B155 Seattle, WA 98109-1024 E-mail: vobencha at fhcrc.org Phone: (206) 667-3158
ADD COMMENT
0
Entering edit mode
@herve-pages-1542
Last seen 13 hours ago
Seattle, WA, United States
Hi Liang, On 04/04/2014 11:22 AM, Niu, Liang (NIH/NIEHS) [E] wrote: > Hi Herve, > > I used readGAlignmentsList() to read in pairs with reads possibly on different chromosomes: > >> bam.path<-"test.bam" >> bam.file<-BamFile(bam.path,asMates=TRUE) > >> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMate= FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDuplic ate=FALSE,isNotPrimaryRead=FALSE)) >> alignment<-readGAlignmentsList(bam.file,param=param) > > Then I also read in the pairs in the usual way: > > alignment.pair<-readGAlignmentPairs(bam.file,param=param) > > The two objects are different in length: > >> length(alignment) > [1] 18041910 >> length(alignment.pair) > [1] 423127 > > However, I found the elements are not consistent. For example, the first element of alignment.pair is: > >> alignment.pair[[1]] > GAlignments with 2 alignments and 0 metadata columns: > seqnames strand cigar qwidth start end width > <rle> <rle> <character> <integer> <integer> <integer> <integer> > [1] chr2 - 20M 20 136479847 136479866 20 > [2] chr2 + 20M 20 97483596 97483615 20 > ngap > <integer> > [1] 0 > [2] 0 > > > And there are 4 corresponding elements in alignment (index is the index of the corresponding 4 elements): > >> alignment[index] > GAlignmentsList of length 4: > $8 > GAlignments with 2 alignments and 1 metadata column: > seqnames strand cigar qwidth start end width ngap | mates > [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE > [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE > > $4481122 > GAlignments with 2 alignments and 1 metadata column: > seqnames strand cigar qwidth start end width ngap | mates > [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE > [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE > > $11430011 > GAlignments with 2 alignments and 1 metadata column: > seqnames strand cigar qwidth start end width ngap | mates > [1] chr2 + 20M 20 97483596 97483615 20 0 | FALSE > [2] chr2 - 20M 20 136479847 136479866 20 0 | FALSE > > ... > <1 more element> > --- > seqlengths: > chr10 chr11 chr12 chr13 ... chrM chrX chrY > 135534747 135006516 133851895 115169878 ... 16571 155270560 59373566 > > Notice that we have 4 identical elements in alignment and the order of the two reads are reversed. Why? A few things: - Maybe these alignments don't correspond to the same reads? If reads have the same sequences, they'll align to the same positions. To see the read ids (stored in the QNAME field), use use.names=TRUE when you do readGAlignmentsList() or readGAlignmentPairs(). - I see 'mates' is FALSE here so these alignments are not mated. Val has confirmed the order of the pairs when mate_status is "mated". I don't know if you can make the same assumption for alignments that are not mated (these groups are not even guaranteed to be of size 2). - Looks like you're still using BioC 2.13. Note that the pairing code has been completely reworked in Bioc-devel and Val confirmed the order of the pairs for the new code. We're so close to the next release now that I would highly recommend you stick to Bioc-devel (will be officially released as BioC 2.14 in about 10 days). Also note that the day we release Bioc 2.14 we'll also stop supporting BioC 2.13. HTH, H. > How to get unique records? Since the order of the reads in each element is important in the project, how can I keep the original order (read 1 followed by read 2)? > > Thanks! > > Liang > ________________________________________ > From: Hervé Pagès [hpages at fhcrc.org] > Sent: Wednesday, April 02, 2014 1:27 AM > To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org > Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package > > Hi Liang, > > On 04/01/2014 12:12 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >> Dear Herve, >> >> I found the reason for the mate_status error: it is because that the read 1 in each pair were reverse complemented before the alignment. >> >> Now I have a question: when I use readGAlignmentsList() to read in the pairs, how can I preserve the order of the two reads, i.e., keep read 1 as the first alignment and read 2 as the second alignment in a element (with two alignments) of the alignments list? is it the default order of each element of the list? > > Yes. Even though I cannot find this information in the man page > for readGAlignmentsList(), my understanding is that the 1st mate > will always be 1st in the GAlignmentsList list elements that have > mate_status set to "mated", and that the 2nd mate will always be > 2nd. > > Val please correct me if I'm wrong. > >> >> Also, if I use grglist() to convert the above GAlignmentsList into a GRangesList, do I need to set the order.as.in.query=TRUE for the above purpose? > > I didn't know the "grglist" method for GAlignmentsList objects supported > this argument (and it doesn't seem to be documented). But yes, if you > care about the order of the ranges within each list element of the > returned GRangesList, you should use 'order.as.in.query=TRUE'. At least > that's what you should do on a GAlignmentPairs object. I'm just assuming > it's the same for a GAlignmentsList object but I don't know how this > translates for list elements with mate_status other than "mated" (you > will need to check if that matters for your case). > However, note that if you're going to find or count overlaps (with > findOverlaps(), countOverlaps(), or summarizeOverlaps()), how you set > 'order.as.in.query' is irrelevant. The only situation I know where this > matters is when you want to compute the "overlap encodings" (with > encodeOverlaps()), which is actually what motivated the addition of the > 'order.as.in.query' arg. > > HTH, > H. > >> >> Best, >> Liang >> >> >> ________________________________________ >> From: Hervé Pagès [hpages at fhcrc.org] >> Sent: Sunday, March 30, 2014 4:45 PM >> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >> >> On 03/30/2014 01:31 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>> Dear Herve, >>> >>> Here is the code and output: >>> >>>> bam.path<-"test.bam" >>>> bam.file<-BamFile(bam.path,asMates=TRUE) >>>> >>>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMat e=FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupl icate=FALSE,isNotPrimaryRead=FALSE)) >>>> alignment<-readGAlignmentsList(bam.file,param=param) >>>> length(alignment) >>> [1] 18041910 >>>> table(mcols(unlist(alignment, use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >>> >>> FALSE >>> 18041910 >>> >>> What is the problem? >> >> Whaoo. No idea :-/ I'm on week-end right now and have to turn off >> the computer due to other obligations, sorry. I'll try to get to this >> later. In the mean time it would be great if you could install R-3.1.0 >> + BioC 2.14 (you will need the GenomicAlignments package) and try >> this again. >> >> Thanks, >> H. >> >>> >>> Thanks! >>> >>> Liang >>> >>> ________________________________________ >>> From: Hervé Pagès [hpages at fhcrc.org] >>> Sent: Sunday, March 30, 2014 4:13 PM >>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>> >>> Dear Liang, >>> >>> Please keep the communication on the mailing list (by using >>> the "Reply All" button). >>> >>> On 03/30/2014 12:13 PM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>> Dear Herve, >>>> >>>> I use the following command to read in the alignments: >>>> >>>> bam.path<-"test.bam" >>>> bam.file<-BamFile(bam.path, yieldSize=1E6, asMates=TRUE) >>>> >>>> param<-ScanBamParam(flag=scanBamFlag(isPaired=TRUE,hasUnmappedMat e=FALSE,isUnmappedQuery=FALSE,isNotPassingQualityControls=FALSE,isDupl icate=FALSE,isNotPrimaryRead=FALSE)) >>>> alignment<-readGAlignmentsList(bam.file,param=param) >>>> >>>> However, each element ((a pair of alignments)) of the GAlignmentsList "alignment" has meta field Mates as "FALSE" for both reads. Is it a problem? >>> >>> I'm surprised that *each* list element in your GAlignmentsList object >>> 'alignment' looks like this. How do you know? Why not show us the >>> object? >>> >>> My understanding is that even though your BAM file contains paired-end >>> reads, when you read it with readGAlignmentsList(), you end up with a >>> GAlignmentsList object where some fraction of the list elements have >>> the "mates" field set to FALSE. The reads in such list element have the >>> same QNAME but are not mates. Note that all list elements with the >>> "mates" field set to TRUE are guaranteed to contain exactly 2 reads. >>> But there is no such expectation for list elements with the "mates" >>> field set to FALSE: they can contain 1, 2, 3, or more reads. >>> >>> So if *each* list element in your GAlignmentsList object >>> 'alignment' really has the "mates" field set to FALSE, that >>> means none of the reads in your file could be mated. That could >>> actually happen if your data was not paired-end and if you didn't >>> use isPaired=TRUE but that doesn't seem to be the case here. >>> >>> Here is how you can summarize how many list elements have the >>> "mates" field set to FALSE and how many have it set to TRUE: >>> >>> table(mcols(unlist(alignment, >>> use.names=FALSE))$mates[end(PartitioningByEnd(alignment))]) >>> >>> Finally note that in BioC devel, the "mates" field has been replaced >>> by the "mate_status" field and that this field is now an attribute of >>> the list elements rather than of the individual reads. This means that >>> it's a top-level metadata column (i.e. directly accessible with >>> mcols(alignment)$mate_status) instead of an inner metadata column >>> (which are more complicated to access). So it's much easier now to get >>> the above count: >>> >>> table(mcols(alignment)$mate_status) >>> >>> This will give you something like: >>> >>> > table(mcols(galist1)$mate_status) >>> >>> mated ambiguous unmated >>> 75346 0 21286 >>> >>> As you can see, another change is that this field is now a 3-level >>> factor (levels are explained in ?readGAlignmentsList). >>> >>> Hope this helps, >>> H. >>> >>> >>>> >>>> Liang >>>> >>>> >>>> ________________________________________ >>>> From: Hervé Pagès [hpages at fhcrc.org] >>>> Sent: Sunday, March 30, 2014 2:54 PM >>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>> >>>> On 03/30/2014 11:52 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>> Sorry for the typo in the previous email. What I mean is "I do NEED (not NOT) the pairing information". >>>> >>>> I see. So yes, readGAlignmentsList() should do it. >>>> >>>> Cheers, >>>> H. >>>> >>>>> >>>>> Liang >>>>> ________________________________________ >>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>> Sent: Sunday, March 30, 2014 2:51 PM >>>>> To: Niu, Liang (NIH/NIEHS) [E]; bioconductor at r-project.org >>>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>>> >>>>> On 03/30/2014 11:31 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>> Dear Herve, >>>>>> >>>>>> Thanks for your suggestion. I do not the paring information in the downstream analysis, therefore, readGAlignmentsList() is good. >>>>> >>>>> Maybe I was not clear enough but if you don't need the pairing, then >>>>> you can just use readGAlignments(). readGAlignmentsList() will pair >>>>> everything that can be paired and this has a cost (in terms of >>>>> performance and memory usage). By using readGAlignments() you avoid >>>>> that cost and you also end up with an object that is a little bit >>>>> simpler to manipulate (i.e. a GAlignments instead of GAlignmentsList >>>>> object). >>>>> >>>>> Hope that makes sense, >>>>> H. >>>>> >>>>>> >>>>>> Liang >>>>>> ________________________________________ >>>>>> From: Hervé Pagès [hpages at fhcrc.org] >>>>>> Sent: Sunday, March 30, 2014 2:12 PM >>>>>> To: Niu, Liang (NIH/NIEHS) [E] >>>>>> Cc: bioconductor at r-project.org >>>>>> Subject: Re: A question about the function readGAlignmentPairs in GenomicRnages package >>>>>> >>>>>> Hi Liang, >>>>>> >>>>>> Hope you don't mind that I'm cc'ing the Bioconductor mailing list, so >>>>>> other can give suggestions. >>>>>> >>>>>> On 03/30/2014 09:32 AM, Niu, Liang (NIH/NIEHS) [E] wrote: >>>>>>> Dear Mr. Pages, >>>>>>> >>>>>>> This is Liang Niu, a research fellow at the National Institute of Environmental Health Sciences. >>>>>>> >>>>>>> I am using R to read .bam files for chromatin interaction data sets. Such data sets contains alignments for paired-end reads from ChIA-PET experiment, thus it has pairs in which two reads on different chromosomes and/or on same strand, and those pairs are valid pairs. I want to use your function readGAlignmentPairs in GenomicRnages package to read the pairs, but the manual says that it will discard those pairs. Do you have any suggestion? >>>>>> >>>>>> You could use readGAlignmentsList() instead of readGAlignmentPairs(). >>>>>> readGAlignmentsList() will keep these "discordant pairs". It will also >>>>>> keep the reads that cannot be paired. >>>>>> Note that, depending on what you will do downstream, you don't >>>>>> necessarily need to pair the reads (e.g. if you're going to compute >>>>>> the read coverage). In that case you can just load the reads with >>>>>> readGAlignments(). >>>>>> >>>>>> Cheers, >>>>>> H. >>>>>> >>>>>>> >>>>>>> Best, >>>>>>> Liang >>>>>>> >>>>>> >>>>>> -- >>>>>> Hervé Pagès >>>>>> >>>>>> Program in Computational Biology >>>>>> Division of Public Health Sciences >>>>>> Fred Hutchinson Cancer Research Center >>>>>> 1100 Fairview Ave. N, M1-B514 >>>>>> P.O. Box 19024 >>>>>> Seattle, WA 98109-1024 >>>>>> >>>>>> E-mail: hpages at fhcrc.org >>>>>> Phone: (206) 667-5791 >>>>>> Fax: (206) 667-1319 >>>>>> >>>>> >>>>> -- >>>>> Hervé Pagès >>>>> >>>>> Program in Computational Biology >>>>> Division of Public Health Sciences >>>>> Fred Hutchinson Cancer Research Center >>>>> 1100 Fairview Ave. N, M1-B514 >>>>> P.O. Box 19024 >>>>> Seattle, WA 98109-1024 >>>>> >>>>> E-mail: hpages at fhcrc.org >>>>> Phone: (206) 667-5791 >>>>> Fax: (206) 667-1319 >>>>> >>>> >>>> -- >>>> Hervé Pagès >>>> >>>> Program in Computational Biology >>>> Division of Public Health Sciences >>>> Fred Hutchinson Cancer Research Center >>>> 1100 Fairview Ave. N, M1-B514 >>>> P.O. Box 19024 >>>> Seattle, WA 98109-1024 >>>> >>>> E-mail: hpages at fhcrc.org >>>> Phone: (206) 667-5791 >>>> Fax: (206) 667-1319 >>>> >>> >>> -- >>> Hervé Pagès >>> >>> Program in Computational Biology >>> Division of Public Health Sciences >>> Fred Hutchinson Cancer Research Center >>> 1100 Fairview Ave. N, M1-B514 >>> P.O. Box 19024 >>> Seattle, WA 98109-1024 >>> >>> E-mail: hpages at fhcrc.org >>> Phone: (206) 667-5791 >>> Fax: (206) 667-1319 >>> >> >> -- >> Hervé Pagès >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B514 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: hpages at fhcrc.org >> Phone: (206) 667-5791 >> Fax: (206) 667-1319 >> > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages at fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > -- Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319
ADD COMMENT

Login before adding your answer.

Traffic: 758 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6