makeGappedAlignmentPairs for broken pairs
1
0
Entering edit mode
Ido M. Tamir ▴ 150
@ido-m-tamir-2778
Last seen 10.2 years ago
Hi, is there a roadmap for allowing paired end reads where only one is mapped to be reported? Basically I would like to have readGappedAlignmentPairs report them also. One could reread the whole file and take only those reads where the makeGappedAlignmentPairs test was unsuccessfull. thank you very much, ido
• 860 views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 2.9 years ago
United States
Hi Ido, library(GenomicAlignments) fl <- system.file("extdata", "ex1.bam", package="Rsamtools") You can set different flags in ScanBamParam to control what is read in. See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates. param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE)) Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects. ## GAlignmentPairs: Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that. > gapairs <- readGAlignmentPairs(fl, param=param) Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) : BAM flag filters to combine are incompatible ## GAlignmentsList: galist <- readGAlignmentsList(fl, param=param) >> galist > GAlignmentsList of length 127: > [[1]] > GAlignments with 1 alignment and 0 metadata columns: > seqnames strand cigar qwidth start end width ngap > [1] seq2 + 35M 35 1520 1554 35 0 > > [[2]] > GAlignments with 1 alignment and 0 metadata columns: > seqnames strand cigar qwidth start end width ngap > [1] seq1 + 36M 36 6 41 36 0 There is a metadata column on the list that has the mate status. In this case they are all 'unmated'. > mcols(galist) DataFrame with 127 rows and 1 column mate_status <factor> 1 unmated 2 unmated 3 unmated 4 unmated 5 unmated ... ... This code is available in the devel branch. Valerie > sessionInfo() R Under development (unstable) (2014-01-06 r64682) Platform: x86_64-unknown-linux-gnu (64-bit) ... other attached packages: [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28 [3] Rsamtools_1.15.18 Biostrings_2.31.7 [5] GenomicRanges_1.15.19 XVector_0.3.6 [7] IRanges_1.21.19 BiocGenerics_0.9.2 loaded via a namespace (and not attached): [1] AnnotationDbi_1.25.9 Biobase_2.23.3 biomaRt_2.19.1 [4] bitops_1.0-6 BSgenome_1.31.7 DBI_0.2-7 [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1 RSQLite_0.11.4 [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0 [13] XML_3.98-1.1 zlibbioc_1.9.0 On 01/16/2014 02:57 AM, Ido Tamir wrote: > Hi, > is there a roadmap for allowing paired end reads where only one is mapped to be reported? > Basically I would like to have readGappedAlignmentPairs report them also. > One could reread the whole file and take only those reads where the makeGappedAlignmentPairs test > was unsuccessfull. > > thank you very much, > ido > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > -- Valerie Obenchain Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B155 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: vobencha at fhcrc.org Phone: (206) 667-3158 Fax: (206) 667-1319
ADD COMMENT
0
Entering edit mode
Dear Valerie, thank you very much for your reply. I can't test the development packages at the moment, I'm still on 2.12. > param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE)) a) Will this load only reads with unmapped mates or will this allow both proper paired and unpaired which is what I really want. reading the files in twice (once for proper pairs, once for single) would be possible, but I prefer reading in once. b) will findMateAlignment be extended to allow these unmapped mate reads to be represented in a GAlignmentPairs class? c) Do you know when this (2.14) will be released? thank you very much, ido On Jan 16, 2014, at 9:09 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > Hi Ido, > > library(GenomicAlignments) > fl <- system.file("extdata", "ex1.bam", package="Rsamtools") > > You can set different flags in ScanBamParam to control what is read in. > See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates. > > param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE)) > > Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects. > > ## GAlignmentPairs: > > Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that. > > gapairs <- readGAlignmentPairs(fl, param=param) > Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) : > BAM flag filters to combine are incompatible > > ## GAlignmentsList: > > galist <- readGAlignmentsList(fl, param=param) >>> galist >> GAlignmentsList of length 127: >> [[1]] >> GAlignments with 1 alignment and 0 metadata columns: >> seqnames strand cigar qwidth start end width ngap >> [1] seq2 + 35M 35 1520 1554 35 0 >> >> [[2]] >> GAlignments with 1 alignment and 0 metadata columns: >> seqnames strand cigar qwidth start end width ngap >> [1] seq1 + 36M 36 6 41 36 0 > > There is a metadata column on the list that has the mate status. In this case they are all 'unmated'. > > mcols(galist) > DataFrame with 127 rows and 1 column > mate_status > <factor> > 1 unmated > 2 unmated > 3 unmated > 4 unmated > 5 unmated > ... ... > > This code is available in the devel branch. > > Valerie > > > > sessionInfo() > R Under development (unstable) (2014-01-06 r64682) > Platform: x86_64-unknown-linux-gnu (64-bit) > ... > > other attached packages: > [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28 > [3] Rsamtools_1.15.18 Biostrings_2.31.7 > [5] GenomicRanges_1.15.19 XVector_0.3.6 > [7] IRanges_1.21.19 BiocGenerics_0.9.2 > > loaded via a namespace (and not attached): > [1] AnnotationDbi_1.25.9 Biobase_2.23.3 biomaRt_2.19.1 > [4] bitops_1.0-6 BSgenome_1.31.7 DBI_0.2-7 > [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1 RSQLite_0.11.4 > [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0 > [13] XML_3.98-1.1 zlibbioc_1.9.0 > > > > > On 01/16/2014 02:57 AM, Ido Tamir wrote: >> Hi, >> is there a roadmap for allowing paired end reads where only one is mapped to be reported? >> Basically I would like to have readGappedAlignmentPairs report them also. >> One could reread the whole file and take only those reads where the makeGappedAlignmentPairs test >> was unsuccessfull. >> >> thank you very much, >> ido >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor at r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >> > > > -- > Valerie Obenchain > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B155 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: vobencha at fhcrc.org > Phone: (206) 667-3158 > Fax: (206) 667-1319
ADD REPLY
0
Entering edit mode
Hi, One clarification about what I mistook as a 'bug' in readGAlignmentPairs with the 'hasUnmappedMate' flag. I don't think this a bug at all but instead intended behavior. The GAlignmentPairs class can only contain pairs in a 'left', 'right' fashion. It can't hold one read without a mate. The GAlignmentsList class was developed to meet this need and can hold paired reads as well as single fragments or groups (more on that below). On 01/17/14 01:50, Ido Tamir wrote: > Dear Valerie, > thank you very much for your reply. I can't test the development packages at the moment, I'm still on 2.12. readGAlignmentsList() is in the GenomicRanges package in the current release branch (Bioconductor 2.13, released October 2013). If you update to the current release you can read in both properly-paired and unmapped reads with readGAlignmentsList(). For help installing: http://www.bioconductor.org/install/ > >> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE)) > a) > Will this load only reads with unmapped mates or will this allow both proper paired and unpaired which is what I really want. > reading the files in twice (once for proper pairs, once for single) would be possible, but I prefer reading in once. That particular param will only read in records with unmapped mates. The ?ScanBamParam man page lists all the flags and describes them in detail. You can create a query with (almost) any combination of flags. To read in all records, do not supply a param. readGAlignmentsList(fl) To read in only proper pairs use the 'isProperPair' flag. In release the output of readGAlignmentsList() differs from devel. In release the mate status is a elementMetadata column on the individual GAlignments inside the GAlignmentsList object. > b) > will findMateAlignment be extended to allow these unmapped mate reads to be represented in a GAlignmentPairs class? findMateAlignment() is going away. readGAlignmentPairs() will stay. readGAlignmentPairs() and readGAlignmentsList() call the same code in the background but return a different object. I don't think the GAlignmentPairs container will be modified to hold one read without its mate. The GAlignmentsList class was developed to be a more flexible container that could hold single fragments, paired reads and groups of multiple fragments. If you are after both paired and single reads (e.g., unmapped reads) you should try readGAlignmentsList(). > c) > Do you know when this (2.14) will be released? The next release will be in spring around March or April. Closer to the date the schedule will be posted here: http://www.bioconductor.org/developers/release-schedule/ Valerie > > thank you very much, > ido > > > On Jan 16, 2014, at 9:09 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > >> Hi Ido, >> >> library(GenomicAlignments) >> fl <- system.file("extdata", "ex1.bam", package="Rsamtools") >> >> You can set different flags in ScanBamParam to control what is read in. >> See ?ScanBamParam for details. Here is a 'param' that reads only records with unmapped mates. >> >> param <- ScanBamParam(flag=scanBamFlag(hasUnmappedMate=TRUE)) >> >> Two classes can hold paired-end reads, one is GAlignmentPairs and the other is GAlignmentsList. They run the same mate-pairing code but return different objects. >> >> ## GAlignmentPairs: >> >> Oops. Looks like there is a bug in readGAlignmentPairs() when this flag is set. We will fix that. >>> gapairs <- readGAlignmentPairs(fl, param=param) >> Error in .combineBamFlagFilters(bamFlag(param, asInteger = TRUE), flag0) : >> BAM flag filters to combine are incompatible >> >> ## GAlignmentsList: >> >> galist <- readGAlignmentsList(fl, param=param) >>>> galist >>> GAlignmentsList of length 127: >>> [[1]] >>> GAlignments with 1 alignment and 0 metadata columns: >>> seqnames strand cigar qwidth start end width ngap >>> [1] seq2 + 35M 35 1520 1554 35 0 >>> >>> [[2]] >>> GAlignments with 1 alignment and 0 metadata columns: >>> seqnames strand cigar qwidth start end width ngap >>> [1] seq1 + 36M 36 6 41 36 0 >> >> There is a metadata column on the list that has the mate status. In this case they are all 'unmated'. >>> mcols(galist) >> DataFrame with 127 rows and 1 column >> mate_status >> <factor> >> 1 unmated >> 2 unmated >> 3 unmated >> 4 unmated >> 5 unmated >> ... ... >> >> This code is available in the devel branch. >> >> Valerie >> >> >>> sessionInfo() >> R Under development (unstable) (2014-01-06 r64682) >> Platform: x86_64-unknown-linux-gnu (64-bit) >> ... >> >> other attached packages: >> [1] GenomicAlignments_0.99.11 VariantAnnotation_1.9.28 >> [3] Rsamtools_1.15.18 Biostrings_2.31.7 >> [5] GenomicRanges_1.15.19 XVector_0.3.6 >> [7] IRanges_1.21.19 BiocGenerics_0.9.2 >> >> loaded via a namespace (and not attached): >> [1] AnnotationDbi_1.25.9 Biobase_2.23.3 biomaRt_2.19.1 >> [4] bitops_1.0-6 BSgenome_1.31.7 DBI_0.2-7 >> [7] GenomicFeatures_1.15.4 RCurl_1.95-4.1 RSQLite_0.11.4 >> [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0 >> [13] XML_3.98-1.1 zlibbioc_1.9.0 >> >> >> >> >> On 01/16/2014 02:57 AM, Ido Tamir wrote: >>> Hi, >>> is there a roadmap for allowing paired end reads where only one is mapped to be reported? >>> Basically I would like to have readGappedAlignmentPairs report them also. >>> One could reread the whole file and take only those reads where the makeGappedAlignmentPairs test >>> was unsuccessfull. >>> >>> thank you very much, >>> ido >>> >>> _______________________________________________ >>> Bioconductor mailing list >>> Bioconductor at r-project.org >>> https://stat.ethz.ch/mailman/listinfo/bioconductor >>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor >>> >> >> >> -- >> Valerie Obenchain >> >> Program in Computational Biology >> Division of Public Health Sciences >> Fred Hutchinson Cancer Research Center >> 1100 Fairview Ave. N, M1-B155 >> P.O. Box 19024 >> Seattle, WA 98109-1024 >> >> E-mail: vobencha at fhcrc.org >> Phone: (206) 667-3158 >> Fax: (206) 667-1319 >
ADD REPLY

Login before adding your answer.

Traffic: 894 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6