Lifting over a bam file with Rsamtools and GenomicRanges
1
0
Entering edit mode
Guest User ★ 13k
@guest-user-4897
Last seen 10.3 years ago
Hi, I have a bam file that was a aligned to a non-reference genome and I have the coordinate mapping between the non-reference and the reference genomes, either in a chains format or in the following block like format: #CHR REF NONREF chr1 1 1 chr1 3003641 0 chr1 3003645 3003641 chr1 3003650 0 chr1 3003654 3003646 chr1 3006791 0 chr1 3006793 3006783 chr1 3006835 0 chr1 3006836 3006825 chr1 0 3007262 chr1 3007273 0 chr1 3007275 3007263 chr1 0 3008478 chr1 3008490 3008481 Where the lines containing 0's indicate deletions (either in the ref or in the nonref). I would like to lift-over my bam from the non-reference to the reference which basically means lifting over the "pos" and "cigar" fields. Using the functions of GenomicRanges it is straight forward to do it for the "pos" field but it gets complicated for the "cigar" field. Any idea how can this be done efficiently? -- output of sessionInfo(): R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] data.table_1.8.10 Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] bitops_1.0-5 parallel_2.15.2 stats4_2.15.2 tools_2.15.2 zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.
GenomicRanges genomes GenomicRanges genomes • 1.7k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 3.0 years ago
United States
If I were you I'd just realign the data to the other genomes, but if for some reason you really need to do this, use grglist() to get the alignments as a GRangesList, then unlist it, and call rtracklayer::liftOver. It will return a GRangesList, where each element corresponds to the input GRanges. You'll need to account for the elements that are not length one (i.e., not 1-1 mappings). After that, you can form a GRangesList using the groupings from the original grglist() result. Getting that back into a BAM will be tricky though, but maybe the ranges are all you need? On Sun, Sep 22, 2013 at 11:37 AM, rubi [guest] <guest@bioconductor.org>wrote: > > Hi, > > I have a bam file that was a aligned to a non-reference genome and I have > the coordinate mapping between the non-reference and the reference genomes, > either in a chains format or in the following block like format: > #CHR REF NONREF > chr1 1 1 > chr1 3003641 0 > chr1 3003645 3003641 > chr1 3003650 0 > chr1 3003654 3003646 > chr1 3006791 0 > chr1 3006793 3006783 > chr1 3006835 0 > chr1 3006836 3006825 > chr1 0 3007262 > chr1 3007273 0 > chr1 3007275 3007263 > chr1 0 3008478 > chr1 3008490 3008481 > > Where the lines containing 0's indicate deletions (either in the ref or in > the nonref). > > I would like to lift-over my bam from the non-reference to the reference > which basically means lifting over the "pos" and "cigar" fields. Using the > functions of GenomicRanges it is straight forward to do it for the "pos" > field but it gets complicated for the "cigar" field. Any idea how can this > be done efficiently? > > -- output of sessionInfo(): > > R version 2.15.2 (2012-10-26) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] data.table_1.8.10 Rsamtools_1.10.2 Biostrings_2.26.3 > GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 parallel_2.15.2 stats4_2.15.2 tools_2.15.2 > zlibbioc_1.4.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 348 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6