Search
Question: Lifting over a bam file with Rsamtools and GenomicRanges
0
gravatar for Guest User
4.2 years ago by
Guest User12k
Guest User12k wrote:
Hi, I have a bam file that was a aligned to a non-reference genome and I have the coordinate mapping between the non-reference and the reference genomes, either in a chains format or in the following block like format: #CHR REF NONREF chr1 1 1 chr1 3003641 0 chr1 3003645 3003641 chr1 3003650 0 chr1 3003654 3003646 chr1 3006791 0 chr1 3006793 3006783 chr1 3006835 0 chr1 3006836 3006825 chr1 0 3007262 chr1 3007273 0 chr1 3007275 3007263 chr1 0 3008478 chr1 3008490 3008481 Where the lines containing 0's indicate deletions (either in the ref or in the nonref). I would like to lift-over my bam from the non-reference to the reference which basically means lifting over the "pos" and "cigar" fields. Using the functions of GenomicRanges it is straight forward to do it for the "pos" field but it gets complicated for the "cigar" field. Any idea how can this be done efficiently? -- output of sessionInfo(): R version 2.15.2 (2012-10-26) Platform: x86_64-w64-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] data.table_1.8.10 Rsamtools_1.10.2 Biostrings_2.26.3 GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0 loaded via a namespace (and not attached): [1] bitops_1.0-5 parallel_2.15.2 stats4_2.15.2 tools_2.15.2 zlibbioc_1.4.0 -- Sent via the guest posting facility at bioconductor.org.
ADD COMMENTlink modified 4.2 years ago by Michael Lawrence9.8k • written 4.2 years ago by Guest User12k
0
gravatar for Michael Lawrence
4.2 years ago by
United States
Michael Lawrence9.8k wrote:
If I were you I'd just realign the data to the other genomes, but if for some reason you really need to do this, use grglist() to get the alignments as a GRangesList, then unlist it, and call rtracklayer::liftOver. It will return a GRangesList, where each element corresponds to the input GRanges. You'll need to account for the elements that are not length one (i.e., not 1-1 mappings). After that, you can form a GRangesList using the groupings from the original grglist() result. Getting that back into a BAM will be tricky though, but maybe the ranges are all you need? On Sun, Sep 22, 2013 at 11:37 AM, rubi [guest] <guest@bioconductor.org>wrote: > > Hi, > > I have a bam file that was a aligned to a non-reference genome and I have > the coordinate mapping between the non-reference and the reference genomes, > either in a chains format or in the following block like format: > #CHR REF NONREF > chr1 1 1 > chr1 3003641 0 > chr1 3003645 3003641 > chr1 3003650 0 > chr1 3003654 3003646 > chr1 3006791 0 > chr1 3006793 3006783 > chr1 3006835 0 > chr1 3006836 3006825 > chr1 0 3007262 > chr1 3007273 0 > chr1 3007275 3007263 > chr1 0 3008478 > chr1 3008490 3008481 > > Where the lines containing 0's indicate deletions (either in the ref or in > the nonref). > > I would like to lift-over my bam from the non-reference to the reference > which basically means lifting over the "pos" and "cigar" fields. Using the > functions of GenomicRanges it is straight forward to do it for the "pos" > field but it gets complicated for the "cigar" field. Any idea how can this > be done efficiently? > > -- output of sessionInfo(): > > R version 2.15.2 (2012-10-26) > Platform: x86_64-w64-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] data.table_1.8.10 Rsamtools_1.10.2 Biostrings_2.26.3 > GenomicRanges_1.10.7 IRanges_1.16.6 BiocGenerics_0.4.0 > > loaded via a namespace (and not attached): > [1] bitops_1.0-5 parallel_2.15.2 stats4_2.15.2 tools_2.15.2 > zlibbioc_1.4.0 > > > -- > Sent via the guest posting facility at bioconductor.org. > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 4.2 years ago by Michael Lawrence9.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 142 users visited in the last hour