Best way to capture both NCBI and UCSC labeled alignments?
chris warth ▴ 30
@chris-warth-6295
Last seen 8.6 years ago
United States
I am using readGAlignmentsFromBam() to extract alignments that overlap a set of genomic ranges. The ranges use seqnames in UCSC nomenclature, e.g. chr1, chr2, chrY, etc.

However, some of my BAM files (from TCGA) use NCBI nomenclature for their chromosomes, e.g. 1, 2, Y, etc. When I try to extract alignments from these files I get an error message:

    > readGAlignmentsFromBam(bamfile, param=param)
    Error in value[[3L]](cond) : 'scanBam' failed:
      record: 0
      error: 0
      file: /home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam
      index: /home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam
    In addition: Warning message:
    In doTryCatch(return(expr), name, parentenv, handler) :
      space 'chrY' not in BAM header

I am handling this by wrapping the call to readGAlignmentsFromBam() in a try-catch. If an error is caught, I directly modify the seqnames in the genomic ranges and then call readGAlignmentsFromBam() again. This seems highly kludgy.

Is there any way to allow looser matching of seqnames when extracting alignments? Is there a better way to handle this situation?

Thanks in advance,

-csw
@martin-morgan-1513
Last seen 4 days ago
United States
On 12/18/2013 03:04 PM, chris warth wrote:
> Is there any way to allow looser matching of seqnames when extracting
> alignments? Is there a better way to handle this situation?

There is no looser matching, but

    seqlevels(BamFile("/home/TCGA/LAML/RNA-seq/TCGA-AB-2847-03A-01T-0736-13_rnaseq.bam"))

tells you the seqlevels in the BAM file, so you can compare them with your ranges instead of catching errors.

Martin
--
Computational Biology / Fred Hutchinson Cancer Research Center
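One way to act on that check, sketched here under assumptions rather than taken from the reply: compare the seqlevels of the ranges with those reported for the BAM file and, if they differ, rename the ranges to the file's style before building the ScanBamParam. The helper name match_bam_seqlevels, the example ranges, and the example BAM seqlevels are all hypothetical; seqlevelsStyle() and keepSeqlevels() come from GenomeInfoDb, which may postdate the package versions used in this thread.

```r
library(GenomicRanges)   # GRanges(), IRanges()
library(GenomeInfoDb)    # seqlevels(), seqlevelsStyle(), keepSeqlevels()

# Hypothetical helper: rename the ranges to the BAM file's naming style and
# drop ranges on sequences that the file does not contain.
match_bam_seqlevels <- function(regions, bam_levels) {
    if (!all(seqlevels(regions) %in% bam_levels)) {
        # seqlevelsStyle() on a character vector guesses the naming style
        # (e.g. "NCBI" or "UCSC"); it may return several candidates, so
        # take the first one.
        seqlevelsStyle(regions) <- seqlevelsStyle(bam_levels)[1]
    }
    shared <- intersect(seqlevels(regions), bam_levels)
    keepSeqlevels(regions, shared, pruning.mode = "coarse")
}

# Example ranges in UCSC style (made-up coordinates).
regions <- GRanges(c("chr1", "chrY"),
                   IRanges(start = c(1000000L, 2000000L), width = 1000L))

# Stand-in for seqlevels(BamFile(path)) on an NCBI-style file.
bam_levels <- c(1:22, "X", "Y", "MT")

fixed <- match_bam_seqlevels(regions, bam_levels)
seqlevels(fixed)
```

With a real file, bam_levels would come from seqlevels(BamFile(path)) as suggested above, and the result would be passed on as ScanBamParam(which = fixed). The guess made by seqlevelsStyle() can fail on headers with many unplaced contigs, so it is worth inspecting the BAM seqlevels by hand the first time.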