problem in reading bam files with EDASeq
2
0
Entering edit mode
rcaloger ▴ 500
@rcaloger-1888
Last seen 6.7 years ago
European Union
Hi, I an trying to use EDASeq for Read-level data. My bam file was generated by bowtie mapping of reads against human hg19 chr22, sam were then converted in bam using picard tool and indexing was done with Rsamtools. If I try to using plotQuality: > bfs BamFileList of length 4 > bfs[[1]] class: BamFile path: H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam index: H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam isOpen: FALSE > plotQuality(bfs[[1]]) Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : subscript contains NAs > I investigated the problem and I found out that the problems comes out at the line of setMethod plotQuality quality <- as(fq[strand=="+"], "matrix") It seems that the problem is related to the presence of three types of codes in mapped strands: +, - and <na> unique(strand) [1] + - <na> Levels: + - * > length(strand) [1] 20044478 > length(strand[which(strand=="-")]) [1] 143352 > length(strand[which(strand=="+")]) [1] 143352 > length(strand[is.na(strand)]) [1] 19757774 > The 19757774 sequences that provide the error are those unmapped where it is not possible to assign a strand #If I remove unmapped quality scores: > fq <- fq[!is.na(strand)] #and if I remove <na> from stran > strand <- strand[!is.na(strand)] #The problem is solved > quality <- as(fq[strand=="+"], "matrix") >str(quality) int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70 So I tried: plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE)) but I still get: Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : subscript contains NAs Any suggestion? Many thanks Raffaele -- ---------------------------------------- Prof. Raffaele A. Calogero Bioinformatics and Genomics Unit MBC Centro di Biotecnologie Molecolari Via Nizza 52, Torino 10126 tel. ++39 0116706457 Fax ++39 0112366457 Mobile ++39 3333827080 email: raffaele.calogero at unito.it raffaele[dot]calogero[at]gmail[dot]com www: http://www.bioinformatica.unito.it
Rsamtools EDASeq ASSIGN Rsamtools EDASeq ASSIGN • 891 views
ADD COMMENT
0
Entering edit mode
@valerie-obenchain-4275
Last seen 23 months ago
United States
Hi Raffaele, It looks like you've identified a bug in EDASeq and the author is cc'd. In the meantime, if you're after an assessment of the read quality have you tried using the qa report in ShortRead? fl <- system.file("extdata", "ex1.bam", package="Rsamtools") qaSummary <- qa(fl, type="BAM") report(qaSummary, dest="/path/to/report_directory") This report can then be viewed in your browser. Valerie On 01/11/12 03:55, rcaloger wrote: > Hi, > I an trying to use EDASeq for Read-level data. > My bam file was generated by bowtie mapping of reads against human > hg19 chr22, sam were then converted in bam using picard tool and > indexing was done with Rsamtools. > If I try to using plotQuality: > > bfs > BamFileList of length 4 > > bfs[[1]] > class: BamFile > path: > H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam > index: > H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam > isOpen: FALSE > > plotQuality(bfs[[1]]) > Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : > subscript contains NAs > > > > I investigated the problem and I found out that the problems comes out > at the line of setMethod plotQuality > quality <- as(fq[strand=="+"], "matrix") > It seems that the problem is related to the presence of three types of > codes in mapped strands: +, - and <na> > unique(strand) > [1] + - <na> > Levels: + - * > > length(strand) > [1] 20044478 > > length(strand[which(strand=="-")]) > [1] 143352 > > length(strand[which(strand=="+")]) > [1] 143352 > > length(strand[is.na(strand)]) > [1] 19757774 > > > The 19757774 sequences that provide the error are those unmapped where > it is not possible to assign a strand > > #If I remove unmapped quality scores: > > fq <- fq[!is.na(strand)] > #and if I remove <na> from stran > > strand <- strand[!is.na(strand)] > #The problem is solved > > quality <- as(fq[strand=="+"], "matrix") > >str(quality) > int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70 > > So I tried: > plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE)) > but I still get: > Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : > subscript contains NAs > > > Any suggestion? > > Many thanks > Raffaele >
ADD COMMENT
0
Entering edit mode
Just re-sending with the EDASeq maintainer CC'd. Dan On Fri, Jan 20, 2012 at 12:40 PM, Valerie Obenchain <vobencha at="" fhcrc.org=""> wrote: > Hi Raffaele, > > It looks like you've identified a bug in EDASeq and the author is cc'd. > > In the meantime, if you're after an assessment of the read quality have you > tried using the qa report in ShortRead? > > fl <- system.file("extdata", "ex1.bam", package="Rsamtools") > qaSummary <- qa(fl, type="BAM") > report(qaSummary, dest="/path/to/report_directory") > > This report can then be viewed in your browser. > > > Valerie > > > > On 01/11/12 03:55, rcaloger wrote: >> >> Hi, >> I an trying to use EDASeq for Read-level data. >> My bam file was generated by bowtie mapping of reads against human hg19 >> chr22, sam were then converted in bam using picard tool and indexing was >> done with Rsamtools. >> If I try to using plotQuality: >> > bfs >> BamFileList of length 4 >> > bfs[[1]] >> class: BamFile >> path: >> H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam >> index: >> H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam >> isOpen: FALSE >> > plotQuality(bfs[[1]]) >> Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : >> ?subscript contains NAs >> > >> >> I investigated the problem and I found out that the problems comes out >> at the line of setMethod plotQuality >> quality <- as(fq[strand=="+"], "matrix") >> It seems that the problem is related to the presence of three types of >> codes in mapped strands: +, - and <na> >> unique(strand) >> [1] + ? ?- <na> >> Levels: + - * >> > length(strand) >> [1] 20044478 >> > length(strand[which(strand=="-")]) >> [1] 143352 >> > length(strand[which(strand=="+")]) >> [1] 143352 >> > length(strand[is.na(strand)]) >> [1] 19757774 >> > >> The 19757774 sequences that provide the error are those unmapped where it >> is not possible to assign a strand >> >> #If I remove unmapped quality scores: >> > fq <- fq[!is.na(strand)] >> #and if I remove <na> from stran >> > strand <- strand[!is.na(strand)] >> #The problem is solved >> > quality <- as(fq[strand=="+"], "matrix") >> >str(quality) >> ?int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70 >> >> So I tried: >> plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE)) >> but I still get: >> ?Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : >> ?subscript contains NAs >> >> >> Any suggestion? >> >> Many thanks >> Raffaele >> > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor
ADD REPLY
0
Entering edit mode
Many thanks for the nice comment I will definetively try this functionality of Rsamtools. Concerning the problem I had with EDASeq, I solved Filtering with Rsamtools the bam file: removing all unmapped and unpaired PE reads. This bam filtered file works perfectly on EDASeq cheers Raffaele Sent from iPad Il giorno 20/gen/2012, alle ore 21:40, Valerie Obenchain <vobencha at="" fhcrc.org=""> ha scritto: > Hi Raffaele, > > It looks like you've identified a bug in EDASeq and the author is cc'd. > > In the meantime, if you're after an assessment of the read quality have you tried using the qa report in ShortRead? > > fl <- system.file("extdata", "ex1.bam", package="Rsamtools") > qaSummary <- qa(fl, type="BAM") > report(qaSummary, dest="/path/to/report_directory") > > This report can then be viewed in your browser. > > > Valerie > > > On 01/11/12 03:55, rcaloger wrote: >> Hi, >> I an trying to use EDASeq for Read-level data. >> My bam file was generated by bowtie mapping of reads against human hg19 chr22, sam were then converted in bam using picard tool and indexing was done with Rsamtools. >> If I try to using plotQuality: >> > bfs >> BamFileList of length 4 >> > bfs[[1]] >> class: BamFile >> path: H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam >> index: H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam >> isOpen: FALSE >> > plotQuality(bfs[[1]]) >> Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : >> subscript contains NAs >> > >> >> I investigated the problem and I found out that the problems comes out >> at the line of setMethod plotQuality >> quality <- as(fq[strand=="+"], "matrix") >> It seems that the problem is related to the presence of three types of codes in mapped strands: +, - and <na> >> unique(strand) >> [1] + - <na> >> Levels: + - * >> > length(strand) >> [1] 20044478 >> > length(strand[which(strand=="-")]) >> [1] 143352 >> > length(strand[which(strand=="+")]) >> [1] 143352 >> > length(strand[is.na(strand)]) >> [1] 19757774 >> > >> The 19757774 sequences that provide the error are those unmapped where it is not possible to assign a strand >> >> #If I remove unmapped quality scores: >> > fq <- fq[!is.na(strand)] >> #and if I remove <na> from stran >> > strand <- strand[!is.na(strand)] >> #The problem is solved >> > quality <- as(fq[strand=="+"], "matrix") >> >str(quality) >> int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70 >> >> So I tried: >> plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE)) >> but I still get: >> Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : >> subscript contains NAs >> >> >> Any suggestion? >> >> Many thanks >> Raffaele >> >
ADD REPLY
0
Entering edit mode
davide risso ▴ 870
@davide-risso-5075
Last seen 5 months ago
University of Padova
Hi Raffaele, thank you for your message. Sorry for the late response but I've been traveling. The lines > quality <- as(fq[strand=="+"], "matrix") > quality <- rbind(quality,as(fq[strand=="-"], "matrix")[,NCOL(quality):1]) are needed since SAM/BAM store reads which map onto the reverse strand in reverse complement and hence the quality is reversed. Of course, if you have unmapped reads the strand will not be present, so I changed the function in the devel version of the package to fix this. Cheers, davide On Wed, Jan 11, 2012 at 3:55 AM, rcaloger <raffaele.calogero at="" gmail.com=""> wrote: > Hi, > I an trying to use EDASeq for Read-level data. > My bam file was generated by bowtie mapping of reads against human hg19 > chr22, sam were then converted in bam using picard tool and indexing was > done with Rsamtools. > If I try to using plotQuality: >> bfs > BamFileList of length 4 >> bfs[[1]] > class: BamFile > path: > H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/libs.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam > index: > H:/data/calogero/bioC- devel/oneChannelGUI.svn/testing.oneChannel/NGS/mRNA/lib.../breast1_R1 .fastq-breast1_R2.fastq.hs_chr22.bam > isOpen: FALSE >> plotQuality(bfs[[1]]) > Errore in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : > ?subscript contains NAs >> > > I investigated the problem and I found out that the problems comes out > at the line of setMethod plotQuality > quality <- as(fq[strand=="+"], "matrix") > It seems that the problem is related to the presence of three types of codes > in mapped strands: +, - and <na> > unique(strand) > [1] + ? ?- <na> > Levels: + - * >> length(strand) > [1] 20044478 >> length(strand[which(strand=="-")]) > [1] 143352 >> length(strand[which(strand=="+")]) > [1] 143352 >> length(strand[is.na(strand)]) > [1] 19757774 >> > The 19757774 sequences that provide the error are those unmapped where it is > not possible to assign a strand > > #If I remove unmapped quality scores: >> fq <- fq[!is.na(strand)] > #and if I remove <na> from stran >> strand <- strand[!is.na(strand)] > #The problem is solved >> quality <- as(fq[strand=="+"], "matrix") >>str(quality) > ?int [1:143352, 1:50] 70 70 70 68 70 70 51 70 70 70 > > So I tried: > plotQuality(bfs[[1]], flag=scanBamFlag(isUnmappedQuery=FALSE)) > but I still get: > ?Error in .IRanges.checkAndTranslateSingleBracketSubscript(x, i) : > ?subscript contains NAs > > > Any suggestion? > > Many thanks > Raffaele > > -- > > ---------------------------------------- > Prof. Raffaele A. Calogero > Bioinformatics and Genomics Unit > MBC Centro di Biotecnologie Molecolari > Via Nizza 52, Torino 10126 > tel. ? ++39 0116706457 > Fax ? ?++39 0112366457 > Mobile ++39 3333827080 > email: raffaele.calogero at unito.it > ? ? ? raffaele[dot]calogero[at]gmail[dot]com > www: ? http://www.bioinformatica.unito.it > > -- --------------------------------------------- Davide Risso Department of Statistical Sciences University of Padova, Italy e-mail: davide at stat.unipd.it
ADD COMMENT

Login before adding your answer.

Traffic: 342 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6