possible bug in extendReads() in package chipseq
1
0
Entering edit mode
Nora Rieber ▴ 40
@nora-rieber-3722
Last seen 10.3 years ago
Dear all, I'm using the ShortRead package to read in my MAQ alignment output files with readAligned. Then I extract a subset of reads mapping to one chromosome: aln_Pat_chr<-aln_Pat[chromosome(aln_Pat)==chromosomeString] and extend the reads with extendReads: aln_Pat_chr_extended<-extendReads(aln_Pat_chr, seqLen=350) However, extendReads seems to misplace the reads. Here's the output of aln_Pat_chr: >aln_Pat_chr class: AlignedRead length: 1025425 reads; width: 36 cycles chromosome: gi|51511732|ref|NC_000016.8|NC_000016 gi|51511732|ref|NC_000016.8|NC_000016 ... gi|51511732|ref|NC_000016.8|NC_000016 gi|51511732|ref|NC_000016.8|NC_000016 position: 553 553 ... 88698835 88762373 strand: - - ... + - alignQuality: IntegerQuality alignData varLabels: nMismatchBestHit mismatchQuality nExactMatch24 nOneMismatch24 and the output of aln_Pat_chr_extended: > aln_Pat_chr_extended $`gi|51511732|ref|NC_000016.8|NC_000016` IRanges of length 1025426 start end width [1] 553 902 350 [2] 239 588 350 [3] 239 588 350 [4] 239 588 350 [5] 239 588 350 [6] 239 588 350 [7] 239 588 350 [8] 239 588 350 [9] 239 588 350 ... ... ... ... [1025418] 239 588 350 [1025419] 239 588 350 [1025420] 239 588 350 [1025421] 239 588 350 [1025422] 239 588 350 [1025423] 239 588 350 [1025424] 239 588 350 [1025425] 239 588 350 [1025426] 239 588 350 While the highest mapped position is 88762373 before extension, it is 902 after extension. This happens with any dataset I use the function with. Does anybody observe the same thing? I tried to recreate this using the maq alignment data coming with the ShortRead package, but using extendReads on it throws an error: >testaln<-readAligned("~/R/2.10/ShortRead/extdata/maq","out.aln.2.txt" ,type="MAQMapview") >readsextended<-extendReads(testaln,seqLen=40) Error in s1[[ineg]] : attempt to select less than one element. Am I doing something wrong? This is my session info: sessionInfo() R version 2.10.0 (2009-10-26) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] chipseq_0.2.0 ShortRead_1.4.0 lattice_0.17-26 BSgenome_1.14.1 [5] Biostrings_2.14.5 IRanges_1.4.6 loaded via a namespace (and not attached): [1] Biobase_2.6.0 grid_2.10.0 hwriter_1.1 tools_2.10.0 Best, Nora
Alignment ShortRead Alignment ShortRead • 888 views
ADD COMMENT
0
Entering edit mode
@martin-morgan-1513
Last seen 5 months ago
United States
Hi Nora -- Nora Rieber <n.rieber at="" dkfz-heidelberg.de=""> writes: > Dear all, > > I'm using the ShortRead package to read in my MAQ alignment output files > with readAligned. Then I extract a subset of reads mapping to one > chromosome: > > aln_Pat_chr<-aln_Pat[chromosome(aln_Pat)==chromosomeString] > > and extend the reads with extendReads: > > aln_Pat_chr_extended<-extendReads(aln_Pat_chr, seqLen=350) > > However, extendReads seems to misplace the reads. Here's the output of > aln_Pat_chr: > >>aln_Pat_chr > class: AlignedRead > length: 1025425 reads; width: 36 cycles > chromosome: gi|51511732|ref|NC_000016.8|NC_000016 > gi|51511732|ref|NC_000016.8|NC_000016 ... > gi|51511732|ref|NC_000016.8|NC_000016 gi|51511732|ref|NC_000016.8|NC_000016 > position: 553 553 ... 88698835 88762373 > strand: - - ... + - > alignQuality: IntegerQuality > alignData varLabels: nMismatchBestHit mismatchQuality nExactMatch24 > nOneMismatch24 > > and the output of aln_Pat_chr_extended: > >> aln_Pat_chr_extended > $`gi|51511732|ref|NC_000016.8|NC_000016` > IRanges of length 1025426 > start end width > [1] 553 902 350 > [2] 239 588 350 > [3] 239 588 350 > [4] 239 588 350 > [5] 239 588 350 > [6] 239 588 350 > [7] 239 588 350 > [8] 239 588 350 > [9] 239 588 350 > ... ... ... ... > [1025418] 239 588 350 > [1025419] 239 588 350 > [1025420] 239 588 350 > [1025421] 239 588 350 > [1025422] 239 588 350 > [1025423] 239 588 350 > [1025424] 239 588 350 > [1025425] 239 588 350 > [1025426] 239 588 350 > > While the highest mapped position is 88762373 before extension, it is > 902 after extension. This happens with any dataset I use the function > with. Does anybody observe the same thing? > > I tried to recreate this using the maq alignment data coming with the > ShortRead package, but using extendReads on it throws an error: > >>testaln<-readAligned("~/R/2.10/ShortRead/extdata/maq","out.aln.2.txt ",type="MAQMapview") >>readsextended<-extendReads(testaln,seqLen=40) > Error in s1[[ineg]] : attempt to select less than one element. > > Am I doing something wrong? There do seem to be bugs in extendReads applied to AligendRead objects. A work around (the package is being revised; look for version 0.2.1 in the next few days) is extendAlignedRead <- function(reads, seqLen=200, strand=c("+", "-")) { rng <- IRanges(ifelse(strand(reads) == "+", position(reads), position(reads) + width(reads) - seqLen), ifelse(strand(reads) == "+", position(reads) + seqLen - 1L, position(reads) + width(reads) - 1L)) strandIdx <- strand(reads) %in% strand split(rng[strandIdx], chromosome(reads)[strandIdx]) } Martin > This is my session info: > > sessionInfo() > R version 2.10.0 (2009-10-26) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C > [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=de_DE.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=de_DE.UTF-8 > [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] chipseq_0.2.0 ShortRead_1.4.0 lattice_0.17-26 BSgenome_1.14.1 > [5] Biostrings_2.14.5 IRanges_1.4.6 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.0 grid_2.10.0 hwriter_1.1 tools_2.10.0 > > Best, > Nora > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
ADD COMMENT

Login before adding your answer.

Traffic: 401 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6