pileup function in ShortRead package

0

Entering edit mode

Nora Rieber ▴ 40

@nora-rieber-3722

Last seen 11.3 years ago

Hi there, I'm using the pileup function in the ShortRead package and it seems I call it with the wrong arguments, or that there is some kind of bug in the function. All my reads are piled up in forward direction only, even if they were actually mapped to the backward strand. Here's a small (nonfunctional since I assume I cannot append data to this e-mail) example of how I call the pileup function: #read in the MAQ alignment file: aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt", type="MAQMapview") aln_Pat_subset<-aln_Pat[1:10] pos_Patient_subset<-position(aln_Pat_subset) str_Patient_subset<-strand(aln_Pat_subset) #use only a small subset (all positions are on the same chromosome 1) subset_pileuptest<-pileup(pos_Patient_subset, 36, max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset) Is there anything wrong with my dir argument? I checked several places all over the genome for the complete alignment and my reads all seem to be considered on the + strand, even when they are not. Can anybody help? This is my session info: > sessionInfo() R version 2.9.2 (2009-08-24) x86_64-unknown-linux-gnu locale: LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE .UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_N AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTI FICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.9 [5] IRanges_1.2.3 loaded via a namespace (and not attached): [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1 Thanks, Nora

Alignment ShortRead Alignment ShortRead • 1.4k views

ADD COMMENT • link updated 16.2 years ago by Martin Morgan 25k • written 16.2 years ago by Nora Rieber ▴ 40

0

Entering edit mode

Patrick Aboyoun ★ 1.6k

@patrick-aboyoun-6734

Last seen 11.3 years ago

United States

Nora, In cases like yours, it is useful to provide the error message along with the code. In particular you reference objects aln_Pat and pos_Patient that are not created in your code snippet. After I made some modifications to your code, I was able to run it without issue: > suppressMessages(library(ShortRead)) > sp <- SolexaPath(system.file("extdata", package="ShortRead")) > filt <- compose(chromosomeFilter("chr[1-5].fa"), strandFilter("+")) > aln_Pat <- readAligned(sp, "s_2_export.txt", filter=filt) > aln_Pat_subset<-aln_Pat[1:10] > pos_Patient_subset<-position(aln_Pat_subset) > str_Patient_subset<-strand(aln_Pat_subset) > subset_pileuptest<-pileup(pos_Patient_subset, 36, > max(pos_Patient_subset,na.rm=TRUE)+37,dir=str_Patient_subset) > head(subset_pileuptest) [1] 0 0 0 0 0 0 > sessionInfo() R version 2.9.2 (2009-08-24) x86_64-unknown-linux-gnu locale: LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY =C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHON E=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.2.1 lattice_0.17-26 BSgenome_1.12.4 Biostrings_2.12.10 [5] IRanges_1.2.3 loaded via a namespace (and not attached): [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1 tools_2.9.2 Quoting Nora Rieber <n.rieber at="" dkfz-heidelberg.de="">: > Hi there, > > I'm using the pileup function in the ShortRead package and it seems I > call it with the wrong arguments, or that there is some kind of bug in > the function. All my reads are piled up in forward direction only, even > if they were actually mapped to the backward strand. > > Here's a small (nonfunctional since I assume I cannot append data to > this e-mail) example of how I call the pileup function: > > #read in the MAQ alignment file: > aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt", type="MAQMapview") > aln_Pat_subset<-aln_Pat[1:10] > pos_Patient_subset<-position(aln_Pat_subset) > str_Patient_subset<-strand(aln_Pat_subset) > #use only a small subset (all positions are on the same chromosome 1) > > subset_pileuptest<-pileup(pos_Patient_subset, 36, > max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset) > > Is there anything wrong with my dir argument? I checked several places > all over the genome for the complete alignment and my reads all seem to > be considered on the + strand, even when they are not. Can anybody help? > > This is my session info: > >> sessionInfo() > R version 2.9.2 (2009-08-24) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_ DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.9 > [5] IRanges_1.2.3 > > loaded via a namespace (and not attached): > [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1 > > Thanks, > Nora > >

ADD COMMENT • link 16.2 years ago Patrick Aboyoun ★ 1.6k

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 11 months ago

United States

Hi Nora -- Nora Rieber wrote: > Hi there, > > I'm using the pileup function in the ShortRead package and it seems I > call it with the wrong arguments, or that there is some kind of bug in > the function. All my reads are piled up in forward direction only, even > if they were actually mapped to the backward strand. > > Here's a small (nonfunctional since I assume I cannot append data to > this e-mail) example of how I call the pileup function: > > #read in the MAQ alignment file: > aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt", type="MAQMapview") > aln_Pat_subset<-aln_Pat[1:10] > pos_Patient_subset<-position(aln_Pat_subset) > str_Patient_subset<-strand(aln_Pat_subset) > #use only a small subset (all positions are on the same chromosome 1) > > subset_pileuptest<-pileup(pos_Patient_subset, 36, > max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset) > > Is there anything wrong with my dir argument? I checked several places > all over the genome for the complete alignment and my reads all seem to > be considered on the + strand, even when they are not. Can anybody help? To get something reproducible, I did library(ShortRead) example(readAligned) aln <- readAligned(ap, "s_2_export.txt", "SolexaExport") lane <- aln[chromosome(aln) == "chr5.fa"] mlane <- lane[strand(lane)=="-"] The first position is > min(position(mlane)) [1] 6774915 Here's a pileup like the one you provide, with a convoluted way of finding the first non-zero position in the pileup > p <- pileup(position(mlane), 50, max(position(mlane)) + 50, + strand("-")) > Rle(p) 'integer' Rle instance of length 140154400 with 25 runs Lengths: 6774914 50 39858191 50 5640193 50 5762358 50 2355597 50 ... Values : 0 1 0 1 0 1 0 1 0 1 ... which is to say that the first non-zero position occurs after 6774914 zeros, or at position 6774915. The read's position is interpreted to mean 'the position is the leftmost nucleotide of the aligned 50mer'. Here I add a 'readlength' argument > p <- pileup(position(mlane), 50, max(position(mlane)) + 50, + strand("-"), readlength=35) > Rle(p) 'integer' Rle instance of length 140154400 with 25 runs Lengths: 6774899 50 39858191 50 5640193 50 5762358 50 2355597 50 ... Values : 0 1 0 1 0 1 0 1 0 1 ... and the 'position' is interpreted in the same way -- the leftmost position of the aligned _read_. The fragment has been extended so the read is a total of 50 nt long, and the extension has been in the appropriate (from 5' to 3' on the minus strand, decrementing the number of zeros before the first pileup). I introduced the Rle above in preparation for suggesting use of 'coverage' (see ?"AlignedRead-class") as an alternative to pileup; it's a much more compact representation and ties in better with some of the down-stream tools. > coverage(mlane, start=1L, extend=15L)[[1]] 'integer' Rle instance of length 140154384 with 24 runs Lengths: 6774899 50 39858191 50 5640193 50 5762358 50 2355597 50 ... Values : 0 1 0 1 0 1 0 1 0 1 ... 'coverage' can also be tricky to get correct. Martin > > This is my session info: > >> sessionInfo() > R version 2.9.2 (2009-08-24) > x86_64-unknown-linux-gnu > > locale: > LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_ DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC _NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDEN TIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3 Biostrings_2.12.9 > [5] IRanges_1.2.3 > > loaded via a namespace (and not attached): > [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1 > > Thanks, > Nora > > > > -------------------------------------------------------------------- ---- > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793

ADD COMMENT • link 16.2 years ago Martin Morgan 25k

Login before adding your answer.