Hi there,
I'm using the pileup function in the ShortRead package and it seems I
call it with the wrong arguments, or that there is some kind of bug in
the function. All my reads are piled up in forward direction only,
even
if they were actually mapped to the backward strand.
Here's a small (nonfunctional since I assume I cannot append data to
this e-mail) example of how I call the pileup function:
#read in the MAQ alignment file:
aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt",
type="MAQMapview")
aln_Pat_subset<-aln_Pat[1:10]
pos_Patient_subset<-position(aln_Pat_subset)
str_Patient_subset<-strand(aln_Pat_subset)
#use only a small subset (all positions are on the same chromosome 1)
subset_pileuptest<-pileup(pos_Patient_subset, 36,
max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset)
Is there anything wrong with my dir argument? I checked several places
all over the genome for the complete alignment and my reads all seem
to
be considered on the + strand, even when they are not. Can anybody
help?
This is my session info:
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_DE
.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC_N
AME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDENTI
FICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3
Biostrings_2.12.9
[5] IRanges_1.2.3
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
Thanks,
Nora
Nora,
In cases like yours, it is useful to provide the error message along
with the code. In particular you reference objects aln_Pat and
pos_Patient that are not created in your code snippet. After I made
some modifications to your code, I was able to run it without issue:
> suppressMessages(library(ShortRead))
> sp <- SolexaPath(system.file("extdata", package="ShortRead"))
> filt <- compose(chromosomeFilter("chr[1-5].fa"), strandFilter("+"))
> aln_Pat <- readAligned(sp, "s_2_export.txt", filter=filt)
> aln_Pat_subset<-aln_Pat[1:10]
> pos_Patient_subset<-position(aln_Pat_subset)
> str_Patient_subset<-strand(aln_Pat_subset)
> subset_pileuptest<-pileup(pos_Patient_subset, 36,
> max(pos_Patient_subset,na.rm=TRUE)+37,dir=str_Patient_subset)
> head(subset_pileuptest)
[1] 0 0 0 0 0 0
> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US;LC_NUMERIC=C;LC_TIME=en_US;LC_COLLATE=en_US;LC_MONETARY
=C;LC_MESSAGES=en_US;LC_PAPER=en_US;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHON
E=C;LC_MEASUREMENT=en_US;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.2.1 lattice_0.17-26 BSgenome_1.12.4
Biostrings_2.12.10
[5] IRanges_1.2.3
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1 tools_2.9.2
Quoting Nora Rieber <n.rieber at="" dkfz-heidelberg.de="">:
> Hi there,
>
> I'm using the pileup function in the ShortRead package and it seems
I
> call it with the wrong arguments, or that there is some kind of bug
in
> the function. All my reads are piled up in forward direction only,
even
> if they were actually mapped to the backward strand.
>
> Here's a small (nonfunctional since I assume I cannot append data to
> this e-mail) example of how I call the pileup function:
>
> #read in the MAQ alignment file:
> aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt",
type="MAQMapview")
> aln_Pat_subset<-aln_Pat[1:10]
> pos_Patient_subset<-position(aln_Pat_subset)
> str_Patient_subset<-strand(aln_Pat_subset)
> #use only a small subset (all positions are on the same chromosome
1)
>
> subset_pileuptest<-pileup(pos_Patient_subset, 36,
> max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset)
>
> Is there anything wrong with my dir argument? I checked several
places
> all over the genome for the complete alignment and my reads all seem
to
> be considered on the + strand, even when they are not. Can anybody
help?
>
> This is my session info:
>
>> sessionInfo()
> R version 2.9.2 (2009-08-24)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_
DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC
_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDEN
TIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3
Biostrings_2.12.9
> [5] IRanges_1.2.3
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
>
> Thanks,
> Nora
>
>
Hi Nora --
Nora Rieber wrote:
> Hi there,
>
> I'm using the pileup function in the ShortRead package and it seems
I
> call it with the wrong arguments, or that there is some kind of bug
in
> the function. All my reads are piled up in forward direction only,
even
> if they were actually mapped to the backward strand.
>
> Here's a small (nonfunctional since I assume I cannot append data to
> this e-mail) example of how I call the pileup function:
>
> #read in the MAQ alignment file:
> aln_s_7_MCIP4<-readAligned("/home","sequenceALN.txt",
type="MAQMapview")
> aln_Pat_subset<-aln_Pat[1:10]
> pos_Patient_subset<-position(aln_Pat_subset)
> str_Patient_subset<-strand(aln_Pat_subset)
> #use only a small subset (all positions are on the same chromosome
1)
>
> subset_pileuptest<-pileup(pos_Patient_subset, 36,
> max(pos_Patient,na.rm=TRUE)+37,dir=str_Patient_subset)
>
> Is there anything wrong with my dir argument? I checked several
places
> all over the genome for the complete alignment and my reads all seem
to
> be considered on the + strand, even when they are not. Can anybody
help?
To get something reproducible, I did
library(ShortRead)
example(readAligned)
aln <- readAligned(ap, "s_2_export.txt", "SolexaExport")
lane <- aln[chromosome(aln) == "chr5.fa"]
mlane <- lane[strand(lane)=="-"]
The first position is
> min(position(mlane))
[1] 6774915
Here's a pileup like the one you provide, with a convoluted way of
finding the first non-zero position in the pileup
> p <- pileup(position(mlane), 50, max(position(mlane)) + 50,
+ strand("-"))
> Rle(p)
'integer' Rle instance of length 140154400 with 25 runs
Lengths: 6774914 50 39858191 50 5640193 50 5762358 50 2355597 50
...
Values : 0 1 0 1 0 1 0 1 0 1 ...
which is to say that the first non-zero position occurs after 6774914
zeros, or at position 6774915. The read's position is interpreted to
mean 'the position is the leftmost nucleotide of the aligned 50mer'.
Here I add a 'readlength' argument
> p <- pileup(position(mlane), 50, max(position(mlane)) + 50,
+ strand("-"), readlength=35)
> Rle(p)
'integer' Rle instance of length 140154400 with 25 runs
Lengths: 6774899 50 39858191 50 5640193 50 5762358 50 2355597 50
...
Values : 0 1 0 1 0 1 0 1 0 1 ...
and the 'position' is interpreted in the same way -- the leftmost
position of the aligned _read_. The fragment has been extended so the
read is a total of 50 nt long, and the extension has been in the
appropriate (from 5' to 3' on the minus strand, decrementing the
number
of zeros before the first pileup).
I introduced the Rle above in preparation for suggesting use of
'coverage' (see ?"AlignedRead-class") as an alternative to pileup;
it's
a much more compact representation and ties in better with some of the
down-stream tools.
> coverage(mlane, start=1L, extend=15L)[[1]]
'integer' Rle instance of length 140154384 with 24 runs
Lengths: 6774899 50 39858191 50 5640193 50 5762358 50 2355597 50
...
Values : 0 1 0 1 0 1 0 1 0 1 ...
'coverage' can also be tricky to get correct.
Martin
>
> This is my session info:
>
>> sessionInfo()
> R version 2.9.2 (2009-08-24)
> x86_64-unknown-linux-gnu
>
> locale:
> LC_CTYPE=de_DE.UTF-8;LC_NUMERIC=C;LC_TIME=de_DE.UTF-8;LC_COLLATE=de_
DE.UTF-8;LC_MONETARY=C;LC_MESSAGES=de_DE.UTF-8;LC_PAPER=de_DE.UTF-8;LC
_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=de_DE.UTF-8;LC_IDEN
TIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3
Biostrings_2.12.9
> [5] IRanges_1.2.3
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.4.1 grid_2.9.2 hwriter_1.1
>
> Thanks,
> Nora
>
>
>
> --------------------------------------------------------------------
----
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793