question regarding the behavior of function processReads in package nucleR
@ulrike-goebel-6255
Last seen 9.6 years ago
Dear list,

I have a question regarding the behavior of the function processReads in the package nucleR. Assume a RangedData object 'tmp' of single-end read coordinates:

    > head(tmp, n=4)
    RangedData with 6 rows and 1 value column across 1 space
         space    ranges |      strand
      <factor> <iranges> | <character>
    1     Chr1  [ 4, 58] |           -
    2     Chr1  [ 7, 61] |           -
    3     Chr1  [ 9, 63] |           -
    4     Chr1  [10, 55] |           +

Then

    > rd_150_40 <- processReads(tmp, type="single", fragmentLen=150, trim=40)

yields

    > head(rd_150_40[1], n=4)
         space     ranges |
      <factor>  <iranges> |
    1     Chr1 [-51, -12] |
    2     Chr1 [-48,  -9] |
    3     Chr1 [-46,  -7] |
    4     Chr1 [ 65, 104] |

The processed coordinates of the (+) read conform to the protocol "extend the read in 5'->3' direction to a length of 150 bp, then extract the window from position 56 to 95 of the extended read". I understand that this is the expected behavior (trim to the 40 bp middle window of the read after extension).

Apparently, what the function does is shift the start of a read by 55 bp (to the left for a (-) read, to the right for a (+) read) and then extract the 40 bp window beginning at the start coordinate of the shifted read:

    > head(start(tmp) - start(rd_150_40))
    [1]  55  55  55 -55  55  55
    > unique(start(tmp) - start(rd_150_40))
    [1]  55 -55

For (+) reads, this selects the 40 bp middle window as described above. For (-) reads, however, I think that the 40 bp window *ending at the end of the shifted read* should be extracted instead. Otherwise, the window is displaced towards the 3' end of the (oriented) extended read by an amount that depends on the original read length (the length before extension), rather than lying 55 bp from position 1 (which is the *last* coordinate of a (-) read). For the first read above, [4, 58] on the (-) strand, I would expect [-36, 3] rather than [-51, -12], a difference of 15 bp (the read length of 55 minus the trim of 40).

I was just wondering whether this behaviour of processReads is intended. Sorry if I missed something obvious!
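To make the arithmetic concrete, here is a minimal base-R sketch that does not call nucleR at all. The helpers middle_window() and observed_window() are hypothetical names introduced only for this illustration: the first computes the window I would expect (the central 40 bp of the strand-aware 150 bp extension), the second reproduces the shift-by-55 behaviour inferred from the output above. It is my reading of the output, not the actual implementation of processReads.

```r
# Hypothetical helper (not part of nucleR): central `trim`-bp window of a
# read extended to `fragmentLen` bp in its own 5'->3' direction.
middle_window <- function(start, end, strand, fragmentLen = 150, trim = 40) {
  offset <- (fragmentLen - trim) / 2  # 55 for fragmentLen = 150, trim = 40
  # (+) reads: the 5' end is `start`, so the window begins `offset` bp to the right.
  # (-) reads: the 5' end is `end`, so the window ends `offset` bp to the left.
  win_start <- ifelse(strand == "+", start + offset, end - offset - trim + 1)
  data.frame(start = win_start, end = win_start + trim - 1)
}

# Hypothetical helper: the behaviour inferred from the processReads output,
# i.e. shift the start coordinate by `offset` and take `trim` bp from there.
observed_window <- function(start, strand, fragmentLen = 150, trim = 40) {
  offset <- (fragmentLen - trim) / 2
  win_start <- ifelse(strand == "+", start + offset, start - offset)
  data.frame(start = win_start, end = win_start + trim - 1)
}

# The four reads shown by head(tmp, n=4)
reads <- data.frame(start  = c(4, 7, 9, 10),
                    end    = c(58, 61, 63, 55),
                    strand = c("-", "-", "-", "+"),
                    stringsAsFactors = FALSE)

expected <- middle_window(reads$start, reads$end, reads$strand)
observed <- observed_window(reads$start, reads$strand)

# Displacement of the observed window relative to the expected one;
# for (-) reads it equals (read length - trim), for (+) reads it is 0.
displacement <- expected$start - observed$start
print(cbind(reads, observed = observed, expected = expected, displacement))
```

With these four reads, the observed windows match the processReads output shown above, while the expected windows for the three (-) reads lie 15 bp further to the right.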
With best regards,
Ulrike Goebel

    > sessionInfo()
    R version 2.15.2 (2012-10-26)
    Platform: x86_64-unknown-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] parallel  stats     graphics  grDevices utils     datasets  methods
    [8] base

    other attached packages:
     [1] nucleR_1.6.0         ShortRead_1.16.4     latticeExtra_0.6-26
     [4] RColorBrewer_1.0-5   Rsamtools_1.10.2     lattice_0.20-10
     [7] Biostrings_2.26.3    GenomicRanges_1.10.7 Biobase_2.18.0
    [10] IRanges_1.16.6       BiocGenerics_0.4.0

    loaded via a namespace (and not attached):
    [1] bitops_1.0-6   grid_2.15.2    hwriter_1.3    stats4_2.15.2  tools_2.15.2
    [6] zlibbioc_1.4.0