Entering edit mode
Hi,
I am using countOverlaps function in GenomicFeatures package
to count how many sequencing reads are mapped to each genomic feature
(e.g. promoter region, gene region etc.). However I found that if a
read was mapped to more than one features, then the count of each
feature will be increased by 1, i.e. this read was counted more than
one times although it can only originate from one genomic location.
Below is a simple example to show this:
-------------------------
features <- GRanges(seqnames="chr1",
ranges=IRanges(c(100,500),c(1000,1500)) ,strand=c("+","+"))
features
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<rle> <iranges> <rle> |
[1] chr1 [100, 1000] + |
[2] chr1 [500, 1500] + |
seqlengths
chr1
NA
reads <- GRanges(seqnames="chr1",ranges=IRanges(c(150,600),c(250,700))
,strand=c("+","+"))
GRanges with 2 ranges and 0 elementMetadata values
seqnames ranges strand |
<rle> <iranges> <rle> |
[1] chr1 [150, 250] + |
[2] chr1 [600, 700] + |
seqlengths
chr1
NA
countOverlaps(features,reads)
[1] 2 1
--------------------
I think It will be a better to have an argument in countOverlaps
function which specifies whether a single hit (eg. a random one) or
multiple hits should be returned for each query sequence.
Below is my session info:
sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.5.1 Biostrings_2.18.2
BSgenome_1.18.3 DBI_0.2-5 RCurl_1.4-3
[7] RSQLite_0.9-2 rtracklayer_1.10.6 tools_2.12.0
XML_3.2-0
Cheers,
Wei
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:6}}