Eva Benito Garagorri
Dear list,
I am trying to generate a tag density plot from a ChIP-Seq experiment
for the regions around the TSS of known transcripts. I suppose this is
not too complicated, but I am having difficulty approaching it.
I think the most straightforward way may be to use the GRanges
capabilities. I generated a GRanges object by downloading the mouse
RefSeq database from UCSC (see code below). This gives me a GRanges
object containing the start and end coordinates of each RefSeq
transcript. I could intersect this with my tag file using
findOverlaps, but that would give me the overlap of my tags with the
entire span of each transcript, whereas I would like to keep only the
region on either side of the TSS (and divide it into bins). I couldn't
find a way to generate a sliding window over the RefSeq transcripts so
that I can calculate the overlap between each bin and my tag file.
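One way around the whole-transcript overlap is to build the bins first and overlap the tags with the bins instead of with the transcripts. A minimal base-R sketch of the bin construction, so it runs without Bioconductor; the helper `tss_bins` and its defaults (2 kb on either side, 100 bp bins) are hypothetical choices, not part of any package. Note that the TSS is the start coordinate only for "+" strand transcripts; for "-" strand transcripts it is the end coordinate, and bins should be numbered in the direction of transcription.

```r
# Hypothetical helper (not from any package): fixed-width bins around each TSS.
# Coordinates are 1-based and inclusive; strand is "+" or "-".
# Bins are NOT clipped at chromosome ends.
tss_bins <- function(chrom, start, end, strand,
                     upstream = 2000, downstream = 2000, bin_size = 100) {
  stopifnot((upstream + downstream) %% bin_size == 0)
  # The TSS is the start coordinate on "+" and the end coordinate on "-"
  tss <- ifelse(strand == "-", end, start)
  n_bins <- (upstream + downstream) %/% bin_size
  n_tx <- length(tss)
  # Offset of each bin's upstream-most base from the TSS (negative = upstream)
  offsets <- seq(-upstream, by = bin_size, length.out = n_bins)
  off_rep    <- rep(offsets, times = n_tx)
  tss_rep    <- rep(tss, each = n_bins)
  strand_rep <- rep(strand, each = n_bins)
  # "+": bin spans [tss + off, tss + off + bin_size - 1]
  # "-": mirrored, spans [tss - off - bin_size + 1, tss - off]
  bin_start <- ifelse(strand_rep == "-",
                      tss_rep - off_rep - bin_size + 1,
                      tss_rep + off_rep)
  data.frame(tx     = rep(seq_len(n_tx), each = n_bins),
             bin    = rep(seq_len(n_bins), times = n_tx),  # bin 1 is furthest upstream
             chrom  = rep(chrom, each = n_bins),
             start  = bin_start,
             end    = bin_start + bin_size - 1,
             strand = strand_rep,
             stringsAsFactors = FALSE)
}
```

For example, with the defaults a "+" strand transcript starting at 1000000 gets bin 1 = 998000 to 998099 and bin 21 (the TSS bin) = 1000000 to 1000099. The rows could then be turned into a GRanges with `GRanges(seqnames = b$chrom, ranges = IRanges(b$start, b$end), strand = b$strand)` and passed to `countOverlaps()` together with the tags; in current GenomicRanges, `ignore.strand = TRUE` is needed there so that tags on the opposite strand to the transcript are still counted.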
If anyone could point me to a package, function, reading material or
a previous thread (I did search the mailing list but may have missed
something), or else help with the strategy, I would be most grateful.
Thank you very much in advance.
Best regards,
Eva
library(GenomicRanges)
library(GenomicFeatures)
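### Download the mm9 RefSeq transcript annotation from UCSC
### (this function is called makeTxDbFromUCSC() in later GenomicFeatures releases)
refseq = makeTranscriptDbFromUCSC('mm9', tablename='refGene')
### One range per transcript, spanning its start and end coordinates
refseqGR = transcripts(refseq)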
head(refseqGR)
GRanges with 6 ranges and 2 elementMetadata values
seqnames ranges strand | tx_id tx_name
<rle> <iranges> <rle> | <integer> <character>
[1] chr1 [4797974, 4836816] + | 490 NM_008866
[2] chr1 [4847775, 4887990] + | 73 NM_011541
[3] chr1 [4847775, 4887990] + | 78 NM_001159750
[4] chr1 [4848409, 4887990] + | 75 NM_001159751
[5] chr1 [5073254, 5152630] + | 77 NM_133826
[6] chr1 [5578574, 5596214] + | 494 NM_001204371
seqlengths
         chr1         chr2         chr3         chr4         chr5         chr6
    197195432    181748087    159599783    155630120    152537259    149517037
          ...
  chr7_random  chr8_random  chr9_random chrUn_random  chrX_random  chrY_random
       362490       849593       449403      5900358      1785075     58682461
### Simulate a tag file as a sample of the above
simTags = sample(refseqGR, 1000)
### It would now be possible to get the overlap between the two by doing:
olaps = findOverlaps(refseqGR, simTags)
### But how do I divide refseqGR into bins of equal size around the start coordinate?
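To go from bins to a density profile, another option is to skip building explicit bin ranges: for every tag near a TSS, take its strand-aware distance to that TSS, tabulate the distances into fixed-width bins and average over TSSs. A minimal base-R sketch, assuming each tag is reduced to its midpoint and that TSS positions are already strand-aware (start for "+" transcripts, end for "-" transcripts); the helper `tss_profile` and its defaults are hypothetical. With GRanges inputs, the vectors could be taken from `seqnames()`, `start()`, `end()` and `strand()`.

```r
# Hypothetical helper (not from any package): mean tag count per TSS in each bin.
# tss_chrom/tss_pos/tss_strand: one entry per transcript (tss_pos is strand-aware)
# tag_chrom/tag_mid: one entry per tag (e.g. tag_mid = (start + end) %/% 2)
tss_profile <- function(tss_chrom, tss_pos, tss_strand, tag_chrom, tag_mid,
                        upstream = 2000, downstream = 2000, bin_size = 100) {
  stopifnot((upstream + downstream) %% bin_size == 0)
  n_bins <- (upstream + downstream) %/% bin_size
  counts <- numeric(n_bins)
  for (chr in unique(tss_chrom)) {
    mids <- tag_mid[tag_chrom == chr]
    for (i in which(tss_chrom == chr)) {
      # Genomic window covering [-upstream, downstream) in transcription direction
      if (tss_strand[i] == "-") {
        lo <- tss_pos[i] - downstream + 1
        hi <- tss_pos[i] + upstream
      } else {
        lo <- tss_pos[i] - upstream
        hi <- tss_pos[i] + downstream - 1
      }
      hits <- mids[mids >= lo & mids <= hi]
      # Signed distance from the TSS; negative values are upstream
      rel <- if (tss_strand[i] == "-") tss_pos[i] - hits else hits - tss_pos[i]
      # Map rel in [-upstream, downstream - 1] to bins 1..n_bins
      bin <- (rel + upstream) %/% bin_size + 1
      counts <- counts + tabulate(bin, nbins = n_bins)
    }
  }
  data.frame(offset    = seq(-upstream, by = bin_size, length.out = n_bins),
             mean_tags = counts / length(tss_pos))
}
```

`plot(p$offset, p$mean_tags, type = "l")` would then draw the profile. The per-TSS scan is linear in the number of tags on the chromosome, so for genome-wide data an overlap-based approach (bins plus `countOverlaps()`) is likely to be much faster; this version is only meant to make the strand handling explicit.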
sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-apple-darwin9.8.0/i386 (32-bit)
locale:
[1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9
loaded via a namespace (and not attached):
[1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.0 BSgenome_1.18.3 DBI_0.2-5 RCurl_1.4-3 RSQLite_0.9-3 rtracklayer_1.10.6
[9] tools_2.12.0 XML_3.2-0
----------
Eva Benito Garagorri
PhD program in Neurosciences
Institute for Neurosciences in Alicante
UMH-CSIC
San Juan de Alicante
03550
Spain
ebenito@umh.es
(34) 965 91 92 33