Chip-Seq tag density plot
0
0
Entering edit mode
@eva-benito-garagorri-4263
Last seen 9.5 years ago
Dear list, Hi again. Thank you once more for your input about the coverage plot. It has taken a bit but I finally got to try the idea Hervé proposed for calculating the coverage around the TSS. I know it's probably not very efficient since it uses a pretty long 'for' loop, but it works and it seems to do what it should. Any ideas to make the code faster or better are welcome. I am, however, a bit puzzled about the aspect of the final plot. I was expecting to have a certain accumulation of reads downstream of the TSS, but I seem to be getting the inverted tendency(http://img199.imageshack.us/i/profileplottest.png/) (Sorry about the ads in the page, just looked for a free hosting site). We ChIPed an acetylated histone mark. I may be missing something really obvious about genes on the + or the - strand, I did review the code once and again and I can't find the mistake So I am providing the code with an example and if anyone can spot a mistake in it, I'd be grateful if they could point it out. Thanks a lot for your help. Here's the code: library(GenomicRanges) library(GenomicFeatures) gr = GRanges(seqnames=Rle(c('chr1','chr2','chr3'),c(4,5,1)),ranges=IRa nges(1000:1009,end=1034:1043),strand=Rle(strand(c('+','-','+','-')),c( 3,1,2,4))) gr_extend = resize(gr, 200) ## Extend the reads to approx. fragment size. 'resize' seems to take into account the strand refseq = GRanges(seqnames=Rle(c('chr1','chr2','chr3'),c(2,6,2)),ranges =IRanges(1000:1009,end=2000:2009),strand=Rle(strand(c('+','-','+','-') ),c(1,2,4,3))) ##Fictious refseq object ############################################# ########## Downstream windows ############################################# downstreamRefseq = refseq ## First window 50bp downstream of the TSS idx = strand(downstreamRefseq) == '+' end(downstreamRefseq[idx]) = start(downstreamRefseq[idx])+49 start(downstreamRefseq[!idx]) = end(downstreamRefseq[!idx]) -49 downstreamWindow1 = countOverlaps(downstreamRefseq, gr_extend) ### Now generate remaining windows downstreamRefseqs = GRangesList() for (i in 1:99) { downstreamRefseqs[[i]] = shift(downstreamRefseq, as.integer(ifelse(strand(downstreamRefseq) =='+',i*50,- i*50))) } ## This loop doesn't take too long to run, generates all the windows downstream of the TSS downstreamOlaps = lapply(downstreamRefseqs, function(x) countOverlaps(x,gr_extend)) downstreamOlaps1 = lapply(downstreamOlaps, mean) ## Calculate the mean number of reads per window downstreamCoverage = c(mean(downstreamWindow1),unlist(downstreamOlaps1))/50 ## Incorporate window 1 and calculate the coverage per 50bp ############################################# ########## Upstream windows ############################################# upstreamRefseq = refseq end(upstreamRefseq[idx]) = start(upstreamRefseq[idx]) start(upstreamRefseq[idx]) = start(upstreamRefseq[idx])-49 start(upstreamRefseq[!idx]) = end(upstreamRefseq[!idx]) end(upstreamRefseq[!idx]) = end(upstreamRefseq[!idx])+49 upstreamWindow1 = countOverlaps(downstreamRefseq, gr_extend) upstreamRefseqs = GRangesList() for (i in 1:99) { upstreamRefseqs[[i]] = shift(upstreamRefseq, as.integer(ifelse(strand(upstreamRefseq) =='+',-i*50,+ i*50))) } upstreamOlaps = lapply(upstreamRefseqs, function(x) countOverlaps(x,gr_extend)) upstreamOlaps1 = lapply(upstreamOlaps, mean) upstreamCoverage = c(mean(upstreamWindow1),unlist(upstreamOlaps1))/50 ############################################# ########## Plotting ############################################# ## The plot looks horrible, it was just as an example... x = seq(from = 1, to = 5000,by=50) y = seq(from = -1, to = -5000,by=-50) plot(x,downstreamCoverage,pch=20,xlim=c(-5100,5100)) lines(x, downstreamCoverage) points(y,upstreamCoverage,pch=20) lines(y,upstreamCoverage) > sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) locale: [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9 loaded via a namespace (and not attached): [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.4 BSgenome_1.18.3 [5] DBI_0.2-5 RCurl_1.4-3 RSQLite_0.9-4 rtracklayer_1.10.6 [9] tools_2.12.0 XML_3.2-0 On 30/03/2011, at 22:21, Hervé Pagès wrote: > Hi Eva, > > One solution that involves a little bit of programming from your part > is to make those bins "by hand" by using things like > 'start(gr) <- value' and 'end(gr) <- value' to adjust the starts > and ends of your GRanges object. For example: > > refseqGR1 <- refseqGR > ## Adjust 'end(refseqGR1)' so that each range in it corresponds to > ## the region of length 1000 starting at 'start(refseqGR))' (let's > ## call this region the "1st bin"): > end(refseqGR1) <- start(refseqGR1) + 999 > olaps1 <- findOverlaps(refseqGR1, simTags) > > ## Then for the "2nd bin": > refseqGR2 <- shift(refseqGR1, 1000) > olaps2 <- findOverlaps(refseqGR2, simTags) > > ## etc... > > You might want to put this in a loop, decide how many loops you want > to do, or maybe define some criteria to exit the loop prematurely... > > An important thing to be aware of though is that 'start(refseqGR)' is > always the position of the left-most base of the transcript, that is, > it is its 5' end if it's on the + strand and its 3' end if it's on the > - strand. So you will probably need to adapt the code above to do > something like: > > ## 1st bin: > refseqGR1 <- refseqGR > idx <- strand(refseqGR) == "+" > end(refseqGR1[idx]) <- start(refseqGR1[idx]) + 999 > start(refseqGR1[!idx]) <- end(refseqGR1[!idx]) - 999 > > ## 2nd bin: > refseqGR2 <- shift(refseqGR1, as.vector(ifelse(strand(refseqGR1) == "+", 1000, -1000))) > > Cheers, > H. > > > On 03/30/2011 08:37 AM, Eva Benito Garagorri wrote: >> Dear list, >> >> I am trying to generate a tag density plot from a ChIP-Seq experiment for regions around the TSS of known transcripts. I suppose this is not too complicated, but I am having difficulties approaching this issue. I think maybe the most straightforward way is to use the GRanges capabilities. I generated a GRanges object by downloading the mouse refseq database from UCSC (see code below). This gives me a GRanges object containing the start and end coordinate of each refseq. I could cross this with my tag file with "findOverlaps", but that would give me the overlap of my tags with the entire span of the transcript and I would like to just keep (and make bins around) the region at either side of the TSS. I couldn't find a way to generate a sliding window in the refseq database to calculate the overlap between each bin and my tag file. >> >> If anyone could point me to some package/functionality/reading material or maybe to a previous thread (I did search the mailing list but maybe I missed something) or else help with the strategy, I would be most grateful. >> >> Thank you very much in advance. >> >> Best regards, >> >> Eva >> >> >> >> library(GenomicRanges) >> library(GenomicFeatures) >> >> refseq = makeTranscriptDbFromUCSC('mm9', tablename='refGene') >> >> refseqGR = transcripts(refseq) >> >> head(refseqGR) >> >> GRanges with 6 ranges and 2 elementMetadata values >> seqnames ranges strand | tx_id tx_name >> <rle> <iranges> <rle> |<integer> <character> >> [1] chr1 [4797974, 4836816] + | 490 NM_008866 >> [2] chr1 [4847775, 4887990] + | 73 NM_011541 >> [3] chr1 [4847775, 4887990] + | 78 NM_001159750 >> [4] chr1 [4848409, 4887990] + | 75 NM_001159751 >> [5] chr1 [5073254, 5152630] + | 77 NM_133826 >> [6] chr1 [5578574, 5596214] + | 494 NM_001204371 >> >> seqlengths >> chr1 chr2 chr3 chr4 chr5 chr6 ... chr7_random chr8_random chr9_random chrUn_random chrX_random chrY_random >> 197195432 181748087 159599783 155630120 152537259 149517037 ... 362490 849593 449403 5900358 1785075 58682461 >> >> ### Simulate a tag file as a sample of the above >> >> simTags = sample(refseqGR, 1000) >> >> >> #### It would now be possible to get the overlap between the two by doing: >> >> olaps = findOverlaps(refseqGR, simTags) >> >> ### But how do I divide the refseqGR into bins of equal size around the start coordinate? >> >> >> sessionInfo() >> >> >> R version 2.12.0 (2010-10-15) >> Platform: i386-apple-darwin9.8.0/i386 (32-bit) >> >> locale: >> [1] es_ES.UTF-8/es_ES.UTF-8/C/C/es_ES.UTF-8/es_ES.UTF-8 >> >> attached base packages: >> [1] stats graphics grDevices utils datasets methods base >> >> other attached packages: >> [1] GenomicFeatures_1.2.3 GenomicRanges_1.2.3 IRanges_1.8.9 >> >> loaded via a namespace (and not attached): >> [1] Biobase_2.10.0 biomaRt_2.6.0 Biostrings_2.18.0 BSgenome_1.18.3 DBI_0.2-5 RCurl_1.4-3 RSQLite_0.9-3 rtracklayer_1.10.6 >> [9] tools_2.12.0 XML_3.2-0 >> >> ---------- >> Eva Benito Garagorri >> PhD program in Neurosciences >> Institute for Neurosciences in Alicante >> UMH-CSIC >> San Juan de Alicante >> 03550 >> Spain >> ebenito@umh.es >> (34) 965 91 92 33 >> >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioconductor mailing list >> Bioconductor@r-project.org >> https://stat.ethz.ch/mailman/listinfo/bioconductor >> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > > > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M2-B876 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpages@fhcrc.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 ---------- Eva Benito Garagorri PhD program in Neurosciences Institute for Neurosciences in Alicante UMH-CSIC San Juan de Alicante 03550 Spain ebenito@umh.es (34) 965 91 92 33 [[alternative HTML version deleted]]
Coverage Cancer Coverage Cancer • 1.7k views
ADD COMMENT

Login before adding your answer.

Traffic: 738 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6