Using rtracklayer for ranges of Genomic DNA
1
0
Entering edit mode
@sheena-scroggins-4305
Last seen 9.7 years ago
Hello, I'm trying to find a way to use rtracklayer to give me an output of the conservation scores for a genomic region. For example: I want to be able to define a range of data like the RangedData does: > strand <- c("+", "+", "+") > start <- c(17563082, 21920042, 61191807) > end <- c(17605866, 21956374, 61290544) > rdTable <- RangedData(ranges = IRanges(start = start, end = end), strand = strand, space = chr) > rdTable RangedData with 3 rows and 1 value column across 2 spaces space ranges | strand <character> <iranges> | <character> 1 chr1 [17563082, 17605866] | + 2 chr1 [61191807, 61290544] | + 3 chr2 [21920042, 21956374] | + But then combine it with rtracklayer so that I can find the conservation scores of the data ranges: > s2 = browserSession("UCSC") > ct = track(s2, "cons46way") > ct UCSC track 'Primate Cons' UCSCData with 9974 rows and 1 value column across 1 space space ranges | score <character> <iranges> | <numeric> 1 chr21 [33031597, 33031597] | 0.560087 2 chr21 [33031598, 33031598] | 0.560087 3 chr21 [33031599, 33031599] | 0.435717 4 chr21 [33031600, 33031600] | 0.435717 5 chr21 [33031601, 33031601] | 0.560087 6 chr21 [33031602, 33031602] | 0.560087 7 chr21 [33031603, 33031603] | 0.435717 8 chr21 [33031604, 33031604] | 0.435717 9 chr21 [33031605, 33031605] | 0.560087 ... ... ... ... ... 9966 chr21 [33041562, 33041562] | -0.266457 9967 chr21 [33041563, 33041563] | 0.433850 9968 chr21 [33041564, 33041564] | 0.507567 9969 chr21 [33041565, 33041565] | 0.655000 9970 chr21 [33041566, 33041566] | -0.155882 9971 chr21 [33041567, 33041567] | -0.340173 9972 chr21 [33041568, 33041568] | 0.507567 9973 chr21 [33041569, 33041569] | -1.740790 9974 chr21 [33041570, 33041570] | -1.925080 > So that the output will be the different regions in my RangedData with their respective conservation scores. It would be REALLY great if you could help me also find a way to put a threshold on the conservation scores, or a way to filter them afterwards. Thanks! Sheena [[alternative HTML version deleted]]
rtracklayer rtracklayer • 977 views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
On Tue, Oct 26, 2010 at 5:02 PM, Sheena Scroggins < sheena.scroggins@gmail.com> wrote: > Hello, > > I'm trying to find a way to use rtracklayer to give me an output of the > conservation scores for a genomic region. For example: I want to be able to > define a range of data like the RangedData does: > > > strand <- c("+", "+", "+") > > start <- c(17563082, 21920042, 61191807) > > end <- c(17605866, 21956374, 61290544) > > rdTable <- RangedData(ranges = IRanges(start = start, end = end), strand > = > strand, space = chr) > > rdTable > RangedData with 3 rows and 1 value column across 2 spaces > space ranges | strand > <character> <iranges> | <character> > 1 chr1 [17563082, 17605866] | + > 2 chr1 [61191807, 61290544] | + > 3 chr2 [21920042, 21956374] | + > > But then combine it with rtracklayer so that I can find the conservation > scores of the data ranges: > > s2 = browserSession("UCSC") > > ct = track(s2, "cons46way") > > ct > UCSC track 'Primate Cons' > UCSCData with 9974 rows and 1 value column across 1 space > space ranges | score > <character> <iranges> | <numeric> > 1 chr21 [33031597, 33031597] | 0.560087 > 2 chr21 [33031598, 33031598] | 0.560087 > 3 chr21 [33031599, 33031599] | 0.435717 > 4 chr21 [33031600, 33031600] | 0.435717 > 5 chr21 [33031601, 33031601] | 0.560087 > 6 chr21 [33031602, 33031602] | 0.560087 > 7 chr21 [33031603, 33031603] | 0.435717 > 8 chr21 [33031604, 33031604] | 0.435717 > 9 chr21 [33031605, 33031605] | 0.560087 > ... ... ... ... ... > 9966 chr21 [33041562, 33041562] | -0.266457 > 9967 chr21 [33041563, 33041563] | 0.433850 > 9968 chr21 [33041564, 33041564] | 0.507567 > 9969 chr21 [33041565, 33041565] | 0.655000 > 9970 chr21 [33041566, 33041566] | -0.155882 > 9971 chr21 [33041567, 33041567] | -0.340173 > 9972 chr21 [33041568, 33041568] | 0.507567 > 9973 chr21 [33041569, 33041569] | -1.740790 > 9974 chr21 [33041570, 33041570] | -1.925080 > > > > So that the output will be the different regions in my RangedData with > their > respective conservation scores. Currently, rtracklayer uses the UCSC Table Browser via the web, which is generally too limited and inefficient for genome-wide queries. It is feasible to obtain the conservation scores for a small number of queries: query <- UCSCTableQuery(s2, "cons46way") range(query) <- ranges(rdTable[1,]) cons <- track(query) And repeat the last two lines for each range. In the long run, the best method is to download the conservation scores track from their FTP site. You could just mget the WIG file for each chromosome: http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/vertebra te/. Then use the UCSC bigwig utilities to convert the merged data into a bigwig file. This is essentially an indexed database of the conservation scores, and you can query it with rtracklayer using code like: cons <- import("cons.bw", selection = ranges(rdTable)) That will give you a RangedData with a column containing the conservation scores for those regions in rdTable. > It would be REALLY great if you could help > me also find a way to put a threshold on the conservation scores, or a way > to filter them afterwards. > > Filtering afterwards is straight-forward, e.g.: filteredCons <- subset(cons, score > 2) Michael Thanks! > Sheena > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 496 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6