handling missing data with ucscTableQuery in rtracklayer package
1
0
Entering edit mode
@ana-rodrigues-2623
Last seen 9.6 years ago
Dear all, I am using the rtracklayer package to fetch phastCons scores for certain positions from the UCSC genome browser. For some of these positions, there will be no phastCons score available (ie no alignment), and the code below will crash, rather than spit out an error an move on to the next position. library("rtracklayer") get.phastCons.rtl <- function(pos, ref="ce6") { if(!exists("session")) { session <<- browserSession("UCSC") } pos <- unlist(strsplit(pos, "[:-]")) pos.track <- GenomicData(IRanges(start=as.numeric(pos[2]), end=as.numeric(pos[3])), strand = "+", chrom = pos[1], genome = ref) track(session, "pos") <- pos.track query <- ucscTableQuery(session, "multiz6way", GenomicRanges(as.numeric(pos[2]), as.numeric(pos[3]), pos[1])) tableName(query) <- "phastCons6way" return(track(query)$score) } positions <- c("chrX:16039269-16039273", "chrX:16039357-16039361", "chrX:16039609-16039613") # no phastCons score # this works pc.scores.works <- sapply(positions[1:2], get.phastCons.rtl) # this crashes with an error and no scores get reported pc.scores.fails <- sapply(positions, get.phastCons.rtl) Here is the error: Error in read.table(con, colClasses = bedClasses, as.is = TRUE) : no lines available in input I am wondering if you could give me some advice on making this function able to cope with missing phastCons data, so that I can apply it for any given position? Thank you in advance for any help you can provide, Ana > sessionInfo() R version 2.10.1 (2009-12-14) i386-apple-darwin9.8.0 locale: [1] en_US.US-ASCII/en_US.UTF-8/C/C/en_US.US-ASCII/en_US.US-ASCII attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.6.0 RCurl_1.3-1 bitops_1.0-4.1 loaded via a namespace (and not attached): [1] Biobase_2.6.1 Biostrings_2.14.11 BSgenome_1.14.2 IRanges_1.4.11 [5] XML_2.6-0 > traceback() 35: stop("no lines available in input") 34: read.table(con, colClasses = bedClasses, as.is = TRUE) 33: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE)) 32: .local(con, variant, trackLine, genome, ...) 31: fun(con, ...) 30: fun(con, ...) 29: import(con, format, ...) 28: import(con, format, ...) 27: import(text = lines, format = "bed", variant = "bedGraph", genome = genome) 26: import(text = lines, format = "bed", variant = "bedGraph", genome = genome) 25: .local(con, genome, ...) 24: fun(con, ...) 23: fun(con, ...) 22: import(con, format, ...) 21: import(con, format, ...) 20: import(format = subformat, text = text, ...) 19: import(format = subformat, text = text, ...) 18: FUN(1L[[1L]], ...) 17: lapply(seq_along(trackLines), makeTrackSet) 16: lapply(seq_along(trackLines), makeTrackSet) 15: import.ucsc(con, "wig", TRUE, genome = genome) 14: import.ucsc(con, "wig", TRUE, genome = genome) 13: .local(con, genome, ...) 12: fun(con, ...) 11: fun(con, ...) 10: import(con, format, ...) 9: import(con, format, ...) 8: import(text = output, format = format) 7: import(text = output, format = format) 6: .local(object, ...) 5: track(query) 4: track(query) 3: FUN(c("chrX:16039269-16039273", "chrX:16039357-16039361", "chrX:16039609-16039613" )[[3L]], ...) 2: lapply(X, FUN, ...) 1: sapply(positions, get.phastCons.rtl) -- Ana P. C. Rodrigues, Ph.D. Salk Institute for Biological Studies T +1 858.453.4100 x1067 W bioinformatics.salk.edu/people/ana/
rtracklayer rtracklayer • 1.1k views
ADD COMMENT
0
Entering edit mode
@michael-lawrence-3846
Last seen 2.4 years ago
United States
I just checked in a fix for this. Sorry about how you need to loop over each range at a time. I've been working on bigWig support, which would be a much more efficient solution for your problem. Michael On Tue, Feb 9, 2010 at 11:47 AM, Ana Rodrigues <arodrigues@salk.edu> wrote: > Dear all, > I am using the rtracklayer package to fetch phastCons scores for > certain positions from the UCSC genome browser. For some of these > positions, there will be no phastCons score available (ie no > alignment), and the code below will crash, rather than spit out an > error an move on to the next position. > > library("rtracklayer") > > get.phastCons.rtl <- function(pos, ref="ce6") { > if(!exists("session")) { > session <<- browserSession("UCSC") > } > > pos <- unlist(strsplit(pos, "[:-]")) > pos.track <- GenomicData(IRanges(start=as.numeric(pos[2]), > end=as.numeric(pos[3])), > strand = "+", chrom = pos[1], > genome = ref) > track(session, "pos") <- pos.track > query <- ucscTableQuery(session, "multiz6way", > GenomicRanges(as.numeric(pos[2]), > as.numeric(pos[3]), > pos[1])) > tableName(query) <- "phastCons6way" > return(track(query)$score) > } > > positions <- c("chrX:16039269-16039273", > "chrX:16039357-16039361", > "chrX:16039609-16039613") # no phastCons score > > # this works > pc.scores.works <- sapply(positions[1:2], get.phastCons.rtl) > > # this crashes with an error and no scores get reported > pc.scores.fails <- sapply(positions, get.phastCons.rtl) > > Here is the error: > Error in read.table(con, colClasses = bedClasses, as.is = TRUE) : > no lines available in input > > I am wondering if you could give me some advice on making this > function able to cope with missing phastCons data, so that I can apply > it for any given position? > > Thank you in advance for any help you can provide, > Ana > > > > > sessionInfo() > R version 2.10.1 (2009-12-14) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.US-ASCII/en_US.UTF-8/C/C/en_US.US-ASCII/en_US.US-ASCII > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.6.0 RCurl_1.3-1 bitops_1.0-4.1 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.1 Biostrings_2.14.11 BSgenome_1.14.2 IRanges_1.4.11 > [5] XML_2.6-0 > > > > traceback() > 35: stop("no lines available in input") > 34: read.table(con, colClasses = bedClasses, as.is = TRUE) > 33: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE)) > 32: .local(con, variant, trackLine, genome, ...) > 31: fun(con, ...) > 30: fun(con, ...) > 29: import(con, format, ...) > 28: import(con, format, ...) > 27: import(text = lines, format = "bed", variant = "bedGraph", genome = > genome) > 26: import(text = lines, format = "bed", variant = "bedGraph", genome = > genome) > 25: .local(con, genome, ...) > 24: fun(con, ...) > 23: fun(con, ...) > 22: import(con, format, ...) > 21: import(con, format, ...) > 20: import(format = subformat, text = text, ...) > 19: import(format = subformat, text = text, ...) > 18: FUN(1L[[1L]], ...) > 17: lapply(seq_along(trackLines), makeTrackSet) > 16: lapply(seq_along(trackLines), makeTrackSet) > 15: import.ucsc(con, "wig", TRUE, genome = genome) > 14: import.ucsc(con, "wig", TRUE, genome = genome) > 13: .local(con, genome, ...) > 12: fun(con, ...) > 11: fun(con, ...) > 10: import(con, format, ...) > 9: import(con, format, ...) > 8: import(text = output, format = format) > 7: import(text = output, format = format) > 6: .local(object, ...) > 5: track(query) > 4: track(query) > 3: FUN(c("chrX:16039269-16039273", "chrX:16039357-16039361", > "chrX:16039609-16039613" > )[[3L]], ...) > 2: lapply(X, FUN, ...) > 1: sapply(positions, get.phastCons.rtl) > > -- > Ana P. C. Rodrigues, Ph.D. > Salk Institute for Biological Studies > T +1 858.453.4100 x1067 > W bioinformatics.salk.edu/people/ana/ > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENT

Login before adding your answer.

Traffic: 945 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6