Search
Question: handling missing data with ucscTableQuery in rtracklayer package
0
gravatar for Ana Rodrigues
8.7 years ago by
Ana Rodrigues30 wrote:
Dear all, I am using the rtracklayer package to fetch phastCons scores for certain positions from the UCSC genome browser. For some of these positions, there will be no phastCons score available (ie no alignment), and the code below will crash, rather than spit out an error an move on to the next position. library("rtracklayer") get.phastCons.rtl <- function(pos, ref="ce6") { if(!exists("session")) { session <<- browserSession("UCSC") } pos <- unlist(strsplit(pos, "[:-]")) pos.track <- GenomicData(IRanges(start=as.numeric(pos[2]), end=as.numeric(pos[3])), strand = "+", chrom = pos[1], genome = ref) track(session, "pos") <- pos.track query <- ucscTableQuery(session, "multiz6way", GenomicRanges(as.numeric(pos[2]), as.numeric(pos[3]), pos[1])) tableName(query) <- "phastCons6way" return(track(query)$score) } positions <- c("chrX:16039269-16039273", "chrX:16039357-16039361", "chrX:16039609-16039613") # no phastCons score # this works pc.scores.works <- sapply(positions[1:2], get.phastCons.rtl) # this crashes with an error and no scores get reported pc.scores.fails <- sapply(positions, get.phastCons.rtl) Here is the error: Error in read.table(con, colClasses = bedClasses, as.is = TRUE) : no lines available in input I am wondering if you could give me some advice on making this function able to cope with missing phastCons data, so that I can apply it for any given position? Thank you in advance for any help you can provide, Ana > sessionInfo() R version 2.10.1 (2009-12-14) i386-apple-darwin9.8.0 locale: [1] en_US.US-ASCII/en_US.UTF-8/C/C/en_US.US-ASCII/en_US.US-ASCII attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.6.0 RCurl_1.3-1 bitops_1.0-4.1 loaded via a namespace (and not attached): [1] Biobase_2.6.1 Biostrings_2.14.11 BSgenome_1.14.2 IRanges_1.4.11 [5] XML_2.6-0 > traceback() 35: stop("no lines available in input") 34: read.table(con, colClasses = bedClasses, as.is = TRUE) 33: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE)) 32: .local(con, variant, trackLine, genome, ...) 31: fun(con, ...) 30: fun(con, ...) 29: import(con, format, ...) 28: import(con, format, ...) 27: import(text = lines, format = "bed", variant = "bedGraph", genome = genome) 26: import(text = lines, format = "bed", variant = "bedGraph", genome = genome) 25: .local(con, genome, ...) 24: fun(con, ...) 23: fun(con, ...) 22: import(con, format, ...) 21: import(con, format, ...) 20: import(format = subformat, text = text, ...) 19: import(format = subformat, text = text, ...) 18: FUN(1L[[1L]], ...) 17: lapply(seq_along(trackLines), makeTrackSet) 16: lapply(seq_along(trackLines), makeTrackSet) 15: import.ucsc(con, "wig", TRUE, genome = genome) 14: import.ucsc(con, "wig", TRUE, genome = genome) 13: .local(con, genome, ...) 12: fun(con, ...) 11: fun(con, ...) 10: import(con, format, ...) 9: import(con, format, ...) 8: import(text = output, format = format) 7: import(text = output, format = format) 6: .local(object, ...) 5: track(query) 4: track(query) 3: FUN(c("chrX:16039269-16039273", "chrX:16039357-16039361", "chrX:16039609-16039613" )[[3L]], ...) 2: lapply(X, FUN, ...) 1: sapply(positions, get.phastCons.rtl) -- Ana P. C. Rodrigues, Ph.D. Salk Institute for Biological Studies T +1 858.453.4100 x1067 W bioinformatics.salk.edu/people/ana/
ADD COMMENTlink modified 8.7 years ago by Michael Lawrence10k • written 8.7 years ago by Ana Rodrigues30
0
gravatar for Michael Lawrence
8.7 years ago by
United States
Michael Lawrence10k wrote:
I just checked in a fix for this. Sorry about how you need to loop over each range at a time. I've been working on bigWig support, which would be a much more efficient solution for your problem. Michael On Tue, Feb 9, 2010 at 11:47 AM, Ana Rodrigues <arodrigues@salk.edu> wrote: > Dear all, > I am using the rtracklayer package to fetch phastCons scores for > certain positions from the UCSC genome browser. For some of these > positions, there will be no phastCons score available (ie no > alignment), and the code below will crash, rather than spit out an > error an move on to the next position. > > library("rtracklayer") > > get.phastCons.rtl <- function(pos, ref="ce6") { > if(!exists("session")) { > session <<- browserSession("UCSC") > } > > pos <- unlist(strsplit(pos, "[:-]")) > pos.track <- GenomicData(IRanges(start=as.numeric(pos[2]), > end=as.numeric(pos[3])), > strand = "+", chrom = pos[1], > genome = ref) > track(session, "pos") <- pos.track > query <- ucscTableQuery(session, "multiz6way", > GenomicRanges(as.numeric(pos[2]), > as.numeric(pos[3]), > pos[1])) > tableName(query) <- "phastCons6way" > return(track(query)$score) > } > > positions <- c("chrX:16039269-16039273", > "chrX:16039357-16039361", > "chrX:16039609-16039613") # no phastCons score > > # this works > pc.scores.works <- sapply(positions[1:2], get.phastCons.rtl) > > # this crashes with an error and no scores get reported > pc.scores.fails <- sapply(positions, get.phastCons.rtl) > > Here is the error: > Error in read.table(con, colClasses = bedClasses, as.is = TRUE) : > no lines available in input > > I am wondering if you could give me some advice on making this > function able to cope with missing phastCons data, so that I can apply > it for any given position? > > Thank you in advance for any help you can provide, > Ana > > > > > sessionInfo() > R version 2.10.1 (2009-12-14) > i386-apple-darwin9.8.0 > > locale: > [1] en_US.US-ASCII/en_US.UTF-8/C/C/en_US.US-ASCII/en_US.US-ASCII > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] rtracklayer_1.6.0 RCurl_1.3-1 bitops_1.0-4.1 > > loaded via a namespace (and not attached): > [1] Biobase_2.6.1 Biostrings_2.14.11 BSgenome_1.14.2 IRanges_1.4.11 > [5] XML_2.6-0 > > > > traceback() > 35: stop("no lines available in input") > 34: read.table(con, colClasses = bedClasses, as.is = TRUE) > 33: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE)) > 32: .local(con, variant, trackLine, genome, ...) > 31: fun(con, ...) > 30: fun(con, ...) > 29: import(con, format, ...) > 28: import(con, format, ...) > 27: import(text = lines, format = "bed", variant = "bedGraph", genome = > genome) > 26: import(text = lines, format = "bed", variant = "bedGraph", genome = > genome) > 25: .local(con, genome, ...) > 24: fun(con, ...) > 23: fun(con, ...) > 22: import(con, format, ...) > 21: import(con, format, ...) > 20: import(format = subformat, text = text, ...) > 19: import(format = subformat, text = text, ...) > 18: FUN(1L[[1L]], ...) > 17: lapply(seq_along(trackLines), makeTrackSet) > 16: lapply(seq_along(trackLines), makeTrackSet) > 15: import.ucsc(con, "wig", TRUE, genome = genome) > 14: import.ucsc(con, "wig", TRUE, genome = genome) > 13: .local(con, genome, ...) > 12: fun(con, ...) > 11: fun(con, ...) > 10: import(con, format, ...) > 9: import(con, format, ...) > 8: import(text = output, format = format) > 7: import(text = output, format = format) > 6: .local(object, ...) > 5: track(query) > 4: track(query) > 3: FUN(c("chrX:16039269-16039273", "chrX:16039357-16039361", > "chrX:16039609-16039613" > )[[3L]], ...) > 2: lapply(X, FUN, ...) > 1: sapply(positions, get.phastCons.rtl) > > -- > Ana P. C. Rodrigues, Ph.D. > Salk Institute for Biological Studies > T +1 858.453.4100 x1067 > W bioinformatics.salk.edu/people/ana/ > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]
ADD COMMENTlink written 8.7 years ago by Michael Lawrence10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 441 users visited in the last hour