Question: Importing a BED file with floating point scores?

Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 2.1 years ago:

I'm trying to import some of my BED files, and rtracklayer is choking because it expects an integer where my files have floating point values:

> x <- import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '16.25851'
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
       comment.char = "")
6: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
       na.strings = ".", comment.char = ""))
5: .local(con, format, text, ...)
4: import(FileForFormat(con), ...)
3: import(FileForFormat(con), ...)
2: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
1: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
>

Here's what the first few lines of that file look like:

$ head "data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"
chr1    27510   30426   H3K4me3_peak_5  800     *       16.25851        80.02037        -1      1703
chr1    540149  541537  H3K4me3_peak_22 145     *       4.97981 14.53762        -1      426
chr1    713117  714018  H3K4me3_peak_28 1277    *       25.02645        127.71224       -1      770
chr1    714191  716308  H3K4me3_peak_29 602     *       14.99855        60.24956        -1      427
chr1    760783  762907  H3K4me3_peak_41 711     *       14.27445        71.18283        -1      1761
chr1    763044  765173  H3K4me3_peak_42 198     *       4.9907  19.84909        -1      987
chr1    776489  778099  H3K4me3_peak_51 358     *       9.35653 35.85113        -1      981
chr1    778950  780404  H3K4me3_peak_53 233     *       7.24799 23.38787        -1      1133
chr1    892835  894617  H3K4me3_peak_72 410     *       9.88536 41.03654        -1      1206
chr1    894804  897069  H3K4me3_peak_73 224     *       8.19048 22.41068        -1      101

Is it possible that rtracklayer could be modified to accept floating point values for the relevant columns?

(Question modified 2.1 years ago by Michael Lawrence.)

Answer: Michael Lawrence (United States) wrote, 2.1 years ago:

This looks like a narrowPeak file, not a conventional BED file. For those, you need to use the extraCols argument; see ?import.bed.


Yes, these are derived from MACS2 narrowPeak output files. Good to know that rtracklayer supports them.

(Reply by Ryan C. Thompson, 2.1 years ago.)

I'm trying out the extraCols argument, and it still doesn't seem to work, although the error now occurs on a different field of the first row:

extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")
import.narrowPeak <- function(..., ) {
    import(..., format="BED", extraCols=extraCols_narrowPeak)
}

> x <- import.narrowPeak("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  (from #2) :
  scan() expected 'an integer', got '80.02037'
(Reply by Ryan C. Thompson, 2.1 years ago; later edited.)

Where can I download that file?

(Reply by Michael Lawrence, 2.1 years ago.)

I figured out the problem. My function signature above has an extra comma, which somehow resulted in this non sequitur error. After removing the comma, the above function works as expected.

(Reply by Ryan C. Thompson, 2.1 years ago.)
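
For reference, the wrapper from the earlier comment with the stray comma removed would read roughly as follows. R's parser rejects a trailing comma in a function's formal argument list. This is a sketch: apart from deleting the comma, the only change is the explicit `rtracklayer::` namespace prefix, which is an editorial choice and not something taken from the thread.

```r
# Column classes for the four narrowPeak columns that follow the six
# standard BED columns (same names and classes as in the comment above).
extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")

# Same wrapper, without the trailing comma in the argument list.
# With the rtracklayer:: prefix, the package is only needed when the
# function is called, not when it is defined.
import.narrowPeak <- function(...) {
    rtracklayer::import(..., format = "BED", extraCols = extraCols_narrowPeak)
}

# Usage (requires rtracklayer to be installed):
# x <- import.narrowPeak("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed")
```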

Answer: Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 2.1 years ago:

Actually, looking at the BED format specification, only the first 6 columns of these files correspond to standard BED columns; the rest are specific to the application, which unfortunately still calls them BED files. Columns 7 and 8 are supposed to be integer genomic positions (thickStart and thickEnd, respectively), but in this file they hold floating point values instead.

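
The column mismatch can be reproduced without rtracklayer. The base-R sketch below reads two rows from the `head` output with `read.table`: first with integer classes for columns 7-10 (an approximation of a BED-style reader, not rtracklayer's actual internal `bedClasses`), then with the narrowPeak layout, whose columns 7-10 are signalValue, pValue, qValue and peak (the same names as in the `extraCols` vector used earlier in the thread). The variable names here are illustrative only.

```r
# Two rows copied from the `head` output above (fields are tab-separated).
rows <- c(
  "chr1\t27510\t30426\tH3K4me3_peak_5\t800\t*\t16.25851\t80.02037\t-1\t1703",
  "chr1\t540149\t541537\tH3K4me3_peak_22\t145\t*\t4.97981\t14.53762\t-1\t426"
)

# Columns 1-6: chrom, chromStart, chromEnd, name, score, strand (standard BED).
bed6 <- c("character", "integer", "integer", "character", "integer", "character")

# Assumption: treat columns 7-10 as integers, as a BED-style reader would for
# thickStart/thickEnd. scan() fails on the first non-integer field, as in the post.
bed_attempt <- tryCatch(
  read.table(text = rows, sep = "\t",
             colClasses = c(bed6, "integer", "integer", "integer", "integer")),
  error = function(e) conditionMessage(e)
)
print(bed_attempt)  # the captured error message, not a data frame

# narrowPeak layout: columns 7-9 are numeric, column 10 is an integer offset.
peaks <- read.table(
  text = rows, sep = "\t",
  colClasses = c(bed6, "numeric", "numeric", "numeric", "integer"),
  col.names = c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
                "signalValue", "pValue", "qValue", "peak")
)
str(peaks)
```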