Question: Importing a BED file with floating point scores?
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 20 months ago:

I'm trying to import some of my BED files, and rtracklayer is choking because it expects an integer where my files have floating point values:

> x <- import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'an integer', got '16.25851'
> traceback()
8: scan(file = file, what = what, sep = sep, quote = quote, dec = dec,
       nmax = nrows, skip = 0, na.strings = na.strings, quiet = TRUE,
       fill = fill, strip.white = strip.white, blank.lines.skip = blank.lines.skip,
       multi.line = FALSE, comment.char = comment.char, allowEscapes = allowEscapes,
       flush = flush, encoding = encoding, skipNul = skipNul)
7: read.table(con, colClasses = bedClasses, as.is = TRUE, na.strings = ".",
       comment.char = "")
6: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE,
       na.strings = ".", comment.char = ""))
5: .local(con, format, text, ...)
4: import(FileForFormat(con), ...)
3: import(FileForFormat(con), ...)
2: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
1: import(sprintf("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"))
>

Here's what the first few lines of that file look like:

$ head "data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"
chr1    27510   30426   H3K4me3_peak_5  800     *       16.25851        80.02037        -1      1703
chr1    540149  541537  H3K4me3_peak_22 145     *       4.97981 14.53762        -1      426
chr1    713117  714018  H3K4me3_peak_28 1277    *       25.02645        127.71224       -1      770
chr1    714191  716308  H3K4me3_peak_29 602     *       14.99855        60.24956        -1      427
chr1    760783  762907  H3K4me3_peak_41 711     *       14.27445        71.18283        -1      1761
chr1    763044  765173  H3K4me3_peak_42 198     *       4.9907  19.84909        -1      987
chr1    776489  778099  H3K4me3_peak_51 358     *       9.35653 35.85113        -1      981
chr1    778950  780404  H3K4me3_peak_53 233     *       7.24799 23.38787        -1      1133
chr1    892835  894617  H3K4me3_peak_72 410     *       9.88536 41.03654        -1      1206
chr1    894804  897069  H3K4me3_peak_73 224     *       8.19048 22.41068        -1      101

Is it possible that rtracklayer could be modified to accept floating point values for the relevant columns?

Michael Lawrence (United States) wrote, 20 months ago:

This looks like a narrowPeak file, not a conventional BED file. For those, you need to use the extraCols argument. See ?import.bed.


Yes, these are derived from narrowPeak MACS2 output files. Good to know that rtracklayer has support for them.

(reply by Ryan C. Thompson, 20 months ago)

I'm trying out the extraCols argument, but it still fails, although the error is now on a different element of the first row:

extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")
import.narrowPeak <- function(..., ) {
    import(..., format="BED", extraCols=extraCols_narrowPeak)
}

> x <- import.narrowPeak("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  (from #2) :
  scan() expected 'an integer', got '80.02037'
(reply by Ryan C. Thompson, 20 months ago)

Where can I download that file?

(reply by Michael Lawrence, 20 months ago)

I figured out the problem. My function signature above has an extra comma, which somehow resulted in this non sequitur error. After removing the comma, the above function works as expected.
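For reference, here is a sketch of the wrapper with the stray comma removed. To keep it self-contained, it passes two rows copied from the head output above through import()'s text argument instead of reading the real file. This assumes rtracklayer is installed and that import() accepts text in place of a connection, which the .local(con, format, text, ...) frame in the traceback suggests; the peak_lines object is purely illustrative.

```r
library(rtracklayer)

# narrowPeak is BED6+4: these name and type the four columns after strand
extraCols_narrowPeak <- c(signalValue = "numeric", pValue = "numeric",
                          qValue = "numeric", peak = "integer")

# Same wrapper as in the thread, but with no trailing comma in the formals
import.narrowPeak <- function(...) {
    import(..., format = "BED", extraCols = extraCols_narrowPeak)
}

# Illustrative input: two tab-separated rows taken from the head output above
peak_lines <- c(
    "chr1\t27510\t30426\tH3K4me3_peak_5\t800\t*\t16.25851\t80.02037\t-1\t1703",
    "chr1\t540149\t541537\tH3K4me3_peak_22\t145\t*\t4.97981\t14.53762\t-1\t426"
)

x <- import.narrowPeak(text = peak_lines)
x$signalValue
```

For the real data, the call would simply be import.narrowPeak("data_files/ChIP-Seq/H3K4me3_peaks_IDR_filtered.bed"); the result should be a GRanges whose metadata columns include signalValue, pValue, qValue and peak with the declared types.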

(reply by Ryan C. Thompson, 20 months ago)
Ryan C. Thompson (The Scripps Research Institute, La Jolla, CA) wrote, 20 months ago:

Actually, looking at the BED format specification, only the first 6 columns of these files correspond to standard BED columns; the remaining 4 are specific to the narrowPeak format. Unfortunately, the files are still named as BED files. In standard BED, columns 7 and 8 are integer genomic positions (thickStart and thickEnd, respectively), but in this file they hold floating point scores (the signal value and the p-value).
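If a plain data frame is enough, the column layout described above can also be read with base R alone by declaring each column's type explicitly. This is a minimal sketch, not rtracklayer's method; it assumes the standard narrowPeak column order (BED6 followed by signalValue, pValue, qValue, peak), and the column names follow the UCSC narrowPeak description rather than anything in this thread. The inline sample reuses two rows from the head output above.

```r
# BED6 columns, then the four narrowPeak-specific columns. In standard BED,
# columns 7 and 8 would be integer thickStart/thickEnd; here they are scores.
narrowpeak_cols <- c(chrom = "character", chromStart = "integer",
                     chromEnd = "integer", name = "character",
                     score = "integer", strand = "character",
                     signalValue = "numeric", pValue = "numeric",
                     qValue = "numeric", peak = "integer")

# Illustrative input: two tab-separated rows from the head output above
peak_text <- paste(
    "chr1\t27510\t30426\tH3K4me3_peak_5\t800\t*\t16.25851\t80.02037\t-1\t1703",
    "chr1\t540149\t541537\tH3K4me3_peak_22\t145\t*\t4.97981\t14.53762\t-1\t426",
    sep = "\n")

peaks <- read.table(text = peak_text, sep = "\t",
                    col.names = names(narrowpeak_cols),
                    colClasses = unname(narrowpeak_cols),
                    quote = "", comment.char = "")
str(peaks)
```

Declaring columns 7 and 8 as numeric is presumably what avoids the scan() error from the original question, where the BED column classes asked for integers in those positions. Note that chromStart stays 0-based here, unlike the 1-based starts in a GRanges returned by import().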
