Question: import.bed fails on bed file with no scores
jmr wrote (3.4 years ago):

This file: UCSD.H1.H2AK5ac.SAK201.bed.gz

looks like this:

chr1    9942    10141    SOLEXA2_1:1:101:4024:16163    -
chr1    9988    10187    SOLEXA2_1:1:10:12241:10803    -
chr1    9992    10191    SOLEXA2_1:1:93:18918:18953    -
chr1    9997    10196    SOLEXA2_1:1:30:11903:16499    -

It doesn't have a scores column.  When I try to load it with

import.bed("UCSD.H1.H2AK5ac.SAK201.bed.gz")

I get:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  scan() expected 'a real', got '-'

Is there a way to instruct import.bed to deal with missing scores?  And would the same options work with files that have scores?  The problem is that other files from the same source (e.g. UCSD.H1_BMP4_Derived_Mesendoderm_Cultured_Cells.H2AK5ac.AK126.bed.gz) do have scores, and I'd like to process them with the same instruction.  I just need the Ranges info.  I expected the format to be the same for every file on that site.

I'm quite new to R and to Bioconductor, so forgive my ignorance.  (I did try reading the help documents and searching the web.)

João Rodrigues

 

Edited: Fixed link to first file.

Answer: import.bed fails on bed file with no scores
Michael Lawrence wrote (3.4 years ago):

This file strays pretty far from the standard by skipping a column, but I think you can at least get the range information by passing extraCols=c(strand="factor") to the import function. Effectively that is saying that the valid BED part stops at the name column, and that there is a strand column tacked onto the end. I'm not sure if the strand column will become the strand component on the GRanges, but it might.
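As a minimal sketch of the suggestion above: the temporary file below is a made-up stand-in for the real download (two of the rows quoted in the question, tab-separated), and bed_path is an illustrative name. It assumes rtracklayer is installed. Whether the extra column ends up as the strand of the result may depend on the rtracklayer version, as noted above.

```r
library(rtracklayer)

# Stand-in for the real file: four standard BED columns
# (chrom, start, end, name) followed by a strand column and no score.
bed_path <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t9942\t10141\tSOLEXA2_1:1:101:4024:16163\t-",
  "chr1\t9988\t10187\tSOLEXA2_1:1:10:12241:10803\t-"
), bed_path)

# Declare the trailing column as an extra factor column named "strand",
# so the standard BED part is taken to stop at the name column.
gr <- import.bed(bed_path, extraCols = c(strand = "factor"))

# Only the range information is needed here.
ranges(gr)
```

BED start coordinates are 0-based, so the imported ranges start one position later than the numbers in the file.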


Thanks Michael, that really works.  It reads both files with no scores, as well as files with scores!  I really don't understand this function, but this seems to solve my problem.

It does, however, produce a warning when reading either of the files I mentioned:

Warning message:
In `[<-.factor`(`*tmp*`, is.na(strand), value = "*") :
  invalid factor level, NA generated

It seems that this is caused by the files not having any "*" in the strand column, but the output seems fine to me.  Probably a bug?
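The warning can be reproduced in base R without rtracklayer, which may help explain why the output still looks fine. This is only an illustration of the factor behaviour, not the package's actual code; the strand values are made up.

```r
# A strand factor built from a file that contains only "-" and "+".
strand <- factor(c("-", "-", "+"))

# Assigning "*" to the NA positions warns because "*" is not a level,
# even though there are no NA positions and nothing is actually changed.
warned <- FALSE
result <- withCallingHandlers({
  s <- strand
  s[is.na(s)] <- "*"
  s
}, warning = function(w) {
  warned <<- TRUE
  invokeRestart("muffleWarning")
})

# With "*" declared as a level up front, the same assignment is silent.
strand_ok <- factor(c("-", "-", "+"), levels = c("+", "-", "*"))
strand_ok[is.na(strand_ok)] <- "*"
```

Here warned ends up TRUE while result is identical to the original factor, which matches the observation that the warning is harmless for these files.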

Reply written 3.4 years ago by jmr

Yea, it could be smarter. But I think this is already fixed in devel, which will be released soon.

Reply written 3.4 years ago by Michael Lawrence