Question

importing BED with non-integer values

0

Entering edit mode

Morris ▴ 10

@morris-13226

Last seen 11 months ago

Italy

Hello everyone, I have a problem importing BED files in R, here is the code I'm using

bcat<- import("B_catenin_filtered_Peak.bed", format = "BED")

here is the output

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'an integer', got '3.27363'

The BED file looks like this

1       11211   11311   B_catenin_peak_1        32      .       3.27363 3.22608 0.52408 71
1       14937   15010   B_catenin_peak_2        17      .       2.18242 1.78085 0.00000 7
1       27112   27155   B_catenin_peak_3        20      .       2.62923 2.06618 0.04042 12
1       29097   29187   B_catenin_peak_4        36      .       3.60333 3.68581 0.74558 66
1       29327   29437   B_catenin_peak_5        58      .       4.74450 5.81076 2.00646 86
1       32834   32877   B_catenin_peak_6        41      .       3.91863 4.14593 1.07837 31
1       34997   35040   B_catenin_peak_7        22      .       2.79038 2.27548 0.04042 13
1       35547   35590   B_catenin_peak_8        21      .       2.72769 2.19294 0.04042 34
1       37762   37841   B_catenin_peak_9        17      .       2.34908 1.72152 0.00000 2
1       38619   38805   B_catenin_peak_10       47      .       3.15853 4.70046 1.29972 88

Do you guys have any suggestions?

thank you for your help

ChIPseeker ChIPSeqData • 1.6k views

ADD COMMENT • link updated 3.7 years ago by James W. MacDonald 66k • written 3.7 years ago by Morris ▴ 10

score 3 · Accepted Answer · 2020-11-11

3

Entering edit mode

James W. MacDonald 66k

@james-w-macdonald-5106

Last seen 6 hours ago

United States

That's pretty much not a bed file. The column that scan is complaining about (column 7) is supposed to be the thickStart position, and as a position it has to be integer. The same is true of column 8 (integer). And column 9 should be RGB values, comma separated.

So you won't be able to tell import that you have a bed file because it isn't one. That's not to say you can't import it, however, particularly if you already know what those other columns are intended to represent. If it's not a big file you can just read it in using read.table and then generate a GRanges object. Something like

tmp <- read.table("B_catenin_filtered_Peak.bed") 
## you may need other arguments, and you might use colClasses to speed things up
gr <- GRanges(tmp[,1], 
              IRanges(tmp[,2], tmp[,3]), 
              peak = tmp[,4], 
              score = tmp[,5], 
              strand = tmp[,6])

I don't know what those other columns represent, but you can include them similarly in the call to GRanges and they will be read in.

If the file is pretty big you can (as I noted) include a colClasses argument to read.table, or you can use fread from datatable or read_tsv from readr, both of which will tend to be faster. I usually wrap read_tsv in as.data.frame, because I cannot see a compelling reason to ever have a tibble floating around, but ymmv.

ADD COMMENT • link 3.7 years ago James W. MacDonald 66k

0

Entering edit mode

Hi James, thank you for the answer, as in your script, my genomic range is the second and third column, which I think are the only important for further analysis ... I'm getting another error tho :/

gr <- GRanges(tmp[,1], 
+               IRanges(tmp[,2], tmp[,3]), 
+               peak = tmp[,4], 
+               score = tmp[,5], 
+               strand = tmp[,6])

**Error in .local(x, ...) : invalid strand levels in 'x':** .

I solve by not importing the strand informations but I'm not sure if I can do it, what do you think ?

thank you

ADD REPLY • link 3.7 years ago Morris ▴ 10

2

Entering edit mode

Right. The strand has to be one of "-", "+", or "*"

So you could do something like

mapper <- setNames(c("-", "+", "*", "*"), c("-", "+", ".", "*"))

gr <- GRanges(tmp[,1], 
              IRanges(tmp[,2], tmp[,3]), 
              peak = tmp[,4], 
              score = tmp[,5], 
              strand = mapper[tmp[,6]])

Which will convert all the periods to asterisks, assuming that a period indicates no strand information. But that assumes there are any non-period values in the sixth column. If not, what you have done will accomplish the same thing.

ADD REPLY • link 3.7 years ago James W. MacDonald 66k