How do you use the readBismark command in BiSeq properly?
1
0
Entering edit mode
akhaira • 0
@akhaira-23607
Last seen 4.3 years ago

I am very new to R and coding in general, so forgive me if my language is not 100% correct. I have 2 control files and 2 patient files in .cov format that I wish to analyze for DMRs using BiSeq. I am confused about whether I am supposed to import all 4 files at once using readBismark, and create 1 BSraw file with all 4 datasets, or if I am supposed to import all 4 files separately. I have tried importing the files separately, but I am not sure if I am doing it correctly. I am trying to follow the vignette for BiSeq but the information it contains is super brief and doesn't really help.

This is what I have been trying so far:

setwd("~/Desktop/control_nucleated RBC_data")

rrbsc1 <- readBismark("RRBS_C1_S19_bismark_bt2.bismark.cov", colData = DataFrame(row.names = "control_1"))

If anyone could tell me if I'm on the right track, or if I should be importing everything together, I would greatly appreciate it. Thanks

BiSeq DMR • 1.6k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 21 minutes ago
United States

You are right about the vignette, and the help page for readBismark is about as unhelpful. And I also appreciate you re-posting rather than relying on a comment posted to a years-old post. Although you could have re-posted here what you posted there.

Anyway, in the other post you tried

rrbs <- readBismark("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov", colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"))

And you got the predictable error that you had an unused argument. This is because there are only two arguments to readBismark and you have passed three! The first argument is 'files' and the second is 'colData'. You have used a combination of positional and named arguments (where a positional argument is inferred by its position in the function call, and a named argument is directly named. So you have essentially done this:

rrbs <- readBismark(files = "RRBS_C1_S19_bismark_bt2.bismark.cov", colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"), <some unnamed argument that R has no idea about> =  "RRBS_C2_S20_bismark_bt2.bismark.cov")

What is not clear in the help page, and is super not helpful in the vignette is whether or not one can read in more than one file, and if so how one might do that. You could check the function body to see how the 'files' argument is processed, and that would make things clear, but that's not something that should be expected of an end user; hence the existence of help files and stuff.

Anyway, the 'files' argument is supposed to be a character vector of file names that you wish to read in. So the correct thing to do is either

rrbs <- readBismark(files = c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"))

Or you can use positional arguments

rrbs <- readBismark(c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), DataFrame(row.names = c("control_1", "control_2"), group = "control"))

Or you could mix'n'match, so long as you use a character vector for the first argument.

ADD COMMENT
0
Entering edit mode

Thank you for the reply! I think I am getting closer. Now, when I enter

rrbs <- readBismark(files = c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), colData = DataFrame(row.names = c("control_1", "control_2"), group = "control"))

I get an error message that says

Error in DataFrame(row.names = c("control_1", "control_2"), group = "control") : invalid length of row names

Do you have any idea why this might be?

ADD REPLY
1
Entering edit mode

Yes. You are specifying a DataFrame with two rows, but only one rows worth of data.

> DataFrame(row.names = c("control_1", "control_2"), group = rep("control", 1))
Error in DataFrame(row.names = c("control_1", "control_2"), group = rep("control",  : 
  invalid length of row names

> DataFrame(row.names = c("control_1", "control_2"), group = rep("control", 2))
DataFrame with 2 rows and 1 column
                group
          <character>
control_1     control
control_2     control
ADD REPLY
0
Entering edit mode

Thanks for getting back to me. I tried this:

 rrbs <- readBismark(files = c("RRBS_C1_S19_bismark_bt2.bismark.cov", "RRBS_C2_S20_bismark_bt2.bismark.cov"), colData = DataFrame(row.names = c("control_1", "control_2"), group = rep("control", 2)))

and I get this output:

Processing sample control_1 ... 
Read 0 records
Processing sample control_2 ... 
Read 0 records
Building BSraw object.
Error in S4Vectors:::normarg_names(value, class(x), length(x)) : 
  attempt to set too many names (2) on IRanges object of length 0 code here

I tried it a second time by removing the files = and colData = and got the exact same error message. I think the software is trying to read the colData instead of the files, but I am not sure. How might I fix this?

ADD REPLY
0
Entering edit mode

So when you get a message that says

Processing sample control_1 ... 
Read 0 records

Why do you think it's doing something wrong rather than that there might not be any records in the file? Did you check "RRBSC1S19bismarkbt2.bismark.cov" to make sure that file actually has anything in it?

ADD REPLY
0
Entering edit mode

Put a different way, consider

> system("touch tmp.cov")
> scan("tmp.cov", "c")
Read 0 items

Where if I read an empty file, scan tells me it's empty.

ADD REPLY
0
Entering edit mode

Or more to the point, since readBismark uses scan with a list specifying the "what" argument:

> z <- scan("tmp.cov", list(0, "", 0))
Read 0 records

Which is exactly what you get, indicating that you have some empty files.

ADD REPLY
0
Entering edit mode

I have checked the files, they are not empty. Each file has approx. 677642 elements, so I am fairly certain they are not empty. Also, if I was attempting to read the "RRBSC1S19bismarkbt2.bismark.cov" and "RRBSC2S20bismarkbt2.bismark.cov" files, shouldn't those be the file names that appear next to processing? Processing sample "RRBS_C1_S19_bismark_bt2.bismark.cov" instead of Processing sample control_1?

ADD REPLY
0
Entering edit mode

No. You called those samples control_1 and control_2 in your colData object, so you have already said they are called 'control_1' and `control_2', not whatever the file name is.

Anyway, all readBismark does is call scan on the file, so you can try that yourself to see what happens:

z <- scan("RRBS_C1_S19_bismark_bt2.bismark.cov", list("", 0L, NULL, NULL, 0L, 0L), sep = "\t", comment.char = "#", skip = 0)
ADD REPLY
0
Entering edit mode

I got the readBismark command to work just now. The files had data, but for some reason it wasn't working. I deleted them, and unzipped the .gz files, and now it's working. Thank you so much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 1040 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6