Hi everyone,
I'm working on an RNA-Seq dataset containing 22 samples from 3 batches. I used mm10 as the reference genome, generated the count table from the BAM files using the GenomicAlignments package, and then used the rpkm() function to get RPKM values. Now I want to perform batch correction using either the count matrix or the RPKM data. How do I use the sva package for this purpose? Is there any other way I can perform batch estimation and correction with the data I have? Can I use the removeBatchEffect() function, or is it only for microarray data?
Thanks
As Ryan suggested, you should read through the vignettes for sva and RUVSeq, which were written to address this type of general question that you are asking.
Once you have tried to apply these procedures to your data and run into specific problems, you are welcome to come back to the support site and ask a more specific question that describes your problem and includes a code snippet that outlines the steps you've taken to get to the point you might be asking about.
At that point, you'll likely receive useful help.
Thank you, Dr. Lianoglu. I tried to run sva, but I am getting this error:
Error in dat %*% (Id - mod %*% solve(t(mod) %*% mod) %*% t(mod)) :
requires numeric/complex matrix/vector arguments.
This is my code:
data <- read.csv("data.csv")
batch<-factor(c("b1","b1","b1","b1","b1","b1","b1","b1","b2","b2","b2","b2","b2","b2","b2","b2","b3","b3","b3","b3","b3","b3"))
type<-factor(c("ctrl","ctrl","ctrl","ctrl","mut","mut","mut","mut", "ctrl","ctrl","ctrl","ctrl","mut","mut","mut","mut", "ctrl","ctrl","mut", "mut","mut","mut"))
coldata<-data.frame(batch,type)
mod <- model.matrix(~type, colData=coldata)
mod0 <- model.matrix(~ 1, colData=coldata)
svseq <- svaseq(data, mod, mod0, n.sv=2)
You should post a new question on the support site for specific issues, but before you do that make sure that you are passing the correct data into the svaseq function.
In particular, the dat argument (your data variable) needs to be a matrix of counts (integers). A call to read.csv will return a data.frame, so you should check what types of things are stored in its columns (sapply(data, class)). These all need to be integer (or numeric); otherwise, when you naively convert it (data) to a matrix, it will be upcast to whatever the most general column type is (maybe you have a character column in there?).

Also, I don't think your call to model.matrix is doing what you think it's doing, i.e. I'm not sure what the colData argument is doing here (likely nothing); you just happen to get lucky and have the type variable defined in your global workspace. Please inspect those values (mod and mod0) to make sure they are what they should be, because the error you are getting suggests otherwise.

How do you know you have 3 batches if you don't have any known batches? The sva documentation is fairly straightforward. What part of the documentation are you having difficulty with?
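
For reference, the checks suggested above (column types of the count table, and the contents of mod and mod0) could be sketched roughly as follows. This is only an illustration under stated assumptions: the simulated counts_df stands in for whatever data.csv actually contains, the gene column name is hypothetical, and the sample layout is copied from the code posted in the question. The svaseq call is only attempted if the sva package is installed.

```r
# Hypothetical stand-in for data.csv: one gene ID column plus 22 count columns.
set.seed(1)
n_genes <- 200
counts_df <- data.frame(
  gene = paste0("gene", seq_len(n_genes)),
  matrix(rnbinom(n_genes * 22, mu = 50, size = 2), nrow = n_genes),
  stringsAsFactors = FALSE
)

# 1. Inspect the column types. A single character column is enough to turn
#    as.matrix(counts_df) into a character matrix, which breaks %*%.
col_classes <- sapply(counts_df, class)
print(table(col_classes))

# 2. Keep only the numeric columns and move the IDs into the row names.
count_mat <- as.matrix(counts_df[, col_classes %in% c("integer", "numeric")])
rownames(count_mat) <- counts_df$gene
stopifnot(is.numeric(count_mat))

# 3. Build the design matrices with `data =`; model.matrix has no `colData`
#    argument, so that name is not used to look up `type`.
batch <- factor(rep(c("b1", "b2", "b3"), times = c(8, 8, 6)))
type <- factor(c(rep(c("ctrl", "mut"), each = 4),
                 rep(c("ctrl", "mut"), each = 4),
                 "ctrl", "ctrl", "mut", "mut", "mut", "mut"))
coldata <- data.frame(batch, type)
mod  <- model.matrix(~ type, data = coldata)  # full model: intercept + type
mod0 <- model.matrix(~ 1, data = coldata)     # null model: intercept only
stopifnot(nrow(mod) == ncol(count_mat), nrow(mod0) == ncol(count_mat))

# 4. Estimate surrogate variables, only if sva is available.
if (requireNamespace("sva", quietly = TRUE)) {
  keep <- rowSums(count_mat) > 0  # drop all-zero genes before fitting
  svseq <- sva::svaseq(count_mat[keep, ], mod, mod0, n.sv = 2)
}
```

With a real count table, step 1 is the one most likely to reveal the cause of the "requires numeric/complex matrix/vector arguments" error.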