I am currently working on an RNA-seq ExpressionSet from TCGA and I want to proceed with a linear model analysis using voom, but first I need to remove the batch effect. To do so, I am using the sva package.
The dimensions of the expression set are:
- pheno: 164 3
- edata: 20531 164
In pheno, samples are in rows and the three columns are Call, Status and Id:
- Call: the individual can present “Loss of Chromosome Y”, “Normal” or “XYY”.
- Status: the origin of each sample, “tumor” or “normal”.
- Id: the id of each sample
Full model is:
- mod = model.matrix(~ Call + Status, data = pheno) [both Call and Status are factors]
Null model is just like the example in the vignette:
- mod0 = model.matrix(~ 1, data = pheno)
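For reference, the two design matrices can be reproduced on a toy phenotype table. This is only a minimal sketch: the table below is made up (same column names and 164 rows as pheno, but hypothetical values), and the rank check at the end is an extra sanity check, not something the sva vignette requires.

```r
# Toy phenotype table with the same columns as pheno.
# All values here are hypothetical, not the TCGA data.
n <- 164
pheno <- data.frame(
  Call   = factor(rep(c("Normal", "Loss of Chromosome Y", "XYY"), length.out = n),
                  levels = c("Normal", "Loss of Chromosome Y", "XYY")),
  Status = factor(rep(c("normal", "tumor"), length.out = n),
                  levels = c("normal", "tumor")),
  Id     = sprintf("sample_%03d", seq_len(n)),
  stringsAsFactors = FALSE
)

# Full model: variables of interest (Call and Status)
mod <- model.matrix(~ Call + Status, data = pheno)

# Null model: intercept only
mod0 <- model.matrix(~ 1, data = pheno)

# 164 rows, 4 columns: intercept, two Call contrasts, one Status contrast
dim(mod)
dim(mod0)

# Sanity check: the full model should have full column rank
qr(mod)$rank == ncol(mod)
```

With the real data, only the construction of pheno changes; the two model.matrix() calls are the ones listed above.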
Then, when I try to estimate the number of surrogate variables (ns <- num.sv(counts, mod, method="leek")), it returns 160 surrogate variables. If I change the method to "be", the number of surrogate variables decreases to 1.
Finally, when I run the following call:
- svobj <- sva(counts, mod, mod0, n.sv=ns)
I ALWAYS get the following error message:
" Number of significant surrogate variables is: 1
Iteration (out of 5): Error in density.default(x, adjust = adj): 'x' contains missing values "
I have already checked the data and there are no missing values at all. I have also tried removing the rows with values equal to 0 to reduce the dataset, but that did not work either.
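The kind of check described above can be sketched in base R as follows. This is a minimal sketch on a toy matrix (hypothetical values; the variable names n_na, all_zero, constant and filtered are made up for illustration). The test for constant rows is an additional check; whether such rows are related to the error above is an assumption, not something established here.

```r
# Toy count matrix (hypothetical), genes in rows and samples in columns
counts <- outer(1:50, 1:8)
dimnames(counts) <- list(paste0("gene", 1:50), paste0("s", 1:8))
counts[3, ] <- 0   # an all-zero gene
counts[7, ] <- 5   # a constant, non-zero gene

# Missing and non-finite values
n_na        <- sum(is.na(counts))
n_nonfinite <- sum(!is.finite(counts))

# Genes that are zero in every sample
all_zero <- rowSums(counts != 0) == 0

# Genes with zero variance across samples (includes the all-zero ones)
constant <- apply(counts, 1, var) == 0

# Keep only genes that vary across samples
filtered <- counts[!constant, , drop = FALSE]

n_na
n_nonfinite
sum(all_zero)
sum(constant)
dim(filtered)
```

Note that this only drops rows that are zero (or constant) in every sample, which is different from dropping every row that contains at least one 0.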
Could anybody help me find a solution?
Thank you all in advance,