Hello,

I am currently working on an RNA-seq ExpressionSet from TCGA and I want to proceed with a linear model analysis using voom, but first I need to remove the batch effect. To do so, I am using the sva package.

My data is:

In this expression set, the dimensions are:

- pheno: **164 3**
- edata: **20531 164**

In pheno, samples are in rows and the three columns are Call, Status and Id:

- **Call**: the individual can present “Loss of Chromosome Y”, “Normal” or “XYY”.
- **Status**: the origin of each sample, “tumor” or “normal”.
- **Id**: the id of each sample.

Knowing that:

Full model is:

**mod = model.matrix(~ Call + Status, data = pheno)** [both Call and Status are of class factor]

Null model is just like the example in the vignette:

**mod0 = model.matrix(~ 1, data = pheno)**
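The two design matrices above can be reproduced on a toy phenotype table to check their shape and rank. This is a minimal sketch, assuming simulated values for `Call`, `Status` and `Id` (they are not the TCGA data), and the `full_rank` check is an extra diagnostic that is not part of the original analysis:

```r
# Toy phenotype table with the same shape as described (164 samples, 3 columns).
# The values are simulated for illustration, not taken from TCGA.
n <- 164
pheno <- data.frame(
  Call   = factor(rep(c("Loss of Chromosome Y", "Normal", "XYY"), length.out = n)),
  Status = factor(rep(c("tumor", "normal"), each = n / 2)),
  Id     = sprintf("sample_%03d", seq_len(n))
)

mod  <- model.matrix(~ Call + Status, data = pheno)  # full model
mod0 <- model.matrix(~ 1, data = pheno)              # null model (intercept only)

# One row per sample; the full model has 1 + 2 + 1 = 4 columns here.
dim(mod)
dim(mod0)

# The full model is estimable only if its columns are linearly independent.
full_rank <- qr(mod)$rank == ncol(mod)
full_rank

# Empty or near-empty Call x Status cells hint at confounded factors.
table(pheno$Call, pheno$Status)
```

With an intercept-only null model, both Call and Status are treated as variables of interest rather than as adjustment variables.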

Then, when I try to find the number of surrogate variables with ns <- num.sv(counts, mod, method="leek"), it turns out that there are 160 variables. If I change the method to "**be**", the number of surrogate variables decreases to 1.

Finally, when I try the next function:

**svobj <- sva(counts, mod, mod0, n.sv=ns)**

I ALWAYS get the following error message:

" **Number of significant surrogate variables is: 1**

**Iteration (out of 5): Error in density.default(x, adjust = adj): 'x' contains missing values**"

I have already checked the data and there are no missing values at all. I have also tried removing the rows with values equal to 0 to reduce the dataset, but that did not work either.
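A minimal sketch of such checks, written against a simulated matrix rather than the real data. The matrix `counts` here is a made-up stand-in, and the idea that constant rows or an un-offset log transform could introduce non-finite values downstream is an assumption, not something the error message confirms:

```r
set.seed(1)
# Simulated stand-in for the real matrix: 200 genes x 164 samples.
counts <- matrix(rnbinom(200 * 164, mu = 50, size = 2), nrow = 200)
counts[1, ] <- 0   # a gene with no reads at all
counts[2, ] <- 7   # a gene with identical counts in every sample

# 1. Missing or non-finite values in the raw matrix.
anyNA(counts)
all(is.finite(counts))

# 2. Zero-variance rows: all-zero genes, but also any constant gene.
row_var <- apply(counts, 1, var)
constant_rows <- which(row_var == 0)
constant_rows

# 3. Non-finite values introduced by a log transform without an offset.
sum(!is.finite(log2(counts)))       # log2(0) is -Inf
sum(!is.finite(log2(counts + 1)))   # an offset of 1 avoids this

# Keep only genes that vary across samples.
filtered <- counts[row_var > 0, , drop = FALSE]
dim(filtered)
```

Filtering on variance rather than on zeros alone also catches genes that are constant but non-zero, which a plain missing-value check would not flag.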

Could anybody help me find a solution?

Thank you all in advance,

Aina

Dear Sina Nassiri,

Thanks a lot for your help. I will take a look at all your points. It was very helpful!

Aina