Question: SVA package - ERROR: nvobj = sva(edata, mod, mod0,
10 months ago by
ISGlobal - Barcelona
aina.jene10 wrote:


I am currently working on an RNA-seq ExpressionSet from TCGA and I want to fit a linear model using voom, but first I need to remove the batch effect. To do so, I am using the sva package.

My data is:

In this expression set, the dimensions are:

  • pheno: 164 × 3 (samples × columns)
  • edata: 20531 × 164 (genes × samples)

In pheno, samples are in rows and the three columns are Call, Status and Id:

  • Call: each individual can present “Loss of Chromosome Y”, “Normal” or “XYY”.
  • Status: the origin of each sample, “tumor” or “normal”.
  • Id: the id of each sample

Knowing that:

Full model is:

  • mod = model.matrix(~ Call + Status, data = pheno) [both Call and Status are factors]

Null model is just like the example in the vignette:

  • mod0 = model.matrix(~ 1, data = pheno)
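
Before passing these matrices to sva, it can help to confirm that they have one row per sample and that the full model is of full rank. The following is a minimal sketch in base R, assuming a made-up pheno table with the same column names and factor levels as above (the sample values and Ids are invented for illustration):

```r
# Hypothetical phenotype table: 164 samples, same columns as described above.
pheno <- data.frame(
  Call   = factor(rep(c("Normal", "Loss of Chromosome Y", "XYY"), length.out = 164)),
  Status = factor(rep(c("normal", "tumor"), each = 82)),
  Id     = paste0("S", seq_len(164))
)

# Full model (variables of interest) and null model (intercept only).
mod  <- model.matrix(~ Call + Status, data = pheno)
mod0 <- model.matrix(~ 1, data = pheno)

# model.matrix() silently drops rows with NA in the covariates, so the row
# count should be compared with the number of samples in the expression matrix.
n_samples <- 164
rows_match <- nrow(mod) == n_samples && nrow(mod0) == n_samples

# A rank-deficient design (e.g. an empty factor level) can also cause trouble.
full_rank <- qr(mod)$rank == ncol(mod)

print(dim(mod))
print(c(rows_match = rows_match, full_rank = full_rank))
```

If rows_match is FALSE on real data, some samples probably have missing values in Call or Status.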


Then, when I try to estimate the number of surrogate variables (ns <-, mod, method="leek"), I get 160 surrogate variables. If I change the method to "be", the number of surrogate variables decreases to 1.

Finally, when I try the next function: 

  • svobj <- sva(counts, mod, mod0,

I ALWAYS get the following error message:


" Number of significant surrogate variables is: 1

Iteration (out of 5): Error in density.default(x, adjust = adj): 'x' contains missing values "


I have already checked the data and there are no missing values at all, and I have also tried to remove those rows with values equal to 0 in order to reduce the dataset but it did not work either.
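
For reference, a check along these lines can be sketched in base R. This is only an illustration on a toy matrix, and check_expression_matrix is a hypothetical helper, not part of sva. It also flags rows with zero variance, on the assumption (not confirmed for this dataset) that constant genes can produce undefined statistics downstream even when the input itself contains no NA:

```r
# Hypothetical helper: summarise values that commonly break downstream steps.
check_expression_matrix <- function(edata) {
  edata <- as.matrix(edata)
  row_var <- apply(edata, 1, var)
  list(
    n_na          = sum(is.na(edata)),        # NA or NaN entries
    n_nonfinite   = sum(!is.finite(edata)),   # NA, NaN, Inf or -Inf entries
    zero_var_rows = which(row_var == 0)       # genes constant across all samples
  )
}

# Toy matrix: 4 genes x 6 samples; g2 is all zeros, g3 is constant but non-zero.
edata <- rbind(
  g1 = c(10, 12, 9, 30, 28, 35),
  g2 = rep(0, 6),
  g3 = rep(5, 6),
  g4 = c(100, 80, 120, 90, 110, 95)
)

report <- check_expression_matrix(edata)
print(report)

# Keep only genes that vary across samples.
keep <- !(seq_len(nrow(edata)) %in% report$zero_var_rows)
filtered <- edata[keep, , drop = FALSE]
print(rownames(filtered))
```

Note that removing only all-zero rows would leave g3 in place, since it is constant but not zero.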

Could anybody help me to try and find a solution?


Thank you all in advance,




10 months ago by
sina.nassiri50 wrote:


  • You mentioned that you want to proceed with a linear model analysis using "voom" and also referred to your input matrix as "counts", so I'm assuming you have count data in hand. If that's the case, you would want to use svaseq() instead of sva().
  • Regarding the full model, I just want to add that you need to be careful not to leave any potentially important biological factor out of the equation. For example, in your full model you're assuming that "Call" and "Status" have a linear additive effect on gene expression. I don't know enough about your data, but could there be an interaction between "Call" and "Status"? Keep in mind that any effects that are not explicitly accounted for in the full model may be removed by sva/svaseq as unwanted effects.

mod1 = model.matrix(~ Call + Status, data = pheno)


mod2 = model.matrix(~ Call + Status + Call:Status, data = pheno)

  • In calculating number of surrogate variables, "be" is the recommended default method based on permutation. "leek" is an alternative asymptotic approach, which may perform better only when dealing with large samples.
  • Finally, regarding the error message, it's hard to tell without having access to your code and input file.
  • One last point since you mentioned your data is from TCGA, are you familiar with MBatch? MBatch provides visualization tools to assess batch effects in TCGA data. You can also obtain data already adjusted for batch effects using different methods including ComBat from the sva package. It might be worth exploring.
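
To make the svaseq() suggestion above concrete, here is a minimal sketch on simulated counts. Everything about the data is an assumption made for illustration (the sample size, the two-level Call factor, the hidden batch and its effect size), and n.sv is fixed at 1 only because one batch is simulated. Leaving n.sv unset lets svaseq() estimate it, by default with the "be" method.

```r
library(sva)  # Bioconductor package providing svaseq()

set.seed(1)
n_genes   <- 500
n_samples <- 24

# Hypothetical phenotype table reusing the column names from the question.
pheno <- data.frame(
  Call   = factor(rep(c("Normal", "LOY"), times = n_samples / 2)),
  Status = factor(rep(c("normal", "tumor"), each = n_samples / 2))
)

# Simulated hidden batch, balanced against Call and Status (assumption).
batch    <- rep(c(0, 0, 1, 1), length.out = n_samples)
affected <- rep(c(TRUE, FALSE), length.out = n_genes)  # half the genes respond
mu       <- 50 * exp(outer(as.numeric(affected), batch))  # genes x samples means
counts   <- matrix(rnbinom(n_genes * n_samples, mu = mu, size = 5),
                   nrow = n_genes)

mod  <- model.matrix(~ Call + Status, data = pheno)  # full model
mod0 <- model.matrix(~ 1, data = pheno)              # null model

# svaseq() works on log(counts + constant) internally, so it takes raw counts.
svobj <- svaseq(counts, mod, mod0, n.sv = 1)

# Append the estimated surrogate variable(s) to the design matrix.
sv <- svobj$sv
colnames(sv) <- paste0("SV", seq_len(ncol(sv)))
design <- cbind(mod, sv)
print(colnames(design))
```

The resulting design matrix, with the surrogate variables as extra columns, is what you would then pass to voom and the downstream linear model fit instead of the original mod.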

Hope these help!


Dear Sina Nassiri,

Thanks a lot for your help. I will take a look at all your points. It was very helpful!


written 10 months ago by aina.jene10