sva

Question

What does this error mean? - issue implementing SVA

3

Entering edit mode

kmuench ▴ 40

@kmuench-9243

Last seen 6.2 years ago

United States

Hello,

I'm attempting to implement sva on my RNA-Seq count data matrix. I have a table, myFactors, which describes various potential sources of known (e.g., myFactors$Condition - control, condition1, condition2) and suspected (e.g., myFactors$Date - date of sample preparation) variation.

I've followed along with the sva tutorial, but I'm getting an error I cannot interpret. Does anyone know what it means when it says 'x' contains missing values, or NaNs produced? Any guesses what I'm doing wrong here?

Thanks for your help. It seems like a number of people have asked this question previously, but I have not found any posted answers.

Code:

> myFactors <- read.csv('Location/On/Computer') # myFactors is 17 obs x 15 variable data.frame

> data <- as.matrix(RNASeqCountDataVariable) # data is matrix (63,677 genes in rows, 17 samps in columns)

# remove rows from data where 0 expression across all genes

> row_sub = apply(data,1,function(x) !all(x==0) )

> data_noZeros <- data[row_sub,] # data_noZeros is matrix, 33,704 x 17

> myModel <- model.matrix(~as.factor(Condition), data=myFactors)
> nullModel <- model.matrix(~1, data=myFactors)

> numLatentFactors <- num.sv(data_noZeros, myModel, method="leek") 

> sva_obj <- sva(data_noZeros, myModel, nullModel, n.sv=numLatentFactors)

Output:

Number of significant surrogate variables is: 14

Iteration (out of 5 ): Error in density.default(x, adjust = adj) : 'x' contains missing values In addition: Warning message: In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced

Output of sessionInfo():

R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11.2 (El Capitan)

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid parallel stats4 stats graphics grDevices utils datasets methods [10] base

other attached packages: [1] bladderbatch_1.8.0 limma_3.26.5 pamr_1.55 [4] survival_2.38-3 cluster_2.0.3 sva_3.18.0 [7] genefilter_1.52.0 mgcv_1.8-11 nlme_3.1-124 [10] VennDiagram_1.6.16 futile.logger_1.4.1 RColorBrewer_1.1-2 [13] gplots_2.17.0 biomaRt_2.26.1 DESeq2_1.10.1 [16] RcppArmadillo_0.6.400.2.2 Rcpp_0.12.3 SummarizedExperiment_1.0.2 [19] Biobase_2.30.0 GenomicRanges_1.22.3 GenomeInfoDb_1.6.3 [22] IRanges_2.4.6 S4Vectors_0.8.7 BiocGenerics_0.16.1

loaded via a namespace (and not attached): [1] gtools_3.5.0 locfit_1.5-9.1 splines_3.2.2 lattice_0.20-33 [5] colorspace_1.2-6 XML_3.98-1.3 foreign_0.8-66 DBI_0.3.1 [9] BiocParallel_1.4.3 lambda.r_1.1.7 plyr_1.8.3 zlibbioc_1.16.0 [13] munsell_0.4.2 gtable_0.1.2 caTools_1.17.1 latticeExtra_0.6-26 [17] geneplotter_1.48.0 AnnotationDbi_1.32.3 acepack_1.3-3.3 KernSmooth_2.23-15 [21] xtable_1.8-0 scales_0.3.0 gdata_2.17.0 Hmisc_3.17-1 [25] annotate_1.48.0 XVector_0.10.0 gridExtra_2.0.0 ggplot2_2.0.0 [29] tools_3.2.2 bitops_1.0-6 RCurl_1.95-4.7 RSQLite_1.0.0 [33] Formula_1.2-1 futile.options_1.0.0 Matrix_1.2-3 rpart_4.1-10 [37] nnet_7.3-11

**EDIT 1** I get the same error when I remove all rows with low variance instead of rows that are all 0s, using the following:

getVar <- apply(data,1,var)
param <- 1
data_removeNoVariance <- data[getVar > param& !is.na(getVar),]

Though, granted, I'm not sure if there's a different variance threshold I should set to make it work with SVA...

**EDIT 2** I think I found the piece of code causing the bug here. When I put this code into a script, remove the function, and just declare dat = data_noZeros, mod = myModel, etc., the code works perfectly - i.e., when I do this:

>tst <- pf(fstats,df1=(df1-df0),df2=(n-df1))

>sum(is.nan(tst))

Output:

[1] 0

So there aren't any NaNs being made when I write this function like it's a script. Why would calling the function through sva result in an error...?

sva rna-seq • 6.0k views

ADD COMMENT • link updated 6.8 years ago by Sandip Darji ▴ 30 • written 8.9 years ago by kmuench ▴ 40

score 0 · Answer 1 · 2018-04-01

0

Entering edit mode

Sandip Darji ▴ 30

@sandip-darji-12513

Last seen 3.6 years ago

USA/New York/Nathan Kline Institute for…

Doesn't anyone have answer to this? It seems like a number of people including myself getting the exact same error but can't find the documented workaround.

sva

ADD COMMENT • link 6.8 years ago Sandip Darji ▴ 30

0

Entering edit mode

Can you provide a reproducible example, i.e., something I or someone else can evaluate easily in our own R session? For instance, if I run

library(sva)
example(sva)

I don't see this error. What do I have to do to recreate the problem?

ADD REPLY • link 6.8 years ago Martin Morgan 25k

0

Entering edit mode

Deer Micheal,

I appreciate your prompt response. Please accept my apologies as I was using untransformed data with sva function in R. As sson as I corrected this, it went smooth.

Thanks.

Sandip

ADD REPLY • link 6.8 years ago Sandip Darji ▴ 30

0

Entering edit mode

Could you expand on the meaning of untransformed data? What are the data transformations you performed? Thanks

ADD REPLY • link 6.8 years ago mrunal.dehankar • 0

0

Entering edit mode

I used voom transformation on the data from limma-voom package. You can equally use rld or vst transformations from DESeq2 or simply log transformation on counts.

ADD REPLY • link 6.8 years ago Sandip Darji ▴ 30

score 0 · Answer 2 · 2018-04-01

If I remember correctly, it means that one of your variables (genes/transcripts) is perfectly correlated with trait, or is constant, and its corresponding association p-value with the trait is either infinite or NaN/NA. Filter out any constant genes (usually genes with all zeros); you may have to filter out all genes that are constant in any combination of two levels of your factor.

score 0 · Answer 3 · 2018-04-03

0

Entering edit mode

Sandip Darji ▴ 30

@sandip-darji-12513

Last seen 3.6 years ago

USA/New York/Nathan Kline Institute for…

Deer Peter,

I appreciate your prompt response and I would like to apologize as I was using untransformed data with sva function in R. As sson as I corrected this, it went smooth.

Thank you for your input.

Sandip

ADD COMMENT • link 6.8 years ago Sandip Darji ▴ 30