Question

Error in sva() function: missing values, NaNs produced

0

Entering edit mode

thkapell ▴ 10

@tkapell-14647

Last seen 4 months ago

Helmholtz Center Munich, Germany

Hi,

I want to clean my RNA-seq data with the SVA package, but the sva() function gives me error:

*Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning message:
In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced*

I attach the code I used since count table preparation and gene filtering:

*dds.txi <- DESeqDataSetFromTximport(txi = txi.kallisto, colData = sample_info, design = ~ Sequencing.lane+Disease)*

*sum(rowSums(counts(dds.txi))==0)
keep <- rowSums(counts(dds.txi)) >= 10
dds.txi <- dds.txi[keep,]*

*dds<-DESeq(dds.txi)    
dds
design<-design(dds)*

*rld<-rlog(dds, blind=T)                   
rld_df<-as.data.frame(assay(rld))*

*mod<-model.matrix(~ Age+Sequencing.lane+Disease, sample_info)
mod0<-model.matrix(~  Age+Sequencing.lane, sample_info)*

*dat<-counts(dds,normalized=T)
n.sv=num.sv(dat,mod,method='leek')
svseq <- sva(assay(rld), mod, mod0, n.sv = n.sv)*

... and my *SessionInfo()*:

*R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    
attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1] org.Hs.eg.db_3.8.2           apeglm_1.6.0                 mixOmics_6.8.4               DOSE_3.10.2                 
 [5] pcaGoPromoter.Mm.mm9_1.20.0  pcaGoPromoter.Hs.hg19_1.20.0 pcaGoPromoter_1.28.0         Biostrings_2.52.0           
 [9] XVector_0.24.0               ellipse_0.4.1                ReactomePA_1.28.0            biomaRt_2.40.4              
[13] geneplotter_1.62.0           lattice_0.20-38              limma_3.40.6                 tximport_1.12.3             
[17] sva_3.32.1                   genefilter_1.66.0            mgcv_1.8-28                  nlme_3.1-140                
[21] GSEABase_1.46.0              graph_1.62.0                 annotate_1.62.0              XML_3.98-1.20               
[25] AnnotationDbi_1.46.1         clusterProfiler_3.12.0       IHW_1.12.0                   DESeq2_1.24.0               
[29] SummarizedExperiment_1.14.1  DelayedArray_0.10.0          BiocParallel_1.18.1          Biobase_2.44.0              
[33] GenomicRanges_1.36.1         GenomeInfoDb_1.20.0          IRanges_2.18.2               S4Vectors_0.22.1            
[37] BiocGenerics_0.30.0          rhdf5_2.28.0                 ggrepel_0.8.1                gtools_3.8.1                
[41] stringi_1.4.3                ggforce_0.3.1                MASS_7.3-51.4                rlang_0.4.0                 
[45] survival_2.44-1.1            gridBase_0.4-7               matrixStats_0.55.0           devtools_2.2.1              
[49] usethis_1.5.1                Rcpp_1.0.2                   RSQLite_2.1.2                data.table_1.12.4           
[53] bit_1.1-14                   magrittr_1.5                 scales_1.0.0                 ggbeeswarm_0.6.0            
[57] VennDiagram_1.6.20           futile.logger_1.4.3          colourlovers_0.3.5           pheatmap_1.0.12             
[61] reshape2_1.4.3               forcats_0.4.0                stringr_1.4.0                dplyr_0.8.3                 
[65] purrr_0.3.2                  readr_1.3.1                  tidyr_1.0.0                  tibble_2.1.3                
[69] ggplot2_3.2.1                tidyverse_1.2.1             
loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1         coda_0.19-3            acepack_1.4.1          bit64_0.9-7            knitr_1.25            
  [6] rpart_4.1-15           RCurl_1.95-4.12        generics_0.0.2         callr_3.3.2            cowplot_1.0.0         
 [11] lambda.r_1.2.4         europepmc_0.3          enrichplot_1.4.0       xml2_1.2.2             lubridate_1.7.4       
 [16] assertthat_0.2.1       viridis_0.5.1          xfun_0.10              hms_0.5.1              progress_1.2.2        
 [21] readxl_1.3.1           igraph_1.2.4.1         DBI_1.0.0              htmlwidgets_1.3        rARPACK_0.11-0        
 [26] ellipsis_0.3.0         RSpectra_0.15-0        backports_1.1.5        vctrs_0.2.0            remotes_2.1.0         
 [31] withr_2.1.2            triebeard_0.3.0        checkmate_1.9.4        fdrtool_1.2.15         prettyunits_1.0.2     
 [36] cluster_2.1.0          lazyeval_0.2.2         crayon_1.3.4           pkgconfig_2.0.3        slam_0.1-45           
 [41] labeling_0.3           tweenr_1.0.1           vipor_0.4.5            pkgload_1.0.2          nnet_7.3-12           
 [46] lifecycle_0.1.0        modelr_0.1.5           cellranger_1.1.0       rprojroot_1.3-2        polyclip_1.10-0       
 [51] Matrix_1.2-17          urltools_1.7.3         lpsymphony_1.12.0      Rhdf5lib_1.6.1         base64enc_0.1-3       
 [56] beeswarm_0.2.3         ggridges_0.5.1         processx_3.4.1         png_0.1-7              viridisLite_0.3.0     
 [61] bitops_1.0-6           blob_1.2.0             qvalue_2.16.0          gridGraphics_0.4-1     reactome.db_1.68.0    
 [66] memoise_1.1.0          graphite_1.30.0        plyr_1.8.4             zlibbioc_1.30.0        compiler_3.6.1        
 [71] bbmle_1.0.20           RColorBrewer_1.1-2     cli_1.1.0              ps_1.3.0               htmlTable_1.13.2      
 [76] formatR_1.7            Formula_1.2-3          tidyselect_0.2.5       emdbook_1.3.11         yaml_2.2.0            
 [81] GOSemSim_2.10.0        locfit_1.5-9.1         latticeExtra_0.6-28    fastmatch_1.1-0        tools_3.6.1           
 [86] rstudioapi_0.10        foreign_0.8-71         gridExtra_2.3          farver_1.1.0           ggraph_2.0.0          
 [91] digest_0.6.21          rvcheck_0.1.5          BiocManager_1.30.4     broom_0.5.2            httr_1.4.1            
 [96] colorspace_1.4-1       rvest_0.3.4            fs_1.3.1               splines_3.6.1          graphlayouts_0.5.0    
[101] ggplotify_0.0.4        sessioninfo_1.1.1      xtable_1.8-4           jsonlite_1.6           futile.options_1.0.1  
[106] tidygraph_1.1.2        corpcor_1.6.9          UpSetR_1.4.0           zeallot_0.1.0          testthat_2.2.1        
[111] R6_2.4.0               Hmisc_4.2-0            pillar_1.4.2           htmltools_0.3.6        glue_1.3.1            
[116] fgsea_1.10.1           pkgbuild_1.0.5         numDeriv_2016.8-1.1    GO.db_3.8.2            desc_1.2.0            
[121] munsell_0.5.0          DO.db_2.9              GenomeInfoDbData_1.2.1 haven_2.1.1            gtable_0.3.0*

Any help would be greatly appreciated!

Theo

sva batch-correction • 2.2k views

ADD COMMENT • link updated 6.2 years ago by James W. MacDonald 68k • written 6.2 years ago by thkapell ▴ 10

score 0 · Answer 1 · 2019-11-05

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 56 minutes ago

United States

First off, note that sva is intended for continuous data that are sort of normal distributed, which is not true of count data. The svaseq function is intended for count data.

Second, when you type in your question, there is a preview pane right below where you are typing, that shows what your post will look like. Bracketing things with an asterisk makes things italicized, which doesn't help with code. Instead you want to either bracket with a triple back-tick (top left key on a QWERTY keyboard) or highlight and click the code button (with the10101010). If your preview doesn't look like what I fixed for you, above, then you need to adjust until it does.

So try using svaseq and post again if it still doesn't work.

ADD COMMENT • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks James,

I followed your advice and used the svaseq() function instead. This time I had a different error:

dat<-counts(dds,normalized=T)
idx <- rowMeans(dat) > 1
dat <- dat[idx,]

mod<-model.matrix(~ Age+Sequencing.lane+Disease, colData(dds))
mod0<-model.matrix(~  Age+Sequencing.lane, colData(dds))

svseq <- svaseq(dat, mod, mod0)

Error in dstat0[, i] : subscript out of bounds

Would a deeper filtering of low expressed genes help here?

ADD REPLY • link 6.2 years ago thkapell ▴ 10

0

Entering edit mode

What do you get from

min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

ADD REPLY • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

 min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

[1]1

ADD REPLY • link 6.2 years ago thkapell ▴ 10

0

Entering edit mode

Huh. Doesn't seem like you should get that error then. You have one remaining degree of freedom, so you are really close to an over-parameterized model anyway, but it shouldn't error like that? The error is coming from num.sv and seems to indicate that you have a fully parameterized model.

Fake example:

## design matrix that has 1 df
> mod2
  (Intercept) A B C D E F  continuous
1           1 0 0 0 0 0 0 -1.57605014
2           1 0 0 0 0 1 0 -1.07814260
3           1 0 0 1 0 0 1  0.43780250
4           1 1 0 1 0 0 0  0.62172593
5           1 1 0 0 1 1 0 -0.61158475
6           1 1 0 0 1 0 1 -0.20464747
7           1 0 1 0 0 0 0 -0.70204198
8           1 0 1 0 0 1 0 -1.83367875
9           1 0 1 0 0 0 1 -0.07603098

> y <- matrix(rnorm(9000), 100, 9)
> 9 - ceiling(sum(hat(mod2)))
[1] 1
> num.sv(y, mod2)
[1] 0
> mod2[,9] <- runif(9)
> 9 - ceiling(sum(hat(mod2)))
[1] 0
> num.sv(y, mod2)
Error in dstat0[, i] : subscript out of bounds

ADD REPLY • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

I see. The only explanation I can think is that the batch that causes the problem is Age. There are a few patients that have the same value. Is it possible to remove this unwanted variation from my model?

ADD REPLY • link 6.2 years ago thkapell ▴ 10

0

Entering edit mode

Subjects having the same age isn't a problem. Were the samples really run on separate sequencing lanes? These days it seems like the usual MO is to barcode the samples, pool, and then run all the samples on each lane. In which case it's not useful to block on the sequencing lane.

ADD REPLY • link 6.2 years ago James W. MacDonald 68k

0

Entering edit mode

Yes, Sequencing.lane refers to different dates so they were really put on separate flow cells.

ADD REPLY • link 6.2 years ago thkapell ▴ 10