Error in sva() function: missing values, NaNs produced
Entering edit mode
thkapell ▴ 10
Last seen 6 weeks ago
Helmholtz Center Munich, Germany


I want to clean my RNA-seq data with the SVA package, but the sva() function gives me error:

*Error in density.default(x, adjust = adj) : 'x' contains missing values
In addition: Warning message:
In pf(fstats, df1 = (df1 - df0), df2 = (n - df1)) : NaNs produced*

I attach the code I used since count table preparation and gene filtering:

*dds.txi <- DESeqDataSetFromTximport(txi = txi.kallisto, colData = sample_info, design = ~ Sequencing.lane+Disease)*

keep <- rowSums(counts(dds.txi)) >= 10
dds.txi <- dds.txi[keep,]*


*rld<-rlog(dds, blind=T)                   

*mod<-model.matrix(~ Age+Sequencing.lane+Disease, sample_info)
mod0<-model.matrix(~  Age+Sequencing.lane, sample_info)*

svseq <- sva(assay(rld), mod, mod0, =*

... and my *SessionInfo()*:

*R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    
attached base packages:
 [1] parallel  stats4    grid      stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
 [1]           apeglm_1.6.0                 mixOmics_6.8.4               DOSE_3.10.2                 
 [5] pcaGoPromoter.Mm.mm9_1.20.0  pcaGoPromoter.Hs.hg19_1.20.0 pcaGoPromoter_1.28.0         Biostrings_2.52.0           
 [9] XVector_0.24.0               ellipse_0.4.1                ReactomePA_1.28.0            biomaRt_2.40.4              
[13] geneplotter_1.62.0           lattice_0.20-38              limma_3.40.6                 tximport_1.12.3             
[17] sva_3.32.1                   genefilter_1.66.0            mgcv_1.8-28                  nlme_3.1-140                
[21] GSEABase_1.46.0              graph_1.62.0                 annotate_1.62.0              XML_3.98-1.20               
[25] AnnotationDbi_1.46.1         clusterProfiler_3.12.0       IHW_1.12.0                   DESeq2_1.24.0               
[29] SummarizedExperiment_1.14.1  DelayedArray_0.10.0          BiocParallel_1.18.1          Biobase_2.44.0              
[33] GenomicRanges_1.36.1         GenomeInfoDb_1.20.0          IRanges_2.18.2               S4Vectors_0.22.1            
[37] BiocGenerics_0.30.0          rhdf5_2.28.0                 ggrepel_0.8.1                gtools_3.8.1                
[41] stringi_1.4.3                ggforce_0.3.1                MASS_7.3-51.4                rlang_0.4.0                 
[45] survival_2.44-1.1            gridBase_0.4-7               matrixStats_0.55.0           devtools_2.2.1              
[49] usethis_1.5.1                Rcpp_1.0.2                   RSQLite_2.1.2                data.table_1.12.4           
[53] bit_1.1-14                   magrittr_1.5                 scales_1.0.0                 ggbeeswarm_0.6.0            
[57] VennDiagram_1.6.20           futile.logger_1.4.3          colourlovers_0.3.5           pheatmap_1.0.12             
[61] reshape2_1.4.3               forcats_0.4.0                stringr_1.4.0                dplyr_0.8.3                 
[65] purrr_0.3.2                  readr_1.3.1                  tidyr_1.0.0                  tibble_2.1.3                
[69] ggplot2_3.2.1                tidyverse_1.2.1             
loaded via a namespace (and not attached):
  [1] rappdirs_0.3.1         coda_0.19-3            acepack_1.4.1          bit64_0.9-7            knitr_1.25            
  [6] rpart_4.1-15           RCurl_1.95-4.12        generics_0.0.2         callr_3.3.2            cowplot_1.0.0         
 [11] lambda.r_1.2.4         europepmc_0.3          enrichplot_1.4.0       xml2_1.2.2             lubridate_1.7.4       
 [16] assertthat_0.2.1       viridis_0.5.1          xfun_0.10              hms_0.5.1              progress_1.2.2        
 [21] readxl_1.3.1           igraph_1.2.4.1         DBI_1.0.0              htmlwidgets_1.3        rARPACK_0.11-0        
 [26] ellipsis_0.3.0         RSpectra_0.15-0        backports_1.1.5        vctrs_0.2.0            remotes_2.1.0         
 [31] withr_2.1.2            triebeard_0.3.0        checkmate_1.9.4        fdrtool_1.2.15         prettyunits_1.0.2     
 [36] cluster_2.1.0          lazyeval_0.2.2         crayon_1.3.4           pkgconfig_2.0.3        slam_0.1-45           
 [41] labeling_0.3           tweenr_1.0.1           vipor_0.4.5            pkgload_1.0.2          nnet_7.3-12           
 [46] lifecycle_0.1.0        modelr_0.1.5           cellranger_1.1.0       rprojroot_1.3-2        polyclip_1.10-0       
 [51] Matrix_1.2-17          urltools_1.7.3         lpsymphony_1.12.0      Rhdf5lib_1.6.1         base64enc_0.1-3       
 [56] beeswarm_0.2.3         ggridges_0.5.1         processx_3.4.1         png_0.1-7              viridisLite_0.3.0     
 [61] bitops_1.0-6           blob_1.2.0             qvalue_2.16.0          gridGraphics_0.4-1     reactome.db_1.68.0    
 [66] memoise_1.1.0          graphite_1.30.0        plyr_1.8.4             zlibbioc_1.30.0        compiler_3.6.1        
 [71] bbmle_1.0.20           RColorBrewer_1.1-2     cli_1.1.0              ps_1.3.0               htmlTable_1.13.2      
 [76] formatR_1.7            Formula_1.2-3          tidyselect_0.2.5       emdbook_1.3.11         yaml_2.2.0            
 [81] GOSemSim_2.10.0        locfit_1.5-9.1         latticeExtra_0.6-28    fastmatch_1.1-0        tools_3.6.1           
 [86] rstudioapi_0.10        foreign_0.8-71         gridExtra_2.3          farver_1.1.0           ggraph_2.0.0          
 [91] digest_0.6.21          rvcheck_0.1.5          BiocManager_1.30.4     broom_0.5.2            httr_1.4.1            
 [96] colorspace_1.4-1       rvest_0.3.4            fs_1.3.1               splines_3.6.1          graphlayouts_0.5.0    
[101] ggplotify_0.0.4        sessioninfo_1.1.1      xtable_1.8-4           jsonlite_1.6           futile.options_1.0.1  
[106] tidygraph_1.1.2        corpcor_1.6.9          UpSetR_1.4.0           zeallot_0.1.0          testthat_2.2.1        
[111] R6_2.4.0               Hmisc_4.2-0            pillar_1.4.2           htmltools_0.3.6        glue_1.3.1            
[116] fgsea_1.10.1           pkgbuild_1.0.5         numDeriv_2016.8-1.1    GO.db_3.8.2            desc_1.2.0            
[121] munsell_0.5.0          DO.db_2.9              GenomeInfoDbData_1.2.1 haven_2.1.1            gtable_0.3.0* 

Any help would be greatly appreciated!


sva batch-correction • 567 views
Entering edit mode
Last seen 15 hours ago
United States

First off, note that sva is intended for continuous data that are sort of normal distributed, which is not true of count data. The svaseq function is intended for count data.

Second, when you type in your question, there is a preview pane right below where you are typing, that shows what your post will look like. Bracketing things with an asterisk makes things italicized, which doesn't help with code. Instead you want to either bracket with a triple back-tick (top left key on a QWERTY keyboard) or highlight and click the code button (with the10101010). If your preview doesn't look like what I fixed for you, above, then you need to adjust until it does.

So try using svaseq and post again if it still doesn't work.

Entering edit mode

Thanks James,

I followed your advice and used the svaseq() function instead. This time I had a different error:

idx <- rowMeans(dat) > 1
dat <- dat[idx,]

mod<-model.matrix(~ Age+Sequencing.lane+Disease, colData(dds))
mod0<-model.matrix(~  Age+Sequencing.lane, colData(dds))

svseq <- svaseq(dat, mod, mod0)

Error in dstat0[, i] : subscript out of bounds

Would a deeper filtering of low expressed genes help here?

Entering edit mode

What do you get from

min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

Entering edit mode
 min(ncol(dat), nrow(dat)) - ceiling(sum(hat(mod)))

Entering edit mode

Huh. Doesn't seem like you should get that error then. You have one remaining degree of freedom, so you are really close to an over-parameterized model anyway, but it shouldn't error like that? The error is coming from and seems to indicate that you have a fully parameterized model.

Fake example:

## design matrix that has 1 df
> mod2
  (Intercept) A B C D E F  continuous
1           1 0 0 0 0 0 0 -1.57605014
2           1 0 0 0 0 1 0 -1.07814260
3           1 0 0 1 0 0 1  0.43780250
4           1 1 0 1 0 0 0  0.62172593
5           1 1 0 0 1 1 0 -0.61158475
6           1 1 0 0 1 0 1 -0.20464747
7           1 0 1 0 0 0 0 -0.70204198
8           1 0 1 0 0 1 0 -1.83367875
9           1 0 1 0 0 0 1 -0.07603098

> y <- matrix(rnorm(9000), 100, 9)
> 9 - ceiling(sum(hat(mod2)))
[1] 1
>, mod2)
[1] 0
> mod2[,9] <- runif(9)
> 9 - ceiling(sum(hat(mod2)))
[1] 0
>, mod2)
Error in dstat0[, i] : subscript out of bounds
Entering edit mode

I see. The only explanation I can think is that the batch that causes the problem is Age. There are a few patients that have the same value. Is it possible to remove this unwanted variation from my model?

Entering edit mode

Subjects having the same age isn't a problem. Were the samples really run on separate sequencing lanes? These days it seems like the usual MO is to barcode the samples, pool, and then run all the samples on each lane. In which case it's not useful to block on the sequencing lane.

Entering edit mode

Yes, Sequencing.lane refers to different dates so they were really put on separate flow cells.


Login before adding your answer.

Traffic: 466 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6