SomaticSignatures NMF problem
ns • 0
@ns-7636
Last seen 7.9 years ago
United States

Hello, 

I am trying to run SomaticSignatures on mutation data.

When I run the commands:

snpvr_mm = motifMatrix(snpvr_motif, group = "study", normalize = TRUE)

gof_nmf = identifySignatures(snpvr_mm, 4, decomposition = nmfDecomposition)

I get the following error:

Error: NMF::nmf - Input matrix x contains at least one null or NA-filled row.

The session info for R is:

R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ggplot2_2.2.1              SomaticSignatures_2.8.4    VariantAnnotation_1.18.7   Rsamtools_1.24.0           Biostrings_2.40.2         
 [6] XVector_0.12.1             SummarizedExperiment_1.2.3 Biobase_2.32.0             GenomicRanges_1.24.3       GenomeInfoDb_1.8.7        
[11] IRanges_2.6.1              S4Vectors_0.10.3           BiocGenerics_0.18.0       

loaded via a namespace (and not attached):
 [1] httr_1.2.1                    foreach_1.4.3                 AnnotationHub_2.4.2           splines_3.3.1                 Formula_1.2-1                
 [6] shiny_1.0.0                   assertthat_0.1                interactiveDisplayBase_1.10.3 latticeExtra_0.6-28           RBGL_1.48.1                  
[11] BSgenome_1.40.1               RSQLite_1.1-2                 backports_1.0.5               lattice_0.20-34               biovizBase_1.20.0            
[16] digest_0.6.12                 RColorBrewer_1.1-2            checkmate_1.8.2               colorspace_1.3-2              ggbio_1.20.2                 
[21] htmltools_0.3.5               httpuv_1.3.3                  Matrix_1.2-8                  plyr_1.8.4                    OrganismDbi_1.14.1           
[26] XML_3.98-1.5                  biomaRt_2.28.0                zlibbioc_1.18.0               xtable_1.8-2                  scales_0.4.1                 
[31] BiocParallel_1.6.6            proxy_0.4-16                  htmlTable_1.9                 tibble_1.2                    pkgmaker_0.22                
[36] GenomicFeatures_1.24.5        nnet_7.3-12                   lazyeval_0.2.0                survival_2.40-1               magrittr_1.5                 
[41] mime_0.5                      memoise_1.0.0                 GGally_1.3.0                  doParallel_1.0.10             NMF_0.20.6                   
[46] foreign_0.8-67                graph_1.50.0                  BiocInstaller_1.22.3          registry_0.3                  tools_3.3.1                  
[51] data.table_1.10.4             gridBase_0.4-7                stringr_1.1.0                 munsell_0.4.3                 rngtools_1.2.4               
[56] cluster_2.0.5                 AnnotationDbi_1.34.4          ensembldb_1.4.7               pcaMethods_1.64.0             grid_3.3.1                   
[61] RCurl_1.95-4.8                iterators_1.0.8               dichromat_2.0-0               htmlwidgets_0.8               bitops_1.0-6                 
[66] base64enc_0.1-3               codetools_0.2-15              gtable_0.2.0                  DBI_0.5-1                     reshape_0.8.6                
[71] reshape2_1.4.2                R6_2.2.0                      GenomicAlignments_1.8.4       gridExtra_2.2.1               knitr_1.15.1                 
[76] rtracklayer_1.32.2            Hmisc_4.0-2                   stringi_1.1.2                 Rcpp_0.12.9                   rpart_4.1-10                 
[81] acepack_1.4.1                

 

Can the problem be identified?

Thanks!


Judging from "Error: NMF::nmf - Input matrix x contains at least one null or NA-filled row.", I suspect that your input 'snpvr_mm' contains a full row, i.e. study, without any mutations. Can you check whether this is the case? It is not possible to narrow it down further from a distance without knowing more about the data.
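A quick way to run such a check is sketched below on a small made-up matrix standing in for 'snpvr_mm'. The name `mm` and all values are illustrative assumptions, not the poster's data. The idea is simply to look for rows or columns that sum to zero and for any NA values:

```r
# Toy stand-in for the motif matrix: rows are motifs, columns are samples.
# All values are invented for illustration only.
mm <- matrix(
  c(0.2, 0.0, 0.1,
    0.0, 0.0, 0.0,
    0.8, 1.0, 0.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CA A.A", "CA A.G", "CA C.C"), c("S1", "S2", "S3"))
)

zero_rows <- names(which(rowSums(mm) == 0))  # motifs never observed in any sample
zero_cols <- names(which(colSums(mm) == 0))  # samples without any mutation
has_na    <- anyNA(mm)                       # TRUE if any entry is NA or NaN

print(zero_rows)
print(zero_cols)
print(has_na)
```

Running the same three checks on the real matrix shows whether the problem lies in the rows, the columns, or in missing values.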

Also, please consider updating your packages to the current Bioconductor release. The version you are using is no longer supported (see the Bioc help pages), and the new version of SomaticSignatures might behave differently.


Hi Julian,

I checked my data table, and I do not have any studies without mutation calls (studies are in the columns). I do have rows, i.e. trinucleotide motifs, that contain only 0 values. See the example below:

        DJFS_1    DJFS_10  DJFS_100  DJFS_102  DJFS_107
CA A.A  0         0        0         0         0.0625
CA A.C  0.181818  0        0         0         0
CA A.G  0         0        0         0         0
CA A.T  0         0        0         0         0
CA C.A  0         0        0         0         0
CA C.C  0.090909  0        0         0         0
CA C.G  0         0        0         0         0.0625
CA C.T  0         0        0         0         0
CA G.A  0         0        0         0         0
CA G.C  0         0        0         0         0
CA G.G  0         0        0         0         0.0625

 

If only a small number of mutations is present per sample, some motifs are likely to end up as rows containing only 0 values (e.g. CA A.G).

How do I overcome this problem?

If I artificially add in values for the missing rows, I get this error:


Error in rowQ(imat, ncol(imat)) : cannot handle missing values.
In addition: Warning message:
In .local(x, rank, method, ...) :
  NMF residuals: final objective value is NA

I have now updated Bioconductor. Can you please help figure out why this problem persists?

Thanks!

Natalie

 


How many mutations do you have per sample/study? I.e. what does

rowSums(motifMatrix(snpvr_motif, group = "study", normalize = FALSE))

return?

ns • 0
@ns-7636
Last seen 7.9 years ago
United States

> rowSums(motifMatrix(snpvr_motif, group = "study", normalize = FALSE))
CA A.A CA A.C CA A.G CA A.T CA C.A CA C.C CA C.G CA C.T CA G.A CA G.C CA G.G CA G.T CA T.A CA T.C CA T.G CA T.T CG A.A CG A.C CG A.G CG A.T CG C.A CG C.C 
    15      4      8      5      6      9      7      6      5      1      4      3     16      4      6     14      8      4      7      4      4      0 
CG C.G CG C.T CG G.A CG G.C CG G.G CG G.T CG T.A CG T.C CG T.G CG T.T CT A.A CT A.C CT A.G CT A.T CT C.A CT C.C CT C.G CT C.T CT G.A CT G.C CT G.G CT G.T 
     0      6      6      1      3      1      6      3      0      3     12      6      6     13     17      3      9      4      5      1      5      6 
CT T.A CT T.C CT T.G CT T.T TA A.A TA A.C TA A.G TA A.T TA C.A TA C.C TA C.G TA C.T TA G.A TA G.C TA G.G TA G.T TA T.A TA T.C TA T.G TA T.T TC A.A TC A.C 
    11      3      3     12     14      6     13     11      3      5      5      7      6      4      4      7     10      6     10     16      7     13 
TC A.G TC A.T TC C.A TC C.C TC C.G TC C.T TC G.A TC G.C TC G.G TC G.T TC T.A TC T.C TC T.G TC T.T TG A.A TG A.C TG A.G TG A.T TG C.A TG C.C TG C.G TG C.T 
    15     10      4      3      2      2     10      4      7     12      9      7     13     20      2      4      7      2      1      1      6      1 
TG G.A TG G.C TG G.G TG G.T TG T.A TG T.C TG T.G TG T.T 
     2      1      7      2      2      0     10      2 
> colSums(motifMatrix(snpvr_motif, group = "study", normalize = FALSE))
DJFS_1 DJFS_2 DJFS_3 DJFS_4 DJFS_5 DJFS_6 DJFS_7 DJFS_8 DJFS_9 
    77     57     58     68    121     74     44     58     43 

Julian Gehring ★ 1.3k
@julian-gehring-5818
Last seen 5.6 years ago

The error results from the fact that some mutational motifs have not occurred in any of the samples, i.e. some rows of the motif matrix are entirely zero. This prevents the decomposition of the matrix with the NMF. I'll try to capture this explicitly in the package and provide a more informative error message.
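To illustrate the condition, here is a minimal base R sketch on a toy count matrix. The name `counts` and its values are assumptions invented for the example. It identifies the all-zero motif rows and drops them, which is one conceivable workaround rather than a recommendation:

```r
# Toy count matrix: rows are motifs, columns are samples (invented values).
counts <- matrix(
  c(15, 0, 4,
     0, 0, 0,
     6, 9, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("CA A.A", "CG C.C", "CA C.C"), c("S1", "S2", "S3"))
)

# Keep only motifs that were observed at least once across all samples.
keep <- rowSums(counts) > 0
counts_nz <- counts[keep, , drop = FALSE]

print(rownames(counts)[!keep])  # motifs that would be dropped
print(dim(counts_nz))
```

Note that signatures derived from such a filtered matrix would no longer cover the dropped motifs, so this changes what the result means.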

There are several potential approaches to address such cases, but looking at the excerpt from your data set, I doubt that any of them would be helpful here: The number of mutations per sample, and hence the signal of any mutational process, is very low, and I would not be confident that an analysis of this data would yield reliable or meaningful signatures. I would therefore suggest considering other experimental designs for this analysis, such as gathering data from more samples or pooling samples into a few distinct groups with a higher number of variants.
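As a rough sketch of the pooling idea, the base R snippet below sums per-sample counts into groups. The matrix `counts`, the sample names and the `group` assignment are all hypothetical. In SomaticSignatures itself the grouping would presumably be chosen through the `group` argument of `motifMatrix` used in the question, so this only shows the arithmetic:

```r
# Toy count matrix: rows are motifs, columns are samples (invented values).
counts <- matrix(
  c(1, 0, 2, 0,
    0, 0, 0, 1,
    3, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("m1", "m2", "m3"), c("S1", "S2", "S3", "S4"))
)

# Hypothetical assignment of each sample (column) to a pooled group.
group <- c("A", "A", "B", "B")

# rowsum() adds up rows by group, so transpose to put samples in rows,
# pool them, and transpose back to motifs x groups.
pooled <- t(rowsum(t(counts), group))

print(pooled)
print(colSums(pooled))  # mutations per pooled group
```

Pooling raises the number of variants per column, but it only helps if the groups are biologically sensible in the first place.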

grimmmmer • 0
@grimmmmer-14147
Last seen 7.2 years ago

Are all of the other NMF-based tools similarly limited by this situation (e.g. WTSI Mutational Signature Framework, MutSpec, BayesNMF, signeR)?

SomaticSignatures would be a really great tool for unbiased classification of individual tumors into different groups, but given the discussion above, it seems this approach is limited to hindsight signatures of pre-defined groups of tumors. I am running into the same issue despite my large test dataset of ~4500 SNVs. 
