SomaticSignatures package : motifMatrix function trouble
@guillaumedachy-11994
Brussels

Hello everyone, 

I am trying to use the SomaticSignatures package, which has a very thorough vignette, but I am having trouble with the motifMatrix function.

I built a vr object from my reference genome and a VCF file. This gives me a "VRanges object with 6 ranges and 70 metadata columns", but from there I don't understand how to use the motifMatrix function, or which arguments to choose, to build the "M matrix" from the vignette and go on to evaluate the signatures.

Here is my session (I'm sorry for the length): 

> head(Caf62_motifs)
VRanges object with 6 ranges and 70 metadata columns:
      seqnames                 ranges strand         ref              alt
         <Rle>              <IRanges>  <Rle> <character> <characterOrRle>
  [1]     chr5 [149495253, 149495253]      *           T                C
  [2]     chr5 [149495287, 149495287]      *           G                C
  [3]     chr5 [149495395, 149495395]      *           T                C
  [4]     chr5 [149500397, 149500397]      *           T                C
  [5]     chr5 [149505131, 149505131]      *           A                C
  [6]     chr5 [149509270, 149509270]      *           A                G
          totalDepth       refDepth       altDepth   sampleNames
      <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
  [1]          10522           <NA>           <NA>          none
  [2]          10548           <NA>           <NA>          none
  [3]           2957           <NA>           <NA>          none
  [4]            220           <NA>           <NA>          none
  [5]           3874           <NA>           <NA>          none
  [6]          48870           <NA>           <NA>          none
      softFilterMatrix |       QUAL        NS        HS        DP        RO
              <matrix> |  <numeric> <integer> <logical> <integer> <integer>
  [1]                  |    24.9075      <NA>     FALSE     10522     10120
  [2]                  |    31.1589      <NA>     FALSE     10548      8150
  [3]                  |   552.1050      <NA>     FALSE      2957      2673
  [4]                  |   336.5280      <NA>     FALSE       220       152
  [5]                  |  1766.6900      <NA>     FALSE      3874      3196
  [6]                  | 10129.5000      <NA>     FALSE     48870     25252
                 AO       MDP       MRO           MAO           MAF       SRF
      <IntegerList> <integer> <integer> <IntegerList> <NumericList> <integer>
  [1]           397      <NA>      <NA>            NA            NA      6302
  [2]           340      <NA>      <NA>            NA            NA      5401
  [3]           272      <NA>      <NA>            NA            NA      1121
  [4]            51      <NA>      <NA>            NA            NA        70
  [5]           674      <NA>      <NA>            NA            NA      1790
  [6]         23457      <NA>      <NA>            NA            NA     12603
            SRR           SAF           SAR       FDP       FRO           FAO
      <integer> <IntegerList> <IntegerList> <integer> <integer> <IntegerList>
  [1]      3818           241           156      1999      1930            69
  [2]      2749           199           141      1984      1913            71
  [3]      1552           118           154      1999      1807           192
  [4]        82            29            22       202       151            51
  [5]      1406           365           309      1996      1649           347
  [6]     12649         10795         12662      1996       997           999
                 AF      FSRF      FSRR          FSAF          FSAR
      <NumericList> <integer> <integer> <IntegerList> <IntegerList>
  [1]     0.0345173      1164       766            44            25
  [2]     0.0357863      1220       693            42            29
  [3]      0.096048       739      1068            86           106
  [4]      0.252475        70        81            29            22
  [5]      0.173848       927       722           191           156
  [6]      0.500501       531       466           447           552
                 TYPE           LEN          HRUN          MLLD          FWDB
      <CharacterList> <IntegerList> <IntegerList> <NumericList> <NumericList>
  [1]             snp             1             1       80.1793     0.0167707
  [2]             snp             1             7       418.825    0.00619089
  [3]             snp             1             1        273.03   -0.00983297
  [4]             snp             1             2       55.3924   -0.00899418
  [5]             snp             1             2       327.282    -0.0171705
  [6]             snp             1             1       266.923     0.0200888
               REVB          REFB          VARB           STB          STBP
      <NumericList> <NumericList> <NumericList> <NumericList> <NumericList>
  [1]    -0.0182017   0.000559867    -0.0158247      0.535403         0.574
  [2]    -0.0468386   0.000914669    -0.0030182      0.546863         0.441
  [3]    0.00872495   0.000172781   -0.00157773      0.535859         0.341
  [4]    -0.0578722    -0.0185919     0.0524898      0.578285         0.201
  [5]    -0.0168981   -0.00501503     0.0250835      0.509806         0.677
  [6]   -0.00744356    -0.0102243    0.00902207      0.542621             0
                RBI         QD         FXX              FR            INFO
      <NumericList>  <numeric>   <numeric> <CharacterList> <CharacterList>
  [1]     0.0247499  0.0498400 0.000499998               .                
  [2]      0.047246  0.0628203 0.007999960               .                
  [3]     0.0131458  1.1047600 0.000499998               .                
  [4]     0.0585669  6.6639200 0.081814500               .                
  [5]     0.0240909  3.5404600 0.001999990               .                
  [6]     0.0214235 20.2997000 0.001999990               .                
               SSSB          SSEN          SSEP            PB           PBP
      <NumericList> <NumericList> <NumericList> <NumericList> <NumericList>
  [1]     -0.014489             0             0            NA            NA
  [2]    -0.0755476             0             0            NA            NA
  [3]     0.0179381             0             0            NA            NA
  [4]      0.134877             0             0            NA            NA
  [5]    -0.0244993             0             0            NA            NA
  [6]    -0.0369503             0             0            NA            NA
                  OID          OPOS            OREF            OALT
      <CharacterList> <IntegerList> <CharacterList> <CharacterList>
  [1]               .     149495253               T               C
  [2]               .     149495287               G               C
  [3]               .     149495395               T               C
  [4]               .     149500397               T               C
  [5]               .     149505131               A               C
  [6]               .     149509270               A               G
              OMAPALT          GT        GQ      RO.1          AO.1     MDP.1
      <CharacterList> <character> <integer> <integer> <IntegerList> <integer>
  [1]               C         0/1        24     10120           397      <NA>
  [2]               C         0/1        27      8150           340      <NA>
  [3]               C         0/1       552      2673           272      <NA>
  [4]               C         0/1       336       152            51      <NA>
  [5]               C         0/1      1766      3196           674      <NA>
  [6]               G         0/1     10126     25252         23457      <NA>
          MRO.1         MAO.1         MAF.1     SRF.1     SRR.1         SAF.1
      <integer> <IntegerList> <NumericList> <integer> <integer> <IntegerList>
  [1]      <NA>            NA            NA      6302      3818           241
  [2]      <NA>            NA            NA      5401      2749           199
  [3]      <NA>            NA            NA      1121      1552           118
  [4]      <NA>            NA            NA        70        82            29
  [5]      <NA>            NA            NA      1790      1406           365
  [6]      <NA>            NA            NA     12603     12649         10795
              SAR.1     FDP.1     FRO.1         FAO.1          AF.1    FSRF.1
      <IntegerList> <integer> <integer> <IntegerList> <NumericList> <integer>
  [1]           156      1999      1930            69     0.0345173      1164
  [2]           141      1984      1913            71     0.0357863      1220
  [3]           154      1999      1807           192      0.096048       739
  [4]            22       202       151            51      0.252475        70
  [5]           309      1996      1649           347      0.173848       927
  [6]         12662      1996       997           999      0.500501       531
         FSRR.1        FSAF.1        FSAR.1        QT     alteration
      <integer> <IntegerList> <IntegerList> <integer> <DNAStringSet>
  [1]       766            44            25         1             TC
  [2]       693            42            29         1             CG
  [3]      1068            86           106         1             TC
  [4]        81            29            22         1             TC
  [5]       722           191           156         1             TG
  [6]       466           447           552         1             TC
             context
      <DNAStringSet>
  [1]            G.C
  [2]            C.C
  [3]            C.G
  [4]            T.C
  [5]            T.A
  [6]            A.C
  -------
  seqinfo: 25 sequences from GenomeA genome
  hardFilters: NULL

> Caf62_mm = motifMatrix(Caf62_motifs, normalize = TRUE)
> head(round(Caf62_mm, 4))
       none
CA A.A    0
CA A.C    0
CA A.G    0
CA A.T    0
CA C.A    0
CA C.C    0

> Caf62_mm = motifMatrix(Caf62_motifs, group = "sampleNames", normalize = TRUE)
> head(round(Caf62_mm))
       none
CA A.A    0
CA A.C    0
CA A.G    0
CA A.T    0
CA C.A    0
CA C.C    0
> Caf62_mm = motifMatrix(Caf62_motifs, group = "seqnames", normalize = TRUE)
> head(round(Caf62_mm))
       chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13
CA A.A  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
CA A.C  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
CA A.G  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
CA A.T  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
CA C.A  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
CA C.C  NaN  NaN  NaN  NaN    0  NaN  NaN  NaN  NaN   NaN   NaN   NaN     0
       chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM
CA A.A   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN
CA A.C   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN
CA A.G   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN
CA A.T   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN
CA C.A   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN
CA C.C   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0   NaN  NaN  NaN  NaN

 

Does anyone have an idea of how I can move forward with this?

Thank you so much,

Cheers

Guillaume

Haiying.Kong ▴ 110
@haiyingkong-9254
Germany

I think this is because Caf62_motifs does not carry any real sample name information (every entry of sampleNames is "none").

So when motifMatrix tries to group the data by sampleNames, it has nothing to separate the variants by.
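
A minimal sketch of how one might check and set the sample labels, assuming the VariantAnnotation package is installed. The toy object `vr` only stands in for Caf62_motifs (positions and alleles are copied from the printout above), and the label "Caf62" is a hypothetical sample name:

```r
library(VariantAnnotation)

# Toy VRanges standing in for Caf62_motifs; sampleNames is set to "none"
# to mimic the object shown in the question
vr <- VRanges(
  seqnames = rep("chr5", 3),
  ranges = IRanges(c(149495253, 149495287, 149495395), width = 1),
  ref = c("T", "G", "T"),
  alt = c("C", "C", "C"),
  sampleNames = rep("none", 3)
)

# Inspect which sample labels are present
unique(as.character(sampleNames(vr)))

# Replace the placeholder with a meaningful label ("Caf62" is hypothetical)
sampleNames(vr) <- rep("Caf62", length(vr))
unique(as.character(sampleNames(vr)))
```

After relabelling, the same call as in the question, motifMatrix(vr, group = "sampleNames"), would use the new label as the column name.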

 

Julian Gehring ★ 1.3k
@julian-gehring-5818

To construct a matrix with the counts of mutational motifs, you need the variant calls annotated with the mutational context (the context column) and a grouping variable that is also present in your VRanges object. The grouping variable defines the columns of your mutational matrix M, so it needs to be a categorical variable with at least two unique values (otherwise, we won't really get a matrix). The 'sampleNames' in your VRanges object seem to have only one unique entry (none), but you can choose any variable that you consider interesting and meaningful for your analysis. For example, if your data has a 'phenotype' column, you can group by this phenotype with

motifMatrix(vr, group = "phenotype")

In the vignette, for example, we use the tumour type as the grouping variable.
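
To make the role of the grouping variable concrete, here is a small base R illustration of the idea. It is a hand-rolled sketch, not the package's internal code: the alteration and context values are taken from the six variants printed in the question, while the 'phenotype' labels (including the empty "unused" level) are made up for illustration.

```r
# Toy variant table; 'phenotype' is a hypothetical grouping column
calls <- data.frame(
  alteration = c("TC", "CG", "TC", "TC", "TG", "TC"),
  context    = c("G.C", "C.C", "C.G", "T.C", "T.A", "A.C"),
  phenotype  = factor(
    c("tumour", "tumour", "tumour", "normal", "normal", "normal"),
    levels = c("tumour", "normal", "unused")
  )
)
calls$motif <- paste(calls$alteration, calls$context)

# Rows are motifs, columns are the levels of the grouping variable
counts <- unclass(table(calls$motif, calls$phenotype))

# Normalise each column to frequencies; a group without variants gives 0/0 = NaN
freqs <- sweep(counts, 2, colSums(counts), "/")
round(freqs, 2)
```

In this sketch, a group level with no variants ends up as a column of NaN after normalisation, which may be why grouping by seqnames shows NaN for chromosomes without any calls.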

@guillaumedachy-11994
Brussels

Sorry for the late response, and thank you so much! I am going to try to get a better-annotated matrix.

Cheers
