Question

About "mutationsignature" package, how to use it with only one type of cancer?

0

Entering edit mode

shengfeng • 0

@shengfeng-9866

Last seen 9.0 years ago

Hi everyone,

I am a Postdoc majored in cancer genomic. Now, I am trying to use mutationsignature package to analyze my own data about breast cancer. And I faced with some problems that need help.

in order to explain the situation more clear, I saved my log file as the attachment.

My R Version is 3.2.3(2015-12-10), platform: x86_64-mingw32/x64 (64 bit). In the package, the author use "SomaticCancerAlterations" to save the sample data. And "> sca_metada=scaMetadata()" should be the code to input it.

For the log file, I try to input all your data, and only keep "ov" as single type of cancer. Questions faced are as follows:

(1) as the log file shows, at least two codes didn't work in this situation, such as "stat=assessNumberSignatures(sca_mm1,nSigs,nReplicates=5)", "sigs_nmf=identifySignatures(sca_mm1,n_Sigs,nmfDecomposition)". And I don't know how to modify the codes. And what's the problem.

(2) there is another R package "rmsignature" (https://github.com/friend1ws/pmsignature), which could do similar work as this package. And I want to only save ovary cancer records of the sample data, and use "rmsignature" to create signatures with this same database. However, I could not output "sca_vr1" (only ov ) as any types of datasets. Thus it's impossible for me to compare those two packages.

BTW, Does anybody know why the type of "Samplenames" is "NULL" , and I guess it should be "rle" type. But when I use runLength or runValue, neither works.

Thank you very much.

Best.

Under contents is the log file.

----------------------------------------------------------------------------------------------------------------------------------------------------------------

> library(rtracklayer)
> library(BSgenome.Hsapiens.1000genomes.hs37d5)
> library(SomaticSignatures)
> library(SomaticCancerAlterations)
> sca_metada=scaMetadata()
> sca_data=unlist(scaLoadDatasets())
> sca_data$study = factor(gsub("(.*)_(.*)", "\\1", toupper(names(sca_data))))
> sca_data = unname(subset(sca_data, Variant_Type %in% "SNP"))
> sca_data = keepSeqlevels(sca_data, hsAutosomes())
> sca_vr = VRanges(seqnames = seqnames(sca_data),ranges = ranges(sca_data),ref = sca_data$Reference_Allele,alt = sca_data$Tumor_Seq_Allele2,sampleNames = sca_data$Patient_ID,seqinfo = seqinfo(sca_data),study = sca_data$study)
> sca_vr = ucsc(sca_vr)
> sca_vr
> sca_vr1 <-sca_vr[sca_vr$study=="OV",]
> sca_vr1
> library(BSgenome.Hsapiens.UCSC.hg19)
> sca_motifs = mutationContext(sca_vr1, BSgenome.Hsapiens.UCSC.hg19, unify = TRUE)
> head(sca_motifs)
> sort(table(sca_vr$study),decreasing=TRUE)

LUAD SKCM HNSC LUSC KIRC GBM THCA OV
208724 200589 67125 61485 24158 19938 6716 5872
> sca_vr1$study=factor(sca_vr1$study)
> sort(table(sca_vr1$study),decreasing=TRUE)
OV
5872
> sca_motifs1 = mutationContext(sca_vr1, BSgenome.Hsapiens.UCSC.hg19, unify = TRUE)
> sca_mm1 = motifMatrix(sca_motifs1, group = "study", normalize = TRUE)
> sca_mm1
OV
CA A.A 0.012602180
CA A.C 0.019243869
CA A.G 0.005960490
CA A.T 0.009196185
CA C.A 0.012772480
CA C.C 0.014305177
CA C.G 0.005790191
CA C.T 0.013113079
CA G.A 0.010558583
CA G.C 0.012091281
CA G.G 0.004768392
CA G.T 0.009707084
CA T.A 0.011580381
CA T.C 0.016178474
CA T.G 0.005790191
CA T.T 0.012942779
CG A.A 0.013113079
CG A.C 0.007493188
CG A.G 0.006641689
CG A.T 0.009877384
CG C.A 0.012602180
CG C.C 0.009877384
CG C.G 0.004938692
CG C.T 0.014475477
CG G.A 0.007663488
CG G.C 0.007833787
CG G.G 0.003405995
CG G.T 0.010558583
CG T.A 0.016348774
CG T.C 0.017200272
CG T.G 0.005449591
CG T.T 0.027247956
CT A.A 0.016008174
CT A.C 0.013283379
CT A.G 0.047343324
CT A.T 0.011920981
CT C.A 0.014305177
CT C.C 0.015156676
CT C.G 0.051089918
CT C.T 0.015667575
CT G.A 0.018222071
CT G.C 0.017881471
CT G.G 0.044448229
CT G.T 0.017881471
CT T.A 0.023671662
CT T.C 0.021117166
CT T.G 0.028269755
CT T.T 0.018051771
TA A.A 0.004087193
TA A.C 0.005960490
TA A.G 0.007663488
TA A.T 0.006130790
TA C.A 0.003065395
TA C.C 0.008344687
TA C.G 0.008514986
TA C.T 0.008004087
TA G.A 0.003576294
TA G.C 0.003235695
TA G.G 0.006301090
TA G.T 0.005108992
TA T.A 0.001873297
TA T.C 0.005619891
TA T.G 0.003916894
TA T.T 0.004427793
TC A.A 0.011069482
TC A.C 0.008174387
TC A.G 0.010558583
TC A.T 0.017029973
TC C.A 0.004938692
TC C.C 0.010728883
TC C.G 0.010217984
TC C.T 0.009877384
TC G.A 0.008004087
TC G.C 0.005790191
TC G.G 0.004598093
TC G.T 0.008004087
TC T.A 0.003746594
TC T.C 0.007493188
TC T.G 0.005619891
TC T.T 0.005619891
TG A.A 0.002554496
TG A.C 0.002384196
TG A.G 0.003235695
TG A.T 0.003405995
TG C.A 0.002043597
TG C.C 0.002213896
TG C.G 0.006811989
TG C.T 0.005790191
TG G.A 0.002213896
TG G.C 0.001873297
TG G.G 0.006471390
TG G.T 0.004938692
TG T.A 0.001362398
TG T.C 0.003405995
TG T.G 0.003916894
TG T.T 0.004427793
> nSigs=2:5
> stat=assessNumberSignatures(sca_mm1,nSigs,nReplicates=5)
Error in `colnames<-`(`*tmp*`, value = sig_names) :
attempt to set 'colnames' on an object with less than two dimensions
: Warning messages:
1: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 2 > ncol(H)= 1 ]
2: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 2 > ncol(H)= 1 ]
3: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 2 > ncol(H)= 1 ]
> n_Sigs=3
> sigs_nmf=identifySignatures(sca_mm1,n_Sigs,nmfDecomposition)
Error in `colnames<-`(`*tmp*`, value = sig_names) :
attempt to set 'colnames' on an object with less than two dimensions
: Warning messages:
1: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 3 > ncol(H)= 1 ]
2: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 3 > ncol(H)= 1 ]
3: In validityMethod(object) :
Dimensions of W and H look strange [ncol(W)= 3 > ncol(H)= 1 ]
> sca_vr1
VRanges object with 5872 ranges and 1 metadata column:
seqnames ranges strand ref alt totalDepth refDepth altDepth sampleNames softFilterMatrix | study
<Rle> <IRanges> <Rle> <character> <characterOrRle> <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle> <matrix> | <factor>
[1] chr1 [1334552, 1334552] * C T <NA> <NA> <NA> TCGA-24-2262 | OV
[2] chr1 [1961652, 1961652] * C T <NA> <NA> <NA> TCGA-24-1552 | OV
[3] chr1 [2420688, 2420688] * C G <NA> <NA> <NA> TCGA-13-1484 | OV
[4] chr1 [2452247, 2452247] * T G <NA> <NA> <NA> TCGA-20-0991 | OV
[5] chr1 [3328714, 3328714] * C T <NA> <NA> <NA> TCGA-25-2042 | OV
... ... ... ... ... ... ... ... ... ... ... ... ...
[5868] chr22 [50724253, 50724253] * C T <NA> <NA> <NA> TCGA-13-1409 | OV
[5869] chr22 [50724633, 50724633] * G T <NA> <NA> <NA> TCGA-13-1409 | OV
[5870] chr22 [50857872, 50857872] * C T <NA> <NA> <NA> TCGA-23-1027 | OV
[5871] chr22 [50960778, 50960778] * G T <NA> <NA> <NA> TCGA-57-1993 | OV
[5872] chr22 [50966138, 50966138] * C T <NA> <NA> <NA> TCGA-59-2363 | OV
-------
seqinfo: 22 sequences from an unspecified genome
hardFilters: NULL
> class(sca_vr1$sampleNames)
[1] "NULL"
> class(sca_vr1$study)
[1] "factor"
>

mutational signature • 2.0k views

ADD COMMENT • link updated 9.6 years ago by Julian Gehring ★ 1.3k • written 9.6 years ago by shengfeng • 0

score 0 · Answer 1 · 2016-03-07

0

Entering edit mode

Julian Gehring ★ 1.3k

@julian-gehring-5818

Last seen 6.4 years ago

The short answer is that you cannot identify multiple signatures if there is only one group of interest, i.e. study, in your data. The concept of inferring mutational signatures via matrix decomposition relies on a mutational spectrum across different samples/groups. The number of groups in your original data define the upper limit for the number of samples one can identify. For one group, there is no meaningful decomposition of the mutational spectrum possible. To understand why this is not possible, you can have a look at the vignette of the package, and especially the original papers that underly this method. To overcome your problem, you could think of other groups of interest, or, alternatively, combine your data with those of other studies.

ADD COMMENT • link 9.6 years ago Julian Gehring ★ 1.3k

0

Entering edit mode

Hi Julian, thank you so much for your reply.

I understand what's you mean now. Although we could get the probability for 64 combinations in one group, we couldn't use it to create signatures due to lack of comparative information from other groups.

One more question to trouble you.

I also want to use "SampleNames" as the group variable instead of "study", which is more feasible for me. However, it doesn't work. Could you tell me more about the type of "Samplenames".

I think most of my questions are result from its special type (Rle?) . I don't know how to deal with it now.

It could not use like "Study". Please give me a brief explanation.

Thank you so much.

Bests,

ADD REPLY • link 9.6 years ago shengfeng • 0

0

Entering edit mode

As this seems to be a separate issue, can you open a new question with a minimal, reproducible example? At the moment, it is hard to see what errors relate to the original issue of having only one group, and what may not work for you anyway. You can also have a look at the posting guide for help (http://www.bioconductor.org/help/support/posting-guide/).

ADD REPLY • link 9.6 years ago Julian Gehring ★ 1.3k

0

Entering edit mode

Thank you so much.

The link you provided could not open. You can try it in your computer.

For the "new" question, I just use your sample data. As I mentioned, I could not save your sample data in any format. So it's impossible for me to give a example.

Could you just re-submit the codes that I used in the log file? And you will see what I talk about may totally due to the "SampleNames" variables.

Now I try to use "SampleNames" as group var already. Could you do any operation with "SampleNames" ? such as save it ? or describe it (summary)?

Thank you .

Bests,

ADD REPLY • link 9.6 years ago shengfeng • 0