Question

limma-voom warning "Coefficients not estimable"

dagsbio • 0

@dagsbio-23131

Last seen 15 months ago

Spain

Hi there,

When testing for tumor vs normal differences by tumor subgroup while using all normals (tumor1 vs all normals, tumor2 vs all normals, etc.), I get this warning:

> vdata <- voom(all, design, plot=TRUE)
Coefficients not estimable: tumorSCC_4 
Warning message:
Partial NA coefficients for 395 probe(s)

Design matrix looks like this

> design %>% head
   normal tumor normalADC_2 normalADC_3 normalADC_4 normalLCC_1 normalLCC_2 normalSCC_1 normalSCC_2 normalSCC_3 normalSCC_4
T1      0     1           0           0           0           0           0           0           0           0           0
T2      0     1           0           0           0           0           0           0           0           0           0
T3      0     1           0           0           0           0           0           0           0           0           0
T4      0     1           0           0           0           0           0           0           0           0           0
T5      0     1           0           0           0           0           0           0           0           0           0
T6      0     1           0           0           0           0           0           0           0           0           0
   tumorADC_1 tumorADC_2 tumorADC_3 tumorADC_4 tumorLCC_1 tumorLCC_2 tumorSCC_1 tumorSCC_2 tumorSCC_3 tumorSCC_4
T1          0          1          0          0          0          0          0          0          0          0
T2          0          0          1          0          0          0          0          0          0          0
T3          1          0          0          0          0          0          0          0          0          0
T4          1          0          0          0          0          0          0          0          0          0
T5          0          1          0          0          0          0          0          0          0          0
T6          0          0          1          0          0          0          0          0          0          0

So, for this specific analysis, I am only interested in the column "normal" and all the tumor columns (tumorADC_1 tumorADC_2 tumorADC_3 tumorADC_4 tumorLCC_1 tumorLCC_2 tumorSCC_1 tumorSCC_2 tumorSCC_3 tumorSCC_4).

Any thoughts?

Thank you

limma • 1.7k views

ADD COMMENT • link 3.6 years ago dagsbio • 0

score 1 · Answer 1 · 2020-10-09

Gordon Smyth 50k

@gordon-smyth

Last seen 10 hours ago

WEHI, Melbourne, Australia

It's difficult to tell what you're trying to do from the information shown. It would help if you could show:

- How many samples you have
- The information you have on each sample, e.g., is more than one sample from the same patient?
- What groups you want to compare
- The formula you used to construct the design matrix

Note, if you only have 4 groups (Normal, ADC, LCC and SCC) then the design matrix should only have 4 columns.

Are there only 395 rows in your data? That seems pretty small.

ADD COMMENT • link 3.6 years ago Gordon Smyth 50k

Hi Gordon,

Sorry for being a little short on info in my previous post. I'll try to make it clearer by answering your questions:

We have 196 tumor samples divided into 3 groups: ADC, SCC and LCC. For 159 of those 196 patients we have also sequenced the normal tissue, so we have a total of 355 samples coming from 196 different patients.
To summarize: tumor and normal characterization for 159 patients and tumor-only characterization for 37 patients.
First of all, we want to compare the TUMOR subgroups within each of the three histologies we have (ADC, SCC, LCC). For ADC and SCC we have 4 different subgroups inferred by unsupervised clustering of the tumor samples. For LCC, we have 2 subgroups.

BUT, we also would like to check the effect of NORMAL tissues in any possible form:

- By comparing tumor vs normal
- Doing a paired analysis (only possible for 159 patients)
- Doing a paired analysis that also uses the tumor-only patients (with the duplicateCorrelation function)

And the particular case I am asking about now:

- Each tumor subgroup, one at a time, compared to all normal samples rather than only to the normal samples from the corresponding cluster (a PCA plot shows that all normal samples behave more or less alike).
- The formula used for this particular question was:

design <-  model.matrix(~ 0  + type + group, all_metadata)

Whereas the metadata looks like:

 all_metadata %>% head
   patient  type subtype subcluster      group
T1       1 tumor     ADC      ADC_2 tumorADC_2
T2       2 tumor     ADC      ADC_3 tumorADC_3
T3       3 tumor     ADC      ADC_1 tumorADC_1
T4       4 tumor     ADC      ADC_1 tumorADC_1
T5       5 tumor     ADC      ADC_2 tumorADC_2
T6       6 tumor     ADC      ADC_3 tumorADC_3

What am I doing wrong in this example?

Additionally, do you think a paired analysis makes sense when the correlation between tumor and normal is actually very low?

corfit <- duplicateCorrelation(all,design,block=all_metadata$patient)
corfit$consensus 
[1] 0.01093831

PS: All data come from an expression panel of only 395 genes, not typical RNA-Seq.

ADD REPLY • link 3.6 years ago dagsbio • 0

The type variable is redundant because you have already coded the difference between tumor and normal into the group variable. So you need to remove type from the model.
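
The redundancy can be checked with a small base R sketch. The metadata below is a toy example with made-up group labels (not the poster's data), and `meta`, `design_both` and `design_group` are hypothetical names. Because every `group` level belongs to exactly one `type`, the design that includes both factors has one more column than its rank, which is the situation in which limma reports a coefficient as not estimable.

```r
# Toy metadata (hypothetical): 2 normal groups and 4 tumor groups, 2 samples each.
meta <- data.frame(
  type  = factor(rep(c("normal", "tumor"), times = c(4, 8))),
  group = factor(c(rep(c("normalADC_1", "normalADC_2"), each = 2),
                   rep(c("tumorADC_1", "tumorADC_2",
                         "tumorSCC_1", "tumorSCC_2"), each = 2)))
)

# Design with both factors: 'group' is nested in 'type', so one column is redundant.
design_both <- model.matrix(~ 0 + type + group, data = meta)
qr_both     <- qr(design_both)

# Design with 'group' only: one column per group, no redundancy.
design_group <- model.matrix(~ 0 + group, data = meta)
qr_group     <- qr(design_group)

# Columns that the QR decomposition pivots past the rank are the redundant ones.
redundant <- colnames(design_both)[qr_both$pivot[-seq_len(qr_both$rank)]]

cat("type + group:", ncol(design_both), "columns, rank", qr_both$rank, "\n")
cat("group only:  ", ncol(design_group), "columns, rank", qr_group$rank, "\n")
cat("redundant column(s):", redundant, "\n")
```

Dropping `type` removes the extra column while keeping the same fitted values, since the tumor/normal distinction is still encoded in the group labels.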

ADD REPLY • link 3.6 years ago Gordon Smyth 50k

But then how do I specifically contrast each of the ADC tumors (4 groups) to normal, which is found in the column type? I can't find a way of doing that.

ADD REPLY • link 3.6 years ago dagsbio • 0

I still can't tell how many groups you have or what you want to compare. What does normalADC_2 mean for example?

Anyway, just use contrasts in the usual way. It's straightforward to compare one group to the average of several others, or the average of normals to the average of tumors, or anything you need really.
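
As an illustration of what such contrasts look like, here is a minimal base R sketch that builds the contrast matrix by hand. It assumes a design made with `~ 0 + group` whose columns have been renamed to the plain group labels, and the labels themselves (`normal`, `tumorADC_1`, ...) are assumptions for the example. In limma the same matrix would normally be produced with `makeContrasts()` and passed to `contrasts.fit()`.

```r
# Hypothetical coefficient names: one column per group in the design matrix.
groups <- c("normal", "tumorADC_1", "tumorADC_2", "tumorADC_3", "tumorADC_4")
tumors <- setdiff(groups, "normal")

# One contrast per tumor subgroup: +1 for that subgroup, -1 for normal.
contr <- sapply(tumors, function(g) {
  v <- setNames(numeric(length(groups)), groups)
  v[g] <- 1
  v["normal"] <- -1
  v
})

# Average of all tumor subgroups vs normal.
avg_tumor_vs_normal <- setNames(
  c(-1, rep(1 / length(tumors), length(tumors))),
  groups
)

print(contr)
print(avg_tumor_vs_normal)
```

Each column of `contr` sums to zero, so it tests a difference between group means rather than a group mean itself.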

ADD REPLY • link 3.6 years ago Gordon Smyth 50k

The number at the end of each column name in the design matrix is the cluster assigned to the samples in that group.

Your reply made me realize that I was also assigning a cluster to the normal samples (e.g. "normalADC_2"), when every normal sample should simply be labelled "normal".

Now I am able to run every comparison without the "Coefficients not estimable" warning.
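
For readers with the same problem, the recoding described here might look like the following base R sketch. The toy `meta` data frame and its values are made up for illustration; only the column names `type`, `subcluster` and `group` follow the metadata shown earlier in the thread.

```r
# Toy metadata (hypothetical values) with the same columns as all_metadata.
meta <- data.frame(
  type       = c("tumor", "tumor", "normal", "normal", "tumor", "normal"),
  subcluster = c("ADC_2", "ADC_3", "ADC_2", "SCC_1", "SCC_1", "ADC_3")
)

# Normals all share one label; tumors keep their cluster-specific label.
meta$group <- ifelse(meta$type == "normal",
                     "normal",
                     paste0(meta$type, meta$subcluster))
meta$group <- relevel(factor(meta$group), ref = "normal")

# One column per group, named by the group label so contrasts are easy to write.
design <- model.matrix(~ 0 + group, data = meta)
colnames(design) <- levels(meta$group)

print(colnames(design))
print(qr(design)$rank == ncol(design))  # TRUE when every coefficient is estimable
```

With this coding there is a single `normal` column, so each tumor subgroup can be contrasted against all normal samples at once.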

Thank you!

ADD REPLY • link 3.6 years ago dagsbio • 0