Question

Limma model design: is it better to just build a separate design for each specific comparison?

0

Entering edit mode

Ahdee ▴ 50

@ahdee-8938

Last seen 18 months ago

United States

Hi suppose I have these 3 groups, a b and c.

data= c ( rep("a", 3), rep("b",3),  rep("c",3) )
mm = model.matrix(~0  + data )  
colnames( mm ) = c("a","b","c")

makeContrasts( avsb = a - b, levels=mm)

  a b c
1 1 0 0
2 1 0 0
3 1 0 0
4 0 1 0
5 0 1 0
6 0 1 0
7 0 0 1
8 0 0 1
9 0 0 1

so what I want to do is compare a vs b and a vs c separately. In the above I created a contrast a -b however I'm not sure if it is correct to do this since column a and column for example has extra row of data 7-9. The contrast seem to suggest its ok to do this however I did a comparison with this.

data2= c ( rep("a", 3), rep("b",3) )
mm2 = model.matrix(~0  + data2 )  
colnames( mm2 ) = c("a","b")

makeContrasts( avsb = a - b, levels=mm2)

when I run these two design through limma its giving me slightly different pvalues and importantly logfc suggesting that the first model is not completely eliminating the rows 7-9 the c column? I'm super curious about this and wondering if anyone has a comment for this! thanks.

limma • 670 views

ADD COMMENT • link updated 4.9 years ago by James W. MacDonald 65k • written 4.9 years ago by Ahdee ▴ 50

score 2 · Accepted Answer · 2019-06-04

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

The answer is 'it depends'. In general, if the within-group variability for the 'c' samples is comparable to the 'a' and 'b' samples, then you gain by keeping those samples in your model. That's because the variance estimate used for the contrast is based on (sort of) the average within-group variability of all the groups. And since the sample variance isn't a particularly efficient statistic, having more data to estimate the variance is better.

But if the 'c' group has a much higher or lower variability than 'a' and 'b' (not that you could really tell from just three samples anyway), then it might tend to bias your variance estimate somewhat. I tend to keep all data in the model myself, particularly when I have a really underpowered study like this.

Unless you are using array weights, the logFC shouldn't be changing. Are you?

ADD COMMENT • link 4.9 years ago James W. MacDonald 65k

1

Entering edit mode

My advice is more categorical than James'. Since you want to compare a vs b and a vs c, you are assuming that all the samples are comparable. Given this, I can't think of any circumstances where it would not be best to analyse all the sample together.

Subsetting the samples just removes information for no reason.

If you really did think that one of the groups was more variable than the others, then you could model that using voomWithQualityWeights but you would still analyse all the samples together. Subsetting would not solve that problem.

ADD REPLY • link 4.9 years ago Gordon Smyth 50k

0

Entering edit mode

Thanks James, I should be more specific. This is for an RNA-seq that has been TMM normalized and voom. I'm using voom as the input. So in general what you are saying is that leaving everything as a single model and defining the contrast is "better" but only if the variance between the samples are not too off?

ADD REPLY • link 4.9 years ago Ahdee ▴ 50

1

Entering edit mode

When you use voom you are using the variance of the data twice - first when you compute the observation-level weights, and then as part of your statistic. In that situation there may be more reason to keep all samples in your model, in order to get the best weight estimates possible.

It is often difficult if not impossible to say if the variance of one set is 'off' as compared to another, so unless I see something crazy on an MDS plot I tend to just use everything I have.

ADD REPLY • link 4.9 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks James and Gordon! This is very reassuring - I been using single models ( that is analyzing everything together and using contrast to parse out the different comparison ) however started to freak out mostly because the logfc were different. I though that logFC was simply calculated from the just the counts between each group and not take into account the variance between groups? If not then why is logFC different when combining or analyzing separately?

ADD REPLY • link 4.9 years ago Ahdee ▴ 50

0

Entering edit mode

The logFCs do not change if you subset to two groups when running 'lmFit.

However the logFCs will change if you subset to just two groups right back to running voom. This is because the voom weights will change and the voom weights are used in the logFC calculation. When we compute logFCs we give more weight to log-counts that are more precise and less weight to log-counts that are less precise. That is, after all, the whole purpose of computing the voom precision weights.

ADD REPLY • link 4.9 years ago Gordon Smyth 50k