How to pool subgroups for makeContrasts() and subsequent limma analysis?

René ▴ 30

> Hi René,
>
> You are almost there. Note that you want the mean of the three groups, not the sum. So
>
> makeContrasts((B1 + B2 + B3)/3 - A)
>
> will, e.g., do the comparison of B vs A.
>
> Best,
>
> Jim

Dear James,

I performed the pooled analysis as you suggested and compared the results to a pure B - A comparison (no subgroups specified). Interestingly, the two analyses give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05). Could you explain this huge difference?

Best regards,
René
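For readers following along, the pooled contrast quoted above could be written out in full roughly as below. This is only a sketch under assumptions: the data are simulated noise, and the names `group`, `expr` and `BvsA` as well as the three-samples-per-group layout are made up for illustration and are not taken from the actual experiment.

```r
library(limma)

set.seed(1)
# Hypothetical layout: 3 samples in each of the groups A, B1, B2, B3, C
group <- factor(rep(c("A", "B1", "B2", "B3", "C"), each = 3))
# Stand-in for a real log2 expression matrix (100 genes x 15 samples, pure noise)
expr <- matrix(rnorm(100 * length(group)), nrow = 100)

# Group-means design: one column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Mean of the three B subgroups versus A; 'levels' tells makeContrasts
# which coefficient names exist in the design
cont <- makeContrasts(BvsA = (B1 + B2 + B3)/3 - A, levels = design)

fit  <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, cont))
topTable(fit2, coef = "BvsA", number = 5)
```

The resulting contrast puts a weight of -1 on A, 1/3 on each of B1, B2 and B3, and 0 on C.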

ADD COMMENT • link 11.2 years ago René ▴ 30

James W. MacDonald 65k

Hi René,

On 2/6/2013 11:29 AM, René wrote:
> Dear James,
>
> I performed the pooled analysis as you suggested and compared the results to a
> pure B - A comparison (no subgroups specified). Interestingly, both analyses
> give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05).
> Could you explain this huge difference?

If I assume that by a pure B-A comparison you mean that you redefined your design matrix so that it has only three columns (A, B, C) and then did the B-A comparison, then it is simple to explain. I would also guess that the C-A comparison gives different results as well, depending on how you define your design matrix.

Note that the test for a contrast has the difference between the means of the two groups in the numerator and a measure of intra-group variability in the denominator. So in heuristic terms, the numerator says how different the groups are, and the denominator tells you whether that difference is 'large' or not, by comparing it to the within-group variability. So if the groups are really 'tight', then a small difference in means might result in a significant test, but if the groups are really variable, then the mean differences have to be pretty big as well to achieve significance.

How you define your groups has no bearing on the numerator, because the difference of B-A is the same if you do B-A or if you do (B1+B2+B3)/3-A. However, the denominator may well be quite different, depending on the B1, B2, and B3 groups.

In the instance where you did (B1+B2+B3)/3-A, the intra-group variability for the denominator is based on the variability within the A, B1, B2, B3, and C groups. So if all the B-type groups are pretty tight, then you will likely get more differentially expressed genes.

If you do the 'pure' B-A comparison, then the denominator is based on the intra-group variability of the A, B, and C groups. If the B1, B2, B3 groups are pretty tight, but not really similar to each other, then the combined B group will be highly variable, so your denominator will tend to be larger, resulting in fewer differentially expressed genes. Since the denominator is the same for all contrasts, I would imagine the C-A comparison has fewer genes as well.

Does that help?

Best,

Jim

> Best regards,
> René
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
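The numerator/denominator argument can be checked on a single simulated gene. The sketch below uses plain `lm()` rather than limma, so it ignores the variance moderation done by `eBayes()` and only illustrates the ordinary per-gene residual standard deviation. The group means, the noise level and the names `subgroup`, `pooled`, `fit5` and `fit3` are all made up for illustration. It also assumes equally sized subgroups; with unequal sizes the pooled B mean becomes a weighted mean and the two estimates need not agree exactly.

```r
set.seed(42)
n <- 4  # samples per subgroup (balanced on purpose)
subgroup <- factor(rep(c("A", "B1", "B2", "B3", "C"), each = n))

# Hypothetical means: the B subgroups are tight but differ from each other
mu <- c(A = 0, B1 = 0, B2 = 1.5, B3 = 3, C = 1)
y  <- unname(mu[as.character(subgroup)]) + rnorm(length(subgroup), sd = 0.2)

# Collapse B1, B2, B3 into a single B group
pooled <- factor(sub("[0-9]+$", "", as.character(subgroup)))

fit5 <- lm(y ~ 0 + subgroup)  # five-column design: A, B1, B2, B3, C
fit3 <- lm(y ~ 0 + pooled)    # three-column design: A, B, C

# Numerator: estimated B - A difference under each design
b5   <- coef(fit5)
est5 <- unname(mean(b5[c("subgroupB1", "subgroupB2", "subgroupB3")]) - b5["subgroupA"])
est3 <- unname(coef(fit3)["pooledB"] - coef(fit3)["pooledA"])

# Denominator ingredient: residual standard deviation under each design
s5 <- summary(fit5)$sigma
s3 <- summary(fit3)$sigma

print(c(est_5groups = est5, est_3groups = est3))
print(c(sd_5groups = s5, sd_3groups = s3))
```

With these settings the two estimates coincide, while the residual standard deviation of the three-group fit is larger, because the differences between B1, B2 and B3 are counted as within-group noise.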

ADD COMMENT • link 11.2 years ago James W. MacDonald 65k

Dear James,

Thank you for your very detailed explanation. Unfortunately, I observe the opposite result: more genes are found when testing B - A than when testing (B1+B2+B3)/3 - A.

If I understand your explanation correctly, it means that I select more stringently in the case of (B1+B2+B3)/3 - A due to a higher variation between the subgroups. Lowering the cutoff values would therefore restore my list of genes. Unfortunately, I am doing a meta-analysis of two independent data sets and I want to apply the same cutoff values to both. This in turn would increase my second result list (same group comparison, i.e. B - A) from ~200 to a couple of thousand genes and thereby also introduce additional noise.

Hence my question is: is there a possibility to somehow combine the results of both comparisons? Or is there a way to correct for the increased variance between the subgroups?

Best regards,
René

ADD REPLY • link 11.2 years ago René ▴ 30

René ▴ 30

Dear all,

As I am still stuck on this topic, I would like to ask whether it is OK to completely replace the comparison (B1+B2+B3)/3 - A with a separate run of B-A in order to obtain some meaningful results. Intuitively, I would say that it is not, as it would mean combining two independent design matrices and therefore two different models. Nevertheless, I am running out of ideas on how to perform this analysis, and I would like to finish it by the end of February.

Any recommendations are more than welcome.

Best regards

ADD COMMENT • link 11.2 years ago René ▴ 30
