How to pool subgroups for makeContrasts() and subsequent limma analysis?
René ▴ 30
@rene-5722
> Hi René,
>
> You are almost there. Note that you want the mean of the three groups, not the sum. So
>
> makeContrasts((B1 + B2 + B3)/3 - A)
>
> will, e.g., do the comparison of B vs A.
>
> Best,
>
> Jim

Dear James,

I performed the pooled analysis as you suggested and compared the results to a pure B - A comparison (no subgroups specified). Interestingly, the two analyses give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05). Could you explain this huge difference?

Best regards,
René
@james-w-macdonald-5106
Hi René,

On 2/6/2013 11:29 AM, René wrote:
> I performed the pooled analysis as you suggested and compared the results to a
> pure B - A comparison (no subgroups specified). Interestingly, both analyses
> give different results (497 vs 15 genes with log2FC >= 1 and p < 0.05).
> Could you explain this huge difference?

If I assume that by a pure B-A comparison you mean that you redefined your design matrix so that it has only three columns (A, B, C) and then did the B-A comparison, then it is simple to explain. I would also guess that the C-A comparison gives different results as well, depending on how you define your design matrix.

Note that the contrast puts the difference between the means of the two groups in the numerator and a measure of intra-group variability in the denominator. So in heuristic terms, the numerator says how different the groups are, and the denominator tells you whether that difference is 'large' or not, by comparing it to the within-group variability. If the groups are really 'tight', a small difference in means might result in a significant test, but if the groups are really variable, the mean difference has to be pretty big as well to achieve significance.

How you define your groups has no bearing on the numerator, because the difference B-A is the same whether you do B-A or (B1+B2+B3)/3-A. (This holds exactly when the three subgroups have the same number of samples; otherwise the pooled B mean is a sample-size-weighted average of the subgroup means.) However, the denominator may well be quite different, depending on the B1, B2, and B3 groups.

In the instance where you did (B1+B2+B3)/3-A, the intra-group variability for the denominator is based on the variability within the A, B1, B2, B3, and C groups. So if all the B-type groups are pretty tight, you will likely get more differentially expressed genes.

If you do the 'pure' B-A comparison, the denominator is based on the intra-group variability of the A, B, and C groups. If the B1, B2, and B3 groups are each pretty tight but not really similar to one another, the combined B group will be highly variable, so your denominator will tend to be larger, resulting in fewer differentially expressed genes. Since the variance estimate in the denominator is shared by all contrasts from the same fit, I would imagine the C-A comparison has fewer genes as well.

Does that help?

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
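To make the numerator/denominator argument concrete, here is a minimal single-gene sketch in base R. It is only an illustration under stated assumptions: the data are simulated, the group means, the noise level, and the helper `contrast_t()` are all made up for this example, and it uses an ordinary least-squares t-statistic rather than limma's moderated one (limma additionally shrinks the per-gene variance towards a common value, but the role of the residual variance is the same). The subgroups are simulated as tight but dissimilar, which is the scenario described above; real data may behave differently.

```r
# Simulated expression values for ONE gene (all numbers are hypothetical).
set.seed(1)
n  <- 4                                   # replicates per subgroup (balanced)
mu <- c(A = 5, B1 = 6, B2 = 8, B3 = 10, C = 5)
subgroup <- factor(rep(names(mu), each = n), levels = names(mu))
y <- rnorm(length(subgroup), mean = mu[as.character(subgroup)], sd = 0.3)

# Hypothetical helper: ordinary t-statistic for a contrast of group means.
contrast_t <- function(y, design, contrast) {
  fit <- lm.fit(design, y)
  df  <- nrow(design) - ncol(design)
  s2  <- sum(fit$residuals^2) / df        # residual (within-group) variance
  est <- sum(contrast * fit$coefficients) # numerator: contrast estimate
  se  <- sqrt(s2 * drop(t(contrast) %*% solve(crossprod(design)) %*% contrast))
  c(estimate = est, se = se, t = est / se)
}

# Model 1: five columns (A, B1, B2, B3, C), B averaged in the contrast.
# In limma this vector would come from
#   makeContrasts((B1 + B2 + B3)/3 - A, levels = design5)
design5 <- model.matrix(~ 0 + subgroup)
colnames(design5) <- levels(subgroup)
contrast5 <- c(A = -1, B1 = 1/3, B2 = 1/3, B3 = 1/3, C = 0)

# Model 2: three columns (A, B, C), subgroups merged before fitting.
group <- factor(sub("[0-9]+$", "", as.character(subgroup)),
                levels = c("A", "B", "C"))
design3 <- model.matrix(~ 0 + group)
colnames(design3) <- levels(group)
contrast3 <- c(A = -1, B = 1, C = 0)

res5 <- contrast_t(y, design5, contrast5)
res3 <- contrast_t(y, design3, contrast3)
print(rbind(subgroups = res5, pooled = res3))
```

With balanced subgroups the two `estimate` values agree, while the standard error from the three-column model is larger because the differences between B1, B2, and B3 are absorbed into the residual variance. Changing `mu` so that the subgroup means are similar makes the two standard errors much closer.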
Dear James,

Thank you for your very detailed explanation. Unfortunately, I observe the opposite result: more genes are found when testing B - A than (B1+B2+B3)/3 - A.

If I understand your explanation correctly, it means that I select more stringently in the case of (B1+B2+B3)/3 - A due to a higher variation between the subgroups. Lowering the cutoff values would therefore correct my list of genes again. Unfortunately, I am doing a meta-analysis of two independent data sets, and I want to apply the same cutoff values to both. This in turn would increase my second result list (same group comparison, i.e. B - A) from ~200 to a couple of thousand genes and thereby also introduce additional noise.

Hence my question is: is there a possibility to somehow combine the results of both comparisons? Or is there a way to correct for the increased variance between the subgroups?

Best regards,
René
René ▴ 30
@rene-5722
Dear all,

As I am still stuck on this topic, I would like to ask whether it is OK to replace the comparison (B1+B2+B3)/3 - A entirely with a separate run of B - A in order to obtain meaningful results. My gut feeling is that it is not, as it would mean combining two independent design matrices and therefore two different models. Nevertheless, I am running out of ideas on how to perform this analysis, and I would like to finish it by the end of February.

Any recommendations are more than welcome.

Best regards