Dear all,
I would like to get some advice over how to do a post hoc test on a nested interaction model. This is related to another question I asked previously ( Nested design in edgeR ) .
The question is similar to the previous one (with higher number of levels), we would like to know how do different brain regions (13 regions) correlate to the level of a protein B measure by immuno-staining.
Here is the current approach i am taking:
## Nested model B_region_design=model.matrix(~0+region+B:region,meta) ## calculate commond, trend, tagwise dispersion for EdgeR model fitting subcounts_disp=estimateDisp(subcounts,B_region_design,robust=T) fit=glmQLFit(design = B_region_design,y = subcounts_disp,robust = T) ## interaction coefficient starts from column 14 to 26 res_QL=glmQLFTest(fit,coef =14:26) tb=topTags(res_QL,n= Inf )$table ## create a contrast matrix with pairwise comparison between all regions head(contrast)
| COM:B-AUD:B | CTXsp:B-AUD:B | ENTI:B-AUD:B | FB:B-AUD:B | HPd:B-AUD:B | HPs:B-AUD:B | HY:B-AUD:B | OLF:B-AUD:B | PTL:B-AUD:B | RSP:B-AUD:B | ⋯ | PTL:B-OLF:B | RSP:B-OLF:B | SSp:B-OLF:B | TH:B-OLF:B | RSP:B-PTL:B | SSp:B-PTL:B | TH:B-PTL:B | SSp:B-RSP:B | TH:B-RSP:B | TH:B-SSp:B | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AUD | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
| COM | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
| CTXsp | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
| ENTI | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
| FB | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
| HPd | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 
##
res_QL_contrast=glmQLFTest(fit,contrast = contrast)
I used res_QL_contrast to find genes which have a regional difference in association with protein B, and the topTags outputs a table with contains logFC for each comparison and a single pvalue.
So i guess any significant test will only indicate that there is at least one comparison (between two regions) are different.
And to know which two regions are significantly different, I have to test on individual contrast separately.
I am planning to do something as the following :
toptags_list[]
for (i in 1:ncol(contrast){
 res_list=glmQLFTest(fit,contrast = level2_contrast[,i])
 toptags_list[i]=topTags(res_list,n=Inf)$table
}
And after finished testing all comparisons, I will concatenate all the tables in the toptags_list and run a global FDR p-values correction. 
I would like to know if this a valid way to do post-hoc comparison.
And also is there a better way of doing this analysis, as my data is very large (40000 genes 4700 samples), and doing all these comparison in one combined contrast already taking a long time, not mentioning if I want to do it by individual columns.
Thank you very much in advance,
Ashley
