I am trying to analyse the differential gene expression between groups in a complicated multi-factor experiment and am having trouble setting up the contrast matrix. In this experiment, samples were collected from three different geographic origins (Brazil, California and Yemen), were kept under one of four different regimes (ancestral, hot, cold, switch), and RNA was assayed at one of three temperatures (23, 29, 35). I wish for the assay temperature to be treated as a factor rather than covariate.
I initially added the group info during the readDGE stage:
counts <- readDGE(files$Files, header=FALSE, labels = files$Summary, group = files$group)
where 'files' is a spreadsheet of file names, and 'group' is a unique indicator for each possible combination of origin, regime, and assay temp (36 groups). I tried using numeric and character string names for each group, for what it's worth. I did it at this stage because the filterByExpr() function works best when you define groups.
The issue arises at a later stage, with:
design <- model.matrix(~0 + as.factor(files$Assay) + files$Regime + files$Origin)
Which I'm attempting to generate an intercept-free model for each of the groups. The design matrix generated has columns for all three assay temps but is missing a column for regime (only has hot, cold and switch), and for origin (only California and Yemen, no Brazil). I understand that this is because the information is redundant - i.e. the presence of Ancestral is implied with by an absence of hot, cold, or switch, that's how it's encoded. What I can't figure out is how to set up the resulting DE testing, because what would be nice and easy would be:
regime.HotvsAnc <- makeContrasts(Regime.Hot-Regime.Anc, levels=colnames(design)) assay.HotvsAnc.results <- glmQLFTest(fit, contrast = regime.HotvsAnc)
but this is not possible, because there's no ancestral column in the design matrix. Instead I've ended up with this absolute monstrosity, which is based on the files$groups:
regime.HotvsAnc <- makeContrasts( (BraAnc23 + BraAnc29 + BraAnc35 + CalAnc23 + CalAnc29 + CalAnc35 + YemAnc23 + YemAnc29 + YemAnc35)/9 - (BraHot23 + BraHot29 + BraHot35 + CalHot23 + CalHot29 + CalHot35 + YemHot23 + YemHot29 + YemHot35)/9, levels=colnames(design)) regime.HotvsAnc <- glmQLFTest(fit, contrast = regime.HotvsAnc)
Does anyone know how I can take a less convoluted approach? Or is it necessary to make these horrible collections for every DE comparison?
Thank you so much!