Hi all, I am using edgeR to identify genes differentially expressed in my experiment. I have cells which were grown in 4 different cell media and I would like to perform an ANOVA to identify DE genes comparing all media vs all (e.g. media1 vs media2; media1 vs media3; media1 vs media4; media2 vs media3....).

I have been reading the manual for edgeR and this is my code:

```
> library(edgeR)
Loading required package: limma
> library(statmod)
> rawdata <- read.delim("RawCounts_wNames.txt", check.names=FALSE, stringsAsFactors=FALSE)
> gruppi <- sub("-.*S[[:digit:]]+","",colnames(rawdata[,3:14]))
> y <- DGEList(counts=rawdata[,3:14],genes=rawdata[,1:2],group=gruppi)
> keep <- rowSums(cpm(y)>1) >=2 #filtering
> y <- y[keep, , keep.lib.sizes=FALSE]
> rownames(y$counts) <- rownames(y$genes) <- y$genes$Genei #Replace row.names (==numbers) with gene names.
> y$genes$Geneid <- NULL #Same as above
> y <- calcNormFactors(y)
> group <- factor(y$samples$group) #Design for one-way ANOVA and decide contrasts:
> design <- model.matrix(~0+group, data=y$samples)
> colnames(design) <- levels(group)
> design
H I M T
H-23_S10 1 0 0 0
H-24_S2 1 0 0 0
H-48_S6 1 0 0 0
I-23_S12 0 1 0 0
I-24_S4 0 1 0 0
I-48_S8 0 1 0 0
M-23_S9 0 0 1 0
M-24_S1 0 0 1 0
M-48_S5 0 0 1 0
T-23_S11 0 0 0 1
T-24_S3 0 0 0 1
T-48_S7 0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
> con <- makeContrasts(HvsI = H - I, HvsM = H - M, HvsT = H - T, levels=design)
> con2 <- makeContrasts(HvsI = H - I, levels=design)
> y <- estimateDisp(y,design,robust=TRUE)
> fit <- glmQLFit(y, design,robust=TRUE) #Fitta una glm
> qlf <- glmQLFTest(fit, contrast=con) #Test on glm. Not filtered on FC
> tr <- glmTreat(fit, contrast=con, lfc=log2(1.2))
Error in data.frame(logCPM = glmfit$AveLogCPM, PValue = p.value, row.names = rownames(glmfit)) :
row names supplied are of the wrong length
```

I cannot find any answer in relation to this error so any help would be mostly appreciated. Thanks Luca

Thanks James for your reply. I am sorry but I don't understand what you mean. According to this guide (link) the object

`con`

is a matrix...Plus, sorry, what do you mean with "it makes no sense in the context of an ANOVA-like test of multiple comparisons"? Thanks Luca

I'm not sure how you managed to post this three times, but will you delete the extras?

So yes, a contrast matrix is a matrix, and you will be able to find examples of such things in the

edgeR User's Guide. But what youwon'tfind is an example of`glmTreat`

that uses a contrast matrix. Anyway, the help page is the definitive resource anyway, and here's what it has to say:Where 'numeric vector' means exactly that - a vector of numbers. The reason for this is you are testing the hypothesis of H

_{0}: β ≤ |c| versus the alternative of H_{A}: β > |c|, for some log fold change c. In other words, you are asking for evidence that the log fold change fora particular comparisonis greater than some value.With edgeR (and limma) you can make ANOVA like comparisons where you specify multiple contrasts at one time (by passing in a contrast matrix with more than one column), and it will then perform a test that asks if there is any evidence that the coefficient for any of the specified comparisons is different from zero. So when you use

`glmQLFTest`

with a contrast matrix, you are asking for a quasi-likelihood F-test that any of the coefficients are different from zero.It is not possible (so far as I know) to specify an F-test based on a minimum fold change for a set of comparisons, and even if it is, Gordon hasn't implemented anything like that in edgeR, so that's why it makes no sense to ask for a minimum fold change in the context of an ANOVA-like test.

Sorry for the three posts, I honestly don't know how I did it... Thanks for the detailed explanation, now I get what you mean about the

`contrast`

. So basically with`glmQLFTest`

I will find all genes that are significantly differentially expressed (not up-reg or down-reg, just significantly DE) at least in one cell type. Whereas if I want to look in detail one particular comparison (e.g. media1 vs media2), I can perform`glmTreat`

and implement a fold-change filter. Am I right? From a statistical point of view, given my experimental set-up, would this last option (using`glmTreat`

to perform multiple times a comparison with a different couple each time) be correct? Or would it be like doing multiple t-test on a dataset instead of performing an ANOVA? Thanks Luca