Question

EdgeR glmTreat error

0

Entering edit mode

lucap • 0

@lucap-20484

Last seen 2.3 years ago

Italy

Hi all, I am using edgeR to identify genes differentially expressed in my experiment. I have cells which were grown in 4 different cell media and I would like to perform an ANOVA to identify DE genes comparing all media vs all (e.g. media1 vs media2; media1 vs media3; media1 vs media4; media2 vs media3....).

I have been reading the manual for edgeR and this is my code:

> library(edgeR)
Loading required package: limma
> library(statmod)
> rawdata <- read.delim("RawCounts_wNames.txt", check.names=FALSE, stringsAsFactors=FALSE)
> gruppi <- sub("-.*S[[:digit:]]+","",colnames(rawdata[,3:14]))
> y <- DGEList(counts=rawdata[,3:14],genes=rawdata[,1:2],group=gruppi)
> keep <- rowSums(cpm(y)>1) >=2   #filtering
> y <- y[keep, , keep.lib.sizes=FALSE]
> rownames(y$counts) <- rownames(y$genes) <- y$genes$Genei  #Replace row.names (==numbers) with gene names.
> y$genes$Geneid <- NULL  #Same as above
> y <- calcNormFactors(y)
> group <- factor(y$samples$group)  #Design for one-way ANOVA and decide contrasts:
> design <- model.matrix(~0+group, data=y$samples)
> colnames(design) <- levels(group)
> design
         H I M T
H-23_S10 1 0 0 0
H-24_S2  1 0 0 0
H-48_S6  1 0 0 0
I-23_S12 0 1 0 0
I-24_S4  0 1 0 0
I-48_S8  0 1 0 0
M-23_S9  0 0 1 0
M-24_S1  0 0 1 0
M-48_S5  0 0 1 0
T-23_S11 0 0 0 1
T-24_S3  0 0 0 1
T-48_S7  0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
> con <- makeContrasts(HvsI = H - I, HvsM = H - M, HvsT = H - T, levels=design)
> con2 <- makeContrasts(HvsI = H - I, levels=design)
> y <- estimateDisp(y,design,robust=TRUE)
> fit <- glmQLFit(y, design,robust=TRUE)  #Fitta una glm
> qlf <- glmQLFTest(fit, contrast=con)  #Test on glm. Not filtered on FC
> tr <- glmTreat(fit, contrast=con, lfc=log2(1.2))
Error in data.frame(logCPM = glmfit$AveLogCPM, PValue = p.value, row.names = rownames(glmfit)) : 
  row names supplied are of the wrong length

I cannot find any answer in relation to this error so any help would be mostly appreciated. Thanks Luca

edger error • 2.2k views

ADD COMMENT • link updated 6.7 years ago by James W. MacDonald 68k • written 6.7 years ago by lucap • 0

score 1 · Answer 1 · 2019-04-10

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 4 days ago

United States

The contrast argument for glmTreat is a vector, not a matrix. In other words, you can incorporate a fold change into a direct comparison between two groups, but it makes no sense in the context of an ANOVA-like test of multiple comparisons.

ADD COMMENT • link 6.7 years ago James W. MacDonald 68k

0

Entering edit mode

Thanks James for your reply. I am sorry but I don't understand what you mean. According to this guide (link) the object con is a matrix...
Plus, sorry, what do you mean with "it makes no sense in the context of an ANOVA-like test of multiple comparisons"? Thanks Luca

ADD REPLY • link 6.7 years ago lucap • 0

1

Entering edit mode

I'm not sure how you managed to post this three times, but will you delete the extras?

So yes, a contrast matrix is a matrix, and you will be able to find examples of such things in the edgeR User's Guide. But what you won't find is an example of glmTreat that uses a contrast matrix. Anyway, the help page is the definitive resource anyway, and here's what it has to say:

contrast: numeric vector specifying the contrast of the linear model
          coefficients to be tested against the log2-fold-change
          threshold. Length must equal to the number of columns of
          'design'. If specified, then takes precedence over 'coef'.

Where 'numeric vector' means exactly that - a vector of numbers. The reason for this is you are testing the hypothesis of H₀: β ≤ |c| versus the alternative of H_A: β > |c|, for some log fold change c. In other words, you are asking for evidence that the log fold change for a particular comparison is greater than some value.

With edgeR (and limma) you can make ANOVA like comparisons where you specify multiple contrasts at one time (by passing in a contrast matrix with more than one column), and it will then perform a test that asks if there is any evidence that the coefficient for any of the specified comparisons is different from zero. So when you use glmQLFTest with a contrast matrix, you are asking for a quasi-likelihood F-test that any of the coefficients are different from zero.

It is not possible (so far as I know) to specify an F-test based on a minimum fold change for a set of comparisons, and even if it is, Gordon hasn't implemented anything like that in edgeR, so that's why it makes no sense to ask for a minimum fold change in the context of an ANOVA-like test.

ADD REPLY • link 6.7 years ago James W. MacDonald 68k

0

Entering edit mode

Sorry for the three posts, I honestly don't know how I did it... Thanks for the detailed explanation, now I get what you mean about the contrast. So basically with glmQLFTest I will find all genes that are significantly differentially expressed (not up-reg or down-reg, just significantly DE) at least in one cell type. Whereas if I want to look in detail one particular comparison (e.g. media1 vs media2), I can perform glmTreat and implement a fold-change filter. Am I right? From a statistical point of view, given my experimental set-up, would this last option (using glmTreat to perform multiple times a comparison with a different couple each time) be correct? Or would it be like doing multiple t-test on a dataset instead of performing an ANOVA? Thanks Luca

ADD REPLY • link 6.7 years ago lucap • 0