EdgeR glmTreat error
1
0
Entering edit mode
lucap • 0
@lucap-20484
Last seen 6 months ago
Italy

Hi all, I am using edgeR to identify genes differentially expressed in my experiment. I have cells which were grown in 4 different cell media and I would like to perform an ANOVA to identify DE genes comparing all media vs all (e.g. media1 vs media2; media1 vs media3; media1 vs media4; media2 vs media3....).

I have been reading the manual for edgeR and this is my code:

> library(edgeR)
Loading required package: limma
> library(statmod)
> rawdata <- read.delim("RawCounts_wNames.txt", check.names=FALSE, stringsAsFactors=FALSE)
> gruppi <- sub("-.*S[[:digit:]]+","",colnames(rawdata[,3:14]))
> y <- DGEList(counts=rawdata[,3:14],genes=rawdata[,1:2],group=gruppi)
> keep <- rowSums(cpm(y)>1) >=2   #filtering
> y <- y[keep, , keep.lib.sizes=FALSE]
> rownames(y$counts) <- rownames(y$genes) <- y$genes$Genei  #Replace row.names (==numbers) with gene names.
> y$genes$Geneid <- NULL  #Same as above
> y <- calcNormFactors(y)
> group <- factor(y$samples$group)  #Design for one-way ANOVA and decide contrasts:
> design <- model.matrix(~0+group, data=y$samples)
> colnames(design) <- levels(group)
> design
         H I M T
H-23_S10 1 0 0 0
H-24_S2  1 0 0 0
H-48_S6  1 0 0 0
I-23_S12 0 1 0 0
I-24_S4  0 1 0 0
I-48_S8  0 1 0 0
M-23_S9  0 0 1 0
M-24_S1  0 0 1 0
M-48_S5  0 0 1 0
T-23_S11 0 0 0 1
T-24_S3  0 0 0 1
T-48_S7  0 0 0 1
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
> con <- makeContrasts(HvsI = H - I, HvsM = H - M, HvsT = H - T, levels=design)
> con2 <- makeContrasts(HvsI = H - I, levels=design)
> y <- estimateDisp(y,design,robust=TRUE)
> fit <- glmQLFit(y, design,robust=TRUE)  #Fitta una glm
> qlf <- glmQLFTest(fit, contrast=con)  #Test on glm. Not filtered on FC
> tr <- glmTreat(fit, contrast=con, lfc=log2(1.2))
Error in data.frame(logCPM = glmfit$AveLogCPM, PValue = p.value, row.names = rownames(glmfit)) : 
  row names supplied are of the wrong length

I cannot find any answer in relation to this error so any help would be mostly appreciated. Thanks Luca

edger error • 1.4k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 14 minutes ago
United States

The contrast argument for glmTreat is a vector, not a matrix. In other words, you can incorporate a fold change into a direct comparison between two groups, but it makes no sense in the context of an ANOVA-like test of multiple comparisons.

ADD COMMENT
0
Entering edit mode

Thanks James for your reply. I am sorry but I don't understand what you mean. According to this guide (link) the object con is a matrix...
Plus, sorry, what do you mean with "it makes no sense in the context of an ANOVA-like test of multiple comparisons"? Thanks Luca

ADD REPLY
1
Entering edit mode

I'm not sure how you managed to post this three times, but will you delete the extras?

So yes, a contrast matrix is a matrix, and you will be able to find examples of such things in the edgeR User's Guide. But what you won't find is an example of glmTreat that uses a contrast matrix. Anyway, the help page is the definitive resource anyway, and here's what it has to say:

contrast: numeric vector specifying the contrast of the linear model
          coefficients to be tested against the log2-fold-change
          threshold. Length must equal to the number of columns of
          'design'. If specified, then takes precedence over 'coef'.

Where 'numeric vector' means exactly that - a vector of numbers. The reason for this is you are testing the hypothesis of H0: β ≤ |c| versus the alternative of HA: β > |c|, for some log fold change c. In other words, you are asking for evidence that the log fold change for a particular comparison is greater than some value.

With edgeR (and limma) you can make ANOVA like comparisons where you specify multiple contrasts at one time (by passing in a contrast matrix with more than one column), and it will then perform a test that asks if there is any evidence that the coefficient for any of the specified comparisons is different from zero. So when you use glmQLFTest with a contrast matrix, you are asking for a quasi-likelihood F-test that any of the coefficients are different from zero.

It is not possible (so far as I know) to specify an F-test based on a minimum fold change for a set of comparisons, and even if it is, Gordon hasn't implemented anything like that in edgeR, so that's why it makes no sense to ask for a minimum fold change in the context of an ANOVA-like test.

ADD REPLY
0
Entering edit mode

Sorry for the three posts, I honestly don't know how I did it... Thanks for the detailed explanation, now I get what you mean about the contrast. So basically with glmQLFTest I will find all genes that are significantly differentially expressed (not up-reg or down-reg, just significantly DE) at least in one cell type. Whereas if I want to look in detail one particular comparison (e.g. media1 vs media2), I can perform glmTreat and implement a fold-change filter. Am I right? From a statistical point of view, given my experimental set-up, would this last option (using glmTreat to perform multiple times a comparison with a different couple each time) be correct? Or would it be like doing multiple t-test on a dataset instead of performing an ANOVA? Thanks Luca

ADD REPLY

Login before adding your answer.

Traffic: 749 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6