Question: why two different makeContrasts have the same results?
0
3.4 years ago by
xiaoaozqd0 wrote:

When I used the edgeR analyzed the DE genes, there are two different Contrasts:

con_RS <- makeContrasts(
(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi),
(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNS.32.18hpi-groupingNS.MK.0hpi),
(groupingNR.32.24hpi-groupingNR.MK.0hpi)-(groupingNR.26.24hpi-groupingNR.MK.0hpi),
(groupingNR.32.24hpi-groupingNR.MK.0hpi)-(groupingNS.32.24hpi-groupingNS.MK.0hpi),
(groupingNR.32.48hpi-groupingNR.MK.0hpi)-(groupingNR.26.48hpi-groupingNR.MK.0hpi),
(groupingNR.32.48hpi-groupingNR.MK.0hpi)-(groupingNS.32.48hpi-groupingNS.MK.0hpi),
(groupingNR.32.96hpi-groupingNR.MK.0hpi)-(groupingNR.26.96hpi-groupingNR.MK.0hpi),
(groupingNR.32.96hpi-groupingNR.MK.0hpi)-(groupingNS.32.96hpi-groupingNS.MK.0hpi),
(groupingNR.32.168hpi-groupingNR.MK.0hpi)-(groupingNR.26.168hpi-groupingNR.MK.0hpi),
(groupingNR.32.168hpi-groupingNR.MK.0hpi)-(groupingNS.32.168hpi-groupingNS.MK.0hpi),
levels=design)

lrt_RS <- glmLRT(f,contrast=con_RS)
levels=d_RS <- glmLRT(f,contrast=con_RS)
tt_RS <- topTags(lrt_RS, n=Inf)$table write.table(tt_RS, file="LRT_RS.xls", row.names=FALSE, sep="\t", quote=FALSE) con_RS_3 <- makeContrasts( (groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi), (groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNS.32.18hpi-groupingNS.MK.0hpi), (groupingNS.32.18hpi-groupingNS.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi), (groupingNR.32.24hpi-groupingNR.MK.0hpi)-(groupingNR.26.24hpi-groupingNR.MK.0hpi), (groupingNR.32.24hpi-groupingNR.MK.0hpi)-(groupingNS.32.24hpi-groupingNS.MK.0hpi), (groupingNS.32.24hpi-groupingNS.MK.0hpi)-(groupingNR.26.24hpi-groupingNR.MK.0hpi), (groupingNR.32.48hpi-groupingNR.MK.0hpi)-(groupingNR.26.48hpi-groupingNR.MK.0hpi), (groupingNR.32.48hpi-groupingNR.MK.0hpi)-(groupingNS.32.48hpi-groupingNS.MK.0hpi), (groupingNS.32.48hpi-groupingNS.MK.0hpi)-(groupingNR.26.48hpi-groupingNR.MK.0hpi), (groupingNR.32.96hpi-groupingNR.MK.0hpi)-(groupingNR.26.96hpi-groupingNR.MK.0hpi), (groupingNR.32.96hpi-groupingNR.MK.0hpi)-(groupingNS.32.96hpi-groupingNS.MK.0hpi), (groupingNS.32.96hpi-groupingNS.MK.0hpi)-(groupingNR.26.96hpi-groupingNR.MK.0hpi), (groupingNR.32.168hpi-groupingNR.MK.0hpi)-(groupingNR.26.168hpi-groupingNR.MK.0hpi), (groupingNR.32.168hpi-groupingNR.MK.0hpi)-(groupingNS.32.168hpi-groupingNS.MK.0hpi), (groupingNS.32.168hpi-groupingNS.MK.0hpi)-(groupingNR.26.168hpi-groupingNR.MK.0hpi), levels=design) lrt_RS_3 <- glmLRT(f,contrast=con_RS_3) tt_RS_3 <- topTags(lrt_RS_3, n=Inf)$table
write.table(tt_RS_3, file="LRT_RS_3.xls", row.names=FALSE, sep="\t", quote=FALSE)

While the file "LRT_RS.xls" and "LRT_RS_3.xls" have the same result.

The first row are both "genes    logFC..groupingNR.32.18hpi...groupingNR.MK.0hpi.....groupingNR.26.18hpi...groupingNR.MK.0hpi.    logFC..groupingNR.32.18hpi...groupingNR.MK.0hpi.....groupingNS.32.18hpi...groupingNS.MK.0hpi.    logFC..groupingNR.32.24hpi...groupingNR.MK.0hpi.....groupingNR.26.24hpi...groupingNR.MK.0hpi.    logFC..groupingNR.32.24hpi...groupingNR.MK.0hpi.....groupingNS.32.24hpi...groupingNS.MK.0hpi.    logFC..groupingNR.32.48hpi...groupingNR.MK.0hpi.....groupingNR.26.48hpi...groupingNR.MK.0hpi.    logFC..groupingNR.32.48hpi...groupingNR.MK.0hpi.....groupingNS.32.48hpi...groupingNS.MK.0hpi.    logFC..groupingNR.32.96hpi...groupingNR.MK.0hpi.....groupingNR.26.96hpi...groupingNR.MK.0hpi.    logFC..groupingNR.32.96hpi...groupingNR.MK.0hpi.....groupingNS.32.96hpi...groupingNS.MK.0hpi.    logFC..groupingNR.32.168hpi...groupingNR.MK.0hpi.....groupingNR.26.168hpi...groupingNR.MK.0hpi.    logFC..groupingNR.32.168hpi...groupingNR.MK.0hpi.....groupingNS.32.168hpi...groupingNS.MK.0hpi.    logCPM    LR    PValue    FDR"

the file "LRT_RS_3.xls" without the except colume "

logFC..groupingNS.32.18hpi...groupingNS.MK.0hpi.....groupingNR.26.18hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.24hpi...groupingNS.MK.0hpi.....groupingNR.26.24hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.48hpi...groupingNS.MK.0hpi.....groupingNR.26.48hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.96hpi...groupingNS.MK.0hpi.....groupingNR.26.96hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.168hpi...groupingNS.MK.0hpi.....groupingNR.26.168hpi...groupingNR.MK.0hpi."

I want to know thre are some error about my code?

edger makecontrasts • 634 views
modified 3.4 years ago • written 3.4 years ago by xiaoaozqd0

plant = c(rep("NR",3),rep("NS",3),rep("NR",15),rep("NS",15),rep("NR",15))

fungi <- c(rep("MK",6),rep("32",30),rep("26",15))

time <- c(rep("0hpi",6),rep("18hpi",3),rep("24hpi",3),rep("48hpi",3),rep("96hpi",3),rep("168hpi",3),rep("18hpi",3),rep("24hpi",3),rep("48hpi",3),rep("96hpi",3),rep("168hpi",3),rep("18hpi",3),rep("24hpi",3),rep("48hpi",3),rep("96hpi",3),rep("168hpi",3))

batch <-  c(rep(c("a","b","c"),17))

batch <- factor(batch)

grouping <- factor(paste(plant, fungi, time, sep="."))

grouping

d <- DGEList(counts = counts[,2:52], genes = counts[,1], group = grouping)

dim(d)

keep <- rowSums(cpm(d)>1)>=3

d <- d[keep,]

dim(d)

d  =  calcNormFactors(d)

design <- model.matrix(~ 0 + grouping + batch)

d = estimateGLMCommonDisp(d,design)

d = estimateGLMTrendedDisp(d, design)

d = estimateGLMTagwiseDisp(d, design)

f = glmFit(d, design)

Answer: why two different makeContrasts have the same results?
0
3.4 years ago by
Aaron Lun25k
Cambridge, United Kingdom
Aaron Lun25k wrote:

Let's take the "18hpi" contrasts as an example ("hours post-infection", I presume). In your first set of contrasts, you have two contrasts involving the 18hpi groups:

(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi)
(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNS.32.18hpi-groupingNS.MK.0hpi)

Under the null hypothesis, both of these expressions are equal to zero. A bit of arithmetic will give you:

(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi)
- [(groupingNR.32.18hpi-groupingNR.MK.0hpi)-(groupingNS.32.18hpi-groupingNS.MK.0hpi)]
= -(groupingNR.26.18hpi-groupingNR.MK.0hpi) + (groupingNS.32.18hpi-groupingNS.MK.0hpi)
= (groupingNS.32.18hpi-groupingNS.MK.0hpi) -(groupingNR.26.18hpi-groupingNR.MK.0hpi)
= 0 # as each expression is equal to zero.


... which is the same as the extra contrast for 18hpi in your second set:

(groupingNS.32.18hpi-groupingNS.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi)

In other words, if you've specified the first two contrasts, the extra contrast is implied and does not need to be specified explicitly. This is why you end up with the same significance results from both sets of contrasts.

Thank you very much for your explain!

You got it,"hpi" symbol as "hours post-infection" , "NR/NS" symbol as the different plant, "32/26" symbol as different fungi.

You means in the "con_RS" the software had been contrasts between groupingNR.32.xx-groupingNS.32.xx, groupingNR.32.xx-groupingNR.26.xx and groupingNS.32.xx-groupingNR.26.xx, while how can I get the result of  except colume "

logFC..groupingNS.32.18hpi...groupingNS.MK.0hpi.....groupingNR.26.18hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.24hpi...groupingNS.MK.0hpi.....groupingNR.26.24hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.48hpi...groupingNS.MK.0hpi.....groupingNR.26.48hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.96hpi...groupingNS.MK.0hpi.....groupingNR.26.96hpi...groupingNR.MK.0hpi.    logFC..groupingNS.32.168hpi...groupingNS.MK.0hpi.....groupingNR.26.168hpi...groupingNR.MK.0hpi."

Now I just use another contrast

con_NS.32_NR.26 <- makeContrasts(
(groupingNS.32.18hpi-groupingNS.MK.0hpi)-(groupingNR.26.18hpi-groupingNR.MK.0hpi),
(groupingNS.32.24hpi-groupingNS.MK.0hpi)-(groupingNR.26.24hpi-groupingNR.MK.0hpi),
(groupingNS.32.48hpi-groupingNS.MK.0hpi)-(groupingNR.26.48hpi-groupingNR.MK.0hpi),
(groupingNS.32.96hpi-groupingNS.MK.0hpi)-(groupingNR.26.96hpi-groupingNR.MK.0hpi),
(groupingNS.32.168hpi-groupingNS.MK.0hpi)-(groupingNR.26.168hpi-groupingNR.MK.0hpi),
levels=design)

While in the result of con_RS there are a set of DE genes (groupingNR.32.xx-groupingNS.32.xx, groupingNR.32.xx-groupingNR.26.xx, about 50,000 DE genes with FDR<0.05 in same file with the same PValue and FDR), in the result of con_NS.32_NR.26 there are another set of DE genes (groupingNS.32.xx-groupingNR.26.xx,about 30,000 DE genes with FDR<0.05 in one file has different PValue and FDR from groupingNR.32.xx-groupingNS.32.xx, groupingNR.32.xx-groupingNR.26.xx in con_RS).

1.If the "con_NS.32_NR.26" is necessary?

2.If the "con_NS.32_NR.26" is not necessary. Since "con_RS" is same as "con_RS", can just use the result of "con_RS" replace the the result of "con_NS.32_NR.26". In this sistuation how can I get the logFC(groupingNS.32.xx-groupingNR.26.xx)?

3.To make contrast of the different time, three other contrast have been introduced:

con_NS.32 <- makeContrasts(
(groupingNS.32.18hpi-groupingNS.MK.0hpi)  ,
(groupingNS.32.24hpi-groupingNS.MK.0hpi)  ,
(groupingNS.32.48hpi-groupingNS.MK.0hpi)  ,
(groupingNS.32.96hpi-groupingNS.MK.0hpi)  ,
(groupingNS.32.168hpi-groupingNS.MK.0hpi) ,
levels=design)

con_NR.32 <- makeContrasts(
(groupingNR.32.18hpi-groupingNR.MK.0hpi)  ,
(groupingNR.32.24hpi-groupingNR.MK.0hpi)  ,
(groupingNR.32.48hpi-groupingNR.MK.0hpi)  ,
(groupingNR.32.96hpi-groupingNR.MK.0hpi)  ,
(groupingNR.32.168hpi-groupingNR.MK.0hpi) ,
levels=design)

con_NR.26 <- makeContrasts(
(groupingNR.26.18hpi-groupingNR.MK.0hpi)  ,
(groupingNR.26.24hpi-groupingNR.MK.0hpi)  ,
(groupingNR.26.48hpi-groupingNR.MK.0hpi)  ,
(groupingNR.26.96hpi-groupingNR.MK.0hpi)  ,
(groupingNR.26.168hpi-groupingNR.MK.0hpi) ,
levels=design)

Are  they correct?

Thank you very much again!

I don't understand some of your English (what is "except colume"?) so I'm just going to guess at what you're trying to ask:

1. The result of DE testing with con_NS.32_NR.26 will obviously be different from the test with con_RS, as it contains sets of contrasts that are arithmetically unique. My point in my first answer was that testing with con_RS is the same as testing with con_RS_3, because those two sets of contrasts are mathematically identical (i.e., the extra contrasts in con_RS_3 can be derived from the existing ones in con_RS). So I don't know what you mean when you ask if con_NS.32_NR.26 is "necessary", because it's testing something completely different.
2. glmLRT will automatically remove redundant contrasts, which is why the NS.32/NR.26 contrasts don't show up in your result table. You can get the fold change of those contrasts by using con_NS.32_NR.26. That is to say, you run con_RS through glmLRT and collect the log-fold changes and p-values; then you run con_NS.32_NR.26 through glmLRT and collect only the log-fold changes. (Best to use sort.by="none" in topTags to ensure that the rows match up when you cbind the second set of log-fold changes to the first set of log-fold changes.)
3. I guess so, if you're testing for a time effect within each fungi/plant combination.

Thank you very much for your explain and sorry for the bad English!

"except colume" is the result of the  fold change between goupingNS.32.xx and goupingNR.26.xx, you guess  is very accurate.

My last question is about the FDR. Can I use both the FDR and p-values in the result of con_RS for con_NS.32_NR.26? Forgive me for the poor knowledge about statistics, I'm just not so sure!

Thank you very much for so many times help！

For the comparison you're trying to do, the only p-values/FDRs you should be using are those from con_RS (or equivalently, con_RS_3). The only reason you're using con_NS.32_NR.26 at all is to get the log-fold changes. It doesn't make sense to "use" the p-values from con_RS for con_NS.32_NR.26; once you take the log-fold changes from the con_NS.32_NR.26, you don't need the other statistics from that set of results, so why would you bother replacing the p-values? Just cbind the extracted log-fold changes to the results from con_RS (where the p-values are of interest) and do the rest of your work on the combined results.

Answer: why two different makeContrasts have the same results?
0
3.4 years ago by
xiaoaozqd0 wrote:

I will extract log-fold changes from con_NS.32_NR.26 and combine them to the result of con_RS with correct order. At last I will get all of the LogFC combined by con_NS.32_NR.26 and con_RS, while the p-values/FDRs just from the result of con_RS. Then I will just analysis the DE genes with FDR<0.05.

Thank you again!