edgeR contrasts
1
0
Entering edit mode
ignadb • 0
@70b9b76c
Last seen 2.3 years ago
Denmark

Hi, I am analyzing RNA-seq data of a tree infected with 2 pathogens. The experiment lay-out is like this:

  • condition 1(C): tree + water (control)
  • condition 2(P1): tree + pathogen1
  • condition 3(P1_P2): tree + pathogen1 + pathogen2

I inherited this code and I succeeded using it to call DEGs for condition 2 (contrast = 2-1; 2401 DEGs) and 3 (contrast = 3-1; 5209 DEGs), but when I tried calling DEGs for differences between condition 2 and 3 (contrast = either 2-3 or 3-2), the result is always 0. I expect there are some differences between the two conditions given that, when compared to control, the number of DEGs found in the two conditions are quite different. May you please help enlighten if I have use wrong codes or this is what I can expect if there is no difference between the two conditions?

I also wonder if I really need the line "colnames(design) <- levels(group)" in my code. I tried reading up about a function of colnames, but I don't understand what it actually does.

All suggestions and advice are appreciated. Thanks a lot in advance for your help! :)

Code should be placed in three backticks as shown below

> rawCountTable <- read.table("counts.txt", header=TRUE, sep="\t", row.names="Geneid")
> sampleInfo <- read.table("sampleInfo.csv", header=TRUE, sep=",", row.names="file")
> group <- factor(sampleInfo$condition)
> y <- DGEList(counts=rawCountTable,group=sampleInfo$condition)
> keep <- filterByExpr(y)
> y <- y[keep,,keep.lib.sizes=FALSE]
> y <- calcNormFactors(y, method="TMM")
> design <- model.matrix(~0+group)
$design
         C  P1  P1_P2
1       1   0       0
2       1   0       0
3       1   0       0
4       0   1       0
5       0   1       0
6       0   0       1
7       0   0       1
8       0   0       1
> colnames(design) <- levels(group)
> y <- estimateDisp(y,design,robust=TRUE)
> y$common.dispersion
> fit <- glmQLFit(y,design,robust=TRUE)
> my.contrasts <- makeContrasts(vsP1=P1-C, vsP1_P2=P1_P1-C, vsdiff=(P1_P2-C)-(P1-C), vsdiff2=(P1-C)-(P1_P2-C)), levels=design)
> my.contrasts
  Contrasts
Levels        P1_P2    P1 vsdiff vsdiff2
  control        -1    -1      0       0
  P1              0     1     -1       1
  P1_P2           1     0      1      -1
> qvsdiff <- glmQLFTest(fit, contrast=my.contrasts[,"vsdiff"])
> toptags.qvsdiff <- topTags(qvsdiff, n=nrow(qvsdiff$table),adjust.method="fdr")
> sig005.qvsdiff <- toptags.qvsdiff$table[toptags.qvsdiff$table$FDR<0.05,] 
> sum(toptags.qvsdiff$table$FDR < 0.05)
0
edgeR • 4.2k views
ADD COMMENT
0
Entering edit mode

It hard to be sure what you've done because the output you show doesn't correspond to the code. The design matrix you show is not as output by model.matrix and the contrast matrix you show is not produced by the makeContrasts call that you show. Can you please run the code again and regenerate the output?

ADD REPLY
0
Entering edit mode

Hi Prof. Smyth, Thanks again for your reply. I am sorry for the previous code. I was trying to make the factor names to be easily recognised, but it seems I am not that smart. I am sorry for this. Anyway, here is the more or less original code and in this case, A = control, B = P1 and C = P1_P2. I I tried rerunning it again to be sure that it works with some of the results. :)

Thanks a lot again for your time and have a great day!

> rawCountTable <- read.table("counts.txt", header=TRUE, sep="\t", row.names="Geneid")
> sampleInfo <- read.table("sampleInfo.csv", header=TRUE, sep=",", row.names="file")
> sampleInfo
                  condition group
CH24_B12_C          control     A
CH52_B12_C          control     A
CH113_B12_C         control     A
CH49_B12_P1              P1     B
CH110_B12_P1             P1     B
CH23_B12_P1_P2        P1_P2     C
CH50_B12_P1_P2        P1_P2     C
CH111_B12_P1_P2       P1_P2     C
> group <- factor(sampleInfo$group)
> y <- DGEList(counts=rawCountTable,group=group)
> keep <- filterByExpr(y)
> y <- [keep,,keep.lib.sizes=FALSE]
> y <- calcNormFactors(y)
> design <- model.matrix(~0+group)
> colnames(design) <- levels(group)
> design
   A B C
1 1 0 0
2 1 0 0
3 1 0 0
4 0 1 0
5 0 1 0
6 0 0 1
7 0 0 1
8 0 0 1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$group
[1] "contr.treatment"
> y <- estimateDisp(y,design)
> fit <- glmQLFit(y,design)
> my.contrasts <- makeContrasts(vsB=B-A, vsC=C-A, vsBACA=(B-A)-(C-A), vsCABA=(C-A)-(B-A), levels=design)
> my.contrasts
  Contrasts
Levels vsB vsC vsBACA vsCABA
    A  -1  -1      0      0
    B   1   0      1     -1
    C   0   1     -1      1
> qlf.vsB <- glmQLFTest(fit, contrast=my.contrasts[,"vsB"])
> toptags.vsB <- topTags(qlf.vsB, n=nrow(qlf.vsB$table),adjust.method="fdr")
> sum(toptags.vsB$table$FDR < 0.05)
[1] 2574
> qlf.vsC <- glmQLFTest(fit, contrast=my.contrasts[,"vsC"])
> toptags.vsC <- topTags(qlf.vsC, n=nrow(qlf.vsC$table),adjust.method="fdr")
> sum(toptags.vsC$table$FDR < 0.05)
[1] 5414
> qlf.vsBACA <- glmQLFTest(fit, contrast=my.contrasts[,"vsBACA"])
> toptags.vsBACA <- topTags(qlf.vsBACA, n=nrow(qlf.vsBACA$table),adjust.method="fdr")
> sum(toptags.vsBACA$table$FDR < 0.05)
[1] 0    
> qlf.vsCABA <- glmQLFTest(fit, contrast=my.contrasts[,"vsCABA"])
> toptags.vsCABA <- topTags(qlf.vsCABA, n=nrow(qlf.vsCABA$table),adjust.method="fdr")
> sum(toptags.vsCABA$table$FDR < 0.05)
[1] 0
ADD REPLY
0
Entering edit mode

OK, thanks. Your code seems fine but do you appreciate that vsBACA=(B-A)-(C-A) and vsCABA=(C-A)-(B-A) are the same contrast (just with signs reversed) and that both are the same as B-C? The control group A just cancels out of both contrasts, so it is unnecessary.

ADD REPLY
0
Entering edit mode

Hi, I am sorry, but I don't understand why they both are B-C. From the contrast matrix, one is B-C and another is C-B. I read the edgeR manual (section 3.3.1) where they use DrugvsPlacebo.2h = (Drug.2h-Drug.0h)-(Placebo.2h-Placebo.0h). I guess the reason the control group A is unnecessary in my case is that I am working with only one control? And for my learning, may you explain me how to achieve the contrast C-B, please?

Thank you very much and have a great day!

ADD REPLY
0
Entering edit mode

B-C and C-B are the same thing, just with reversed signs for the logFCs. They have exactly the same t-statistics, p-values and so on. There is no need to do both.

ADD REPLY
0
Entering edit mode

Hi Prof. Smyth, thanks so much for the explanation. It is clear now for the B-C and C-B. I however don't understand why the control group A is unnecessary. I understand that (B-A)-(C-A) would detail DE genes (after contrasting with group A) that is significant in B but not in C, whereas B-C would detail ANY genes that is differentially expressed in B but not in C. I am sorry for having too many questions, but I hope you can help to correct me.

Thanks again and have a great day!

ADD REPLY
0
Entering edit mode

No, you are over-interpretting the contrasts, adding on interpretations that they don't have. The B-C contrast simply finds genes that are differentially expressed between B and C, nothing more and nothing less. The null hypothesis for B-C is that muB - muC = 0 where muB and muC are the expected log-expressions for the two groups.

You can see that A cancels out just by elementary math: (B-A)-(C-A) = B - A - C + A = B - C + A - A = B - C.

ADD REPLY
0
Entering edit mode

Aha, it is crystal clear now. I just followed the section 3.3.1 in the edgeR manual without being aware that it's another situation there. Thanks so much for your time and the explanation! Have a great day! :)

ADD REPLY
0
Entering edit mode

res1 <- edger_ftest_contrasts(d, type = pheno_frame$Name, contrasts = contrasts, add.means = FALSE, prefix = "T2DvsCon", test = "QLFT") Error in edger_ftest_contrasts(d, type = pheno_frame$Name, contrasts = contrasts, : could not find function "edger_ftest_contrasts" this code doesnt work

ADD REPLY
0
Entering edit mode

Please don't add random comments on 11 month old posts. I've already noted in your other post that this isn't a Bioconductor question, and you should ask for help elsewhere.

ADD REPLY
0
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

I expect there are some differences between the two conditions given that, when compared to control, the number of DEGs found in the two conditions are quite different.

No, that reasoning doesn't follow. It is perfectly possible to get more DE genes for P1_P2 vs C than P1 vs C but to still get no DE genes for P1_P2 vs P1. It may be that there are lots of genes changing in the same direction in both cases just making the FDR cutoff for P1_P2 vs C and just missing out for P1 vs C,.

May you please help enlighten if I have use wrong codes

Your contrast code is unnecessarily complicated but it all looks correct.

or this is what I can expect if there is no difference between the two conditions?

Yes, this what would be expected if P1_P2 and P1 are both very different to C but similar to one another. It would be logical to explore this visually with plotMDS(). See the edgeR workflow for a case study of a complete edgeR analysis with diagnostic plots.

I also wonder if I really need the line colnames(design) <- levels(group) in my code.

Yes you do, if you want to use the column headings C, P1 and P1_P2 when forming contrasts.

I tried reading up about a function of colnames, but I don't understand what it actually does

It simply sets the columns names. Just looking at the results would immediatel show you what it does:

> design <- model.matrix(~0+group)
> design
> colnames(design) <- levels(group)
> design
ADD COMMENT

Login before adding your answer.

Traffic: 406 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6