Hi, There:
We have RNAseq data with 4 cell lines with different knockout genotypes (WT, A, G, GA), and for each cell lines, we have two treatment types: treated (H) and untreated (C) with 5 replicates for each condition. we are interested in the treatment effect on each of these cell lines: WT.H_C, A.H_C, G.H_C, or GA.H_C (H_C mean treatment vs untreat). These are easily to be done in edgeR (see the code below). But we are also interested to see if there is difference between the treatment effects between different cell lines. for example, is there any difference between the treatment effect on cell line A (A.H_C) and the treatment effect on cell line GA (GA.H_C)? also, is there difference between the treatment effect on cell line WT (WT.H_C) and thetreatment effect on cell line G(G.H_C)? Although we can directly compare the derived DEGs of A.H_C and DEGs of GA.H_C for the 1st question, and directly compare the derived DEGs of WT.H_C and DEGs of G.H_C for the 2nd question, but I wonder if we can do this using edgeR directly and if the DEGs derived from such complicated contrast are what we are interested in terms of biological question that I asked earlier: if there is difference between the treatment effects between different cell lines, please see the code below for the contrast settings for intended complicated contrasts G_WT.H_C and GA_A.H_C:
G_WT.H_C=(H.G-C.G)-(H.WT-C.WT)
GA_A.H_C=(H.GA-C.GA)-(H.A-C.A)
any advice/suggestion would be greatly appreciated!!
Thx in advance and regards!
Mike
Code should be placed in three backticks as shown below
##the critical part of the codes are below
> show(tar5)
Types KO TypesLines Types.KOStat
C.A.1 C A C.A.1 C.A
C.A.2 C A C.A.2 C.A
C.A.3 C A C.A.3 C.A
C.A.4 C A C.A.4 C.A
C.A.5 C A C.A.5 C.A
C.G.1 C G C.G.1 C.G
C.G.2 C G C.G.2 C.G
C.G.3 C G C.G.3 C.G
C.G.4 C G C.G.4 C.G
C.G.5 C G C.G.5 C.G
C.GA.1 C GA C.GA.1 C.GA
C.GA.2 C GA C.GA.2 C.GA
C.GA.3 C GA C.GA.3 C.GA
C.GA.4 C GA C.GA.4 C.GA
C.GA.5 C GA C.GA.5 C.GA
C.WT.1 C WT C.WT.1 C.WT
C.WT.2 C WT C.WT.2 C.WT
C.WT.3 C WT C.WT.3 C.WT
C.WT.4 C WT C.WT.4 C.WT
C.WT.5 C WT C.WT.5 C.WT
H.A.1 H A H.A.1 H.A
H.A.2 H A H.A.2 H.A
H.A.3 H A H.A.3 H.A
H.A.4 H A H.A.4 H.A
H.A.5 H A H.A.5 H.A
H.G.1 H G H.G.1 H.G
H.G.2 H G H.G.2 H.G
H.G.3 H G H.G.3 H.G
H.G.4 H G H.G.4 H.G
H.G.5 H G H.G.5 H.G
H.GA.1 H GA H.GA.1 H.GA
H.GA.2 H GA H.GA.2 H.GA
H.GA.3 H GA H.GA.3 H.GA
H.GA.4 H GA H.GA.4 H.GA
H.GA.5 H GA H.GA.5 H.GA
H.WT.1 H WT H.WT.1 H.WT
H.WT.2 H WT H.WT.2 H.WT
H.WT.3 H WT H.WT.3 H.WT
H.WT.4 H WT H.WT.4 H.WT
H.WT.5 H WT H.WT.5 H.WT
> show(rawdata[1:5,1:10]);
SYMBOL ENSID C.A.1 C.A.2 C.A.3
ENSMUSG00000000001.4_Gnai3 Gnai3 ENSMUSG00000000001.4 6227 5210 5183
ENSMUSG00000000003.15_Pbsn Pbsn ENSMUSG00000000003.15 0 0 0
ENSMUSG00000000028.15_Cdc45 Cdc45 ENSMUSG00000000028.15 655 541 571
ENSMUSG00000000031.16_H19 H19 ENSMUSG00000000031.16 607 1074 792
ENSMUSG00000000037.16_Scml2 Scml2 ENSMUSG00000000037.16 74 62 45
C.A.4 C.A.5 C.G.1 C.G.2 C.G.3
ENSMUSG00000000001.4_Gnai3 5187 13509 4944 4885 5650
ENSMUSG00000000003.15_Pbsn 0 0 0 0 0
ENSMUSG00000000028.15_Cdc45 538 1402 287 305 236
ENSMUSG00000000031.16_H19 1459 3263 3757 9408 9656
ENSMUSG00000000037.16_Scml2 72 164 77 72 50
> library(edgeR)
> Group <- factor(tar5$Types.KOStat)
> y <- DGEList(counts=rawdata[,3:ncol(rawdata)], genes=rawdata[,1:2], group=Group)
>keep <- filterByExpr(y,group=Group)
>y <- y[keep, , keep.lib.sizes=FALSE]
>y <- calcNormFactors(y)
> design <- model.matrix(~0+Group)
> colnames(design) <- levels(Group)
> rownames(design) <- colnames(y)
>y <- estimateDisp(y,design, robust=TRUE)
> fit <- glmQLFit(y,design,robust=TRUE);
>my.contrasts <- makeContrasts(
+ ##basic contrast
+ GA.H_C=H.GA-C.GA, A.H_C=H.A-C.A,
+ G.H_C=H.G-C.G,WT.H_C=H.WT-C.WT,
+ ###complicated contrasts
+ G_WT.H_C=(H.G-C.G)-(H.WT-C.WT),
+ GA_A.H_C=(H.GA-C.GA)-(H.A-C.A),
+ levels=design)
for(i in 1:length(colnames(my.contrasts))){
+ currContrast<-colnames(my.contrasts)[i];
+ qlf <- glmQLFTest(fit, contrast=my.contrasts[,currContrast])
+ AllTags<-topTags(qlf, n=Inf);
+ res<-as.data.frame(AllTags);
}
sessionInfo( )
> show(sessionInfo( ));
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] org.Mm.eg.db_3.12.0 AnnotationDbi_1.52.0 IRanges_2.26.0
[4] S4Vectors_0.30.0 Biobase_2.50.0 BiocGenerics_0.38.0
[7] pheatmap_1.0.12 edgeR_3.32.1 limma_3.46.0
[10] reshape_0.8.9 reshape2_1.4.4 ggrepel_0.9.1
[13] ggplot2_3.3.6
loaded via a namespace (and not attached):
[1] statmod_1.4.36 tidyselect_1.1.2 locfit_1.5-9.4 purrr_0.3.4
[5] splines_4.0.2 lattice_0.20-45 colorspace_2.0-3 vctrs_0.4.1
[9] generics_0.1.2 utf8_1.2.2 blob_1.2.3 rlang_1.0.2
[13] pillar_1.7.0 glue_1.6.2 withr_2.5.0 DBI_1.1.2
[17] bit64_4.0.5 RColorBrewer_1.1-3 lifecycle_1.0.1 plyr_1.8.7
[21] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 memoise_2.0.1
[25] fastmap_1.1.0 fansi_1.0.3 Rcpp_1.0.8.3 scales_1.2.0
[29] cachem_1.0.6 bit_4.0.4 stringi_1.7.6 dplyr_1.0.9
[33] grid_4.0.2 cli_3.3.0 tools_4.0.2 magrittr_2.0.3
[37] tibble_3.1.7 RSQLite_2.2.13 crayon_1.5.1 pkgconfig_2.0.3
[41] ellipsis_0.3.2 assertthat_0.2.1 R6_2.5.1 compiler_4.0.2
Thx a lot for the input, Dr. Gordon Smyth! Yes, it is interaction contrast that we are interested. I am asking if my setting is correct or not. we did see your paper (URL: https://f1000research.com/articles/5-1438) with an example at section of "Complicated contrasts":
which is very similar to what I did here. But since one of the contrasts we got almost just 0 DEGs at FDR<0.05 and FC>1.5 in one of the complicated contrasts, although we expected there shall be some DEGs here, which made me wonder if anything is wrong with my model or contrast setting, or my code has issue, and so Just want confirmation from expertise' eyes. Or instead of using combined string as "Group" here (Group <- factor(tar5$Types.KOStat)), I shall use syntax like design <- model.matrix(~0+Types:KOStat)? Your advice would be really helpful! Thx, Mike
Yes, the settings in your question seem correct and I think I already answered your original question.
On the other hand, the additional setting you mention in your comment is not what I recommend. I strongly, strongly advise against using FC cutoffs for assessing statistical significance. I understand that it is very common practice in the literature, but it is very bad practice. The F1000 article you link to says:
"A commonly used approach is to apply FDR and logFC cutoffs simultaneously. However this tends to favor lowly expressed genes, and also fails to control the FDR correctly."
The only possible motivation for using additional criteria on top of edgeR's FDR to select DE genes would be if you have too many DE genes to interpret (and in that case we recommend treat) but applying a FC cutoff and throwing away the DE genes that you wanted doesn't make any sense to me.
Even without the FC cutoff, it is more difficult to achieve statistical significance for contrasts than for simple comparisons because interactions involve more terms and hence the standard errors are larger. There's nothing that can be done about that. Testing more complex questions is just intrinsically more difficult. Changing the design matrix cannot possibly help.
Thx a lot for the info and confirmation! also thx a lot for your input and detailed info against using FC cutoff as part of criteria for DEGs. Indeed, I used to use FDR alone, but my wet lab colaborators preferred and insisted on with FC cutoff even at 1.5 on basis of their experience that derived DEGs with FC cutoff (1.5) seems relatively easy to be validated from their experience. Also in some contrasts, we do have lot of DEGs, and having additional FC cutoffs indeed help reducing the size of DEGs as you mentioned as one possile motivation. sometimes in practice, it is hard to say which way is better indeed especillay on validation. Thx again for your great input and info, very helpful!! regards, Mike.