We have RNAseq data with 4 cell lines with different knockout genotypes (WT, A, G, GA), and for each cell lines, we have two treatment types: treated (H) and untreated (C) with 5 replicates for each condition. we are interested in the treatment effect on each of these cell lines: WT.H_C, A.H_C, G.H_C, or GA.H_C (H_C means treatment vs untreat comparison). These are easily to be done in DESeq2 based on user guide (see the code below, such as:
myConL<-list();
k<-1;
myConL[[k]]<-c("Types.KOStat","H.GA", "C.GA");
names(myConL)[k]<-"GA.H_C";
...
But we are also interested to see if there is difference between the treatment effects between different cell lines. for example, is there any difference between the treatment effect on cell line A (A.H_C) and the treatment effect on cell line GA (GA.H_C)? also, is there any difference between the treatment effect on cell line G (G.H_C) and the treatment effect on cell line WT(WT.H_C)?It is indeed the interaction between the lines and treatment, which I annotated as contrasts: GA_A.H_C and G_WT.H_C respectively. But from DESeq2 user guide, did not see example code for this level of complicated contrast, not sure how this can be done in syntax although in theory it shall be able to be done. I did try the following way as shown in the code section as well:
k<-k+1;
myConL[[k]]<-c(0,0,-1,0,-1,0,1,0);
names(myConL)[k]<-"GA_A.H_C";
k<-k+1;
myConL[[k]]<-c(0,-1,0,1,0,1,0,-1);
names(myConL)[k]<-"G_WT.H_C";
this is based on the output as below and also other related posts:
show(resultsNames(dds)) # lists the coefficients
[1] "Intercept" "Types.KOStat_C.G_vs_C.A"
[3] "Types.KOStat_C.GA_vs_C.A" "Types.KOStat_C.WT_vs_C.A"
[5] "Types.KOStat_H.A_vs_C.A" "Types.KOStat_H.G_vs_C.A"
[7] "Types.KOStat_H.GA_vs_C.A" "Types.KOStat_H.WT_vs_C.A"
want to confirm with you if this is correct way to be done in DESeq2, or you have more simple way to do it? any advice/suggestion would be greatly appreciated!
Thx in advance and regards!
Mike
Code should be placed in three backticks as shown below
##the critical part of the codes are below
> show(tar5)
Types KO TypesLines Types.KOStat
C.A.1 C A C.A.1 C.A
C.A.2 C A C.A.2 C.A
C.A.3 C A C.A.3 C.A
C.A.4 C A C.A.4 C.A
C.A.5 C A C.A.5 C.A
C.G.1 C G C.G.1 C.G
C.G.2 C G C.G.2 C.G
C.G.3 C G C.G.3 C.G
C.G.4 C G C.G.4 C.G
C.G.5 C G C.G.5 C.G
C.GA.1 C GA C.GA.1 C.GA
C.GA.2 C GA C.GA.2 C.GA
C.GA.3 C GA C.GA.3 C.GA
C.GA.4 C GA C.GA.4 C.GA
C.GA.5 C GA C.GA.5 C.GA
C.WT.1 C WT C.WT.1 C.WT
C.WT.2 C WT C.WT.2 C.WT
C.WT.3 C WT C.WT.3 C.WT
C.WT.4 C WT C.WT.4 C.WT
C.WT.5 C WT C.WT.5 C.WT
H.A.1 H A H.A.1 H.A
H.A.2 H A H.A.2 H.A
H.A.3 H A H.A.3 H.A
H.A.4 H A H.A.4 H.A
H.A.5 H A H.A.5 H.A
H.G.1 H G H.G.1 H.G
H.G.2 H G H.G.2 H.G
H.G.3 H G H.G.3 H.G
H.G.4 H G H.G.4 H.G
H.G.5 H G H.G.5 H.G
H.GA.1 H GA H.GA.1 H.GA
H.GA.2 H GA H.GA.2 H.GA
H.GA.3 H GA H.GA.3 H.GA
H.GA.4 H GA H.GA.4 H.GA
H.GA.5 H GA H.GA.5 H.GA
H.WT.1 H WT H.WT.1 H.WT
H.WT.2 H WT H.WT.2 H.WT
H.WT.3 H WT H.WT.3 H.WT
H.WT.4 H WT H.WT.4 H.WT
H.WT.5 H WT H.WT.5 H.WT
> show(mydata5[1:5,1:8]);
C.A.1 C.A.2 C.A.3 C.A.4 C.A.5 C.G.1 C.G.2 C.G.3
ENSMUSG00000000001.4_Gnai3 6227 5210 5183 5187 13509 4944 4885 5650
ENSMUSG00000000003.15_Pbsn 0 0 0 0 0 0 0 0
ENSMUSG00000000028.15_Cdc45 655 541 571 538 1402 287 305 236
ENSMUSG00000000031.16_H19 607 1074 792 1459 3263 3757 9408 9656
ENSMUSG00000000037.16_Scml2 74 62 45 72 164 77 72 50
> library("DESeq2")
> dds <- DESeqDataSetFromMatrix(countData = mydata5,
+ colData = tar5,
+ design = ~Types.KOStat)
> dds$Types.KOStat <- relevel(dds$Types.KOStat, ref = "C.A")
> dds<- DESeq(dds)
> show(resultsNames(dds)) # lists the coefficients
[1] "Intercept" "Types.KOStat_C.G_vs_C.A"
[3] "Types.KOStat_C.GA_vs_C.A" "Types.KOStat_C.WT_vs_C.A"
[5] "Types.KOStat_H.A_vs_C.A" "Types.KOStat_H.G_vs_C.A"
[7] "Types.KOStat_H.GA_vs_C.A" "Types.KOStat_H.WT_vs_C.A"
> myConL<-list();
> k<-1;
> myConL[[k]]<-c("Types.KOStat","H.GA", "C.GA");
> names(myConL)[k]<-"GA.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.A", "C.A");
> names(myConL)[k]<-"A.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.G", "C.G");
> names(myConL)[k]<-"G.H_C";
> k<-k+1;
> myConL[[k]]<-c("Types.KOStat","H.WT", "C.WT");
> names(myConL)[k]<-"WT.H_C";
> k<-k+1;
> myConL[[k]]<-c(0,0,-1,0,-1,0,1,0);
> names(myConL)[k]<-"GA_A.H_C";
> k<-k+1;
> myConL[[k]]<-c(0,-1,0,1,0,1,0,-1);
> names(myConL)[k]<-"G_WT.H_C";
> for(i in 1:length(myConL)){
+resLFC <- lfcShrink(dds, contrast = myConL[[i]], type="ashr")
+ resLFC<-as.data.frame(resLFC[order(resLFC$pvalue),]);
+ ###LFC shrink results#######################################################33
+ NAG<-resLFC[is.na(resLFC$padj),]
+ resLFC<-resLFC[!(rownames(resLFC) %in% rownames(NAG)),];
+ resRpt<-resLFC;
+ resRpt<-data.frame(SYMBOL=sapply(strsplit(rownames(resRpt), split="\\_"), function(x) x[2]),
+ ENSID=sapply(strsplit(rownames(resRpt), split="\\_"), function(x) x[1]), IDs=rownames(resRpt),resRpt,stringsAsFactors = FALSE);
+ resRpt2<-resRpt[resRpt$padj<=0.05 & resRpt$log2FoldChange!=0,];
+ filename1<-paste("DESeq2_",names(myConL)[i],"_adjP0_05LFC.csv",sep="")
+ filename1<-paste(outPath,filename1,sep="/")
+ write.table(resRpt2, file=filename1,sep=",", eol="\r\n",quote = FALSE, row.names=FALSE);
+}
# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session
sessionInfo( )
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /mnt/nasapps/development/R/4.0.2/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] pheatmap_1.0.12 DESeq2_1.30.1
[3] SummarizedExperiment_1.22.0 Biobase_2.50.0
[5] MatrixGenerics_1.2.1 matrixStats_0.62.0
[7] GenomicRanges_1.42.0 GenomeInfoDb_1.28.1
[9] IRanges_2.26.0 S4Vectors_0.30.0
[11] BiocGenerics_0.38.0 limma_3.46.0
[13] reshape_0.8.9 reshape2_1.4.4
[15] ggrepel_0.9.1 ggplot2_3.3.6
loaded via a namespace (and not attached):
[1] httr_1.4.2 bit64_4.0.5 splines_4.0.2
[4] assertthat_0.2.1 mixsqp_0.3-43 blob_1.2.3
[7] GenomeInfoDbData_1.2.4 pillar_1.7.0 RSQLite_2.2.13
[10] lattice_0.20-45 glue_1.6.2 RColorBrewer_1.1-3
[13] XVector_0.30.0 colorspace_2.0-3 Matrix_1.4-1
[16] plyr_1.8.7 XML_3.99-0.9 pkgconfig_2.0.3
[19] invgamma_1.1 genefilter_1.72.1 zlibbioc_1.36.0
[22] purrr_0.3.4 xtable_1.8-4 scales_1.2.0
[25] BiocParallel_1.24.1 tibble_3.1.7 annotate_1.68.0
[28] generics_0.1.2 ellipsis_0.3.2 cachem_1.0.6
[31] withr_2.5.0 ashr_2.2-54 cli_3.3.0
[34] survival_3.3-1 magrittr_2.0.3 crayon_1.5.1
[37] memoise_2.0.1 fansi_1.0.3 truncnorm_1.0-8
[40] tools_4.0.2 lifecycle_1.0.1 stringr_1.4.0
[43] munsell_0.5.0 locfit_1.5-9.4 DelayedArray_0.18.0
[46] irlba_2.3.5 AnnotationDbi_1.52.0 compiler_4.0.2
[49] rlang_1.0.2 grid_4.0.2 RCurl_1.98-1.6
[52] bitops_1.0-7 gtable_0.3.0 DBI_1.1.2
[55] R6_2.5.1 dplyr_1.0.9 fastmap_1.1.0
[58] bit_4.0.4 utf8_1.2.2 stringi_1.7.6
[61] SQUAREM_2021.1 Rcpp_1.0.8.3 vctrs_0.4.1
[64] geneplotter_1.68.0 tidyselect_1.1.2
Thx for asking. Indeed, this is not what I am asking. I was wondering the code syntax for interaction contrast in DESeq2 and just want to see expertise' way of doing it in DESeq2 and if the way I did is correct or not. Thx Mike
I unfortunately don't have time to go over user analysis scripts, I instead recommend users settle on experimental design and analysis choices by working with a local statistician or someone familiar with linear models in R.