Hi Mike,
I am trying to use DESeq2 to do some analysis on differential translation efficiency analysis using Ribosomal profiling and RNAseq datasets. I have two group of samples (control and treatment) with 3 replicates each group. I have both RNAseq and Ribosome profiling datasets. My analysis goal is to identify a gene list which are genes with significantly changed translation efficiency. Translation efficiency=(RPKM of Riboseq)/(RPKM of RNASeq)
I have compiled the read counts file count_table which is something like this:
GL1.Riboseq GL2.Riboseq GL3.Riboseq PM1.Riboseq PM2.Riboseq PM3.Riboseq GL1.RNAseq GL2.RNAseq GL3.RNAseq PM1.RNAseq PM2.RNAseq PM3.RNAseq LOC100506869 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0 0 0 0 LOC102724604 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0 0 0 0 MTVR2 3.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1 1 3 0 1 1 LOC100506860 0.00000000 2.00000000 0.00000000 2.00000000 0.00000000 0.00000000 5 7 8 7 9 10 LOC102724601 0.00000000 0.00000000 0.00000000 1.00000000 0.00000000 0.00000000 0 1 1 5 1 2 ATRX 468.00000000 451.00000000 352.00000000 463.00000000 157.00000000 65.00000000 555 384 495 350 815 389 LOC441204 0.00000000 1.00000000 2.00000000 1.00000000 3.00000000 1.00000000 0 0 0 0 2 0 TCOF1 1369.00000000 1377.00000000 969.00000000 1275.00000000 611.00000000 330.00000000 5680 5988 5520 7483 7086 7913 NSRP1 307.00000000 259.00000000 251.00000000 328.00000000 124.00000000 66.00000000 651 583 574 493 837 511 SPPL3 124.00000000 103.00000000 97.00000000 117.00000000 44.00000000 27.00000000 1099 1252 1203 1349 1256 1260
I also compiled the sample table sample_table which is something like this:
condition sampleType control1.Riboseq control riboseq control2.Riboseq control riboseq control3.Riboseq control riboseq treatment1.Riboseq treatment riboseq treatment2.Riboseq treatment riboseq treatment3.Riboseq treatment riboseq control1.RNAseq control rnaseq control2.RNAseq control rnaseq control3.RNAseq control rnaseq treatment1.RNAseq treatment rnaseq treatment2.RNAseq treatment rnaseq treatment3.RNAseq treatment rnaseq
My code:
dds <- DESeqDataSetFromMatrix(countData = count_table, colData = sample_table,design = ~sampleType+condition+sampleType:condition) dds <- DESeq(dds) res <- results(dds) resOrdered <- res[order(res$padj),] write.delim(as.data.frame(resOrdered),file="diffTE_DESeq2.txt",row.names = T)
The result file diffTE_DESeq2.txt shows:
"baseMean" "log2FoldChange" "lfcSE" "stat" "pvalue" "padj" "EIF4B" 6332.26385509347 -0.668271435938997 0.0719422396978361 -9.28899960226144 1.55745244593353e-20 1.59950366197374e-16 "RPL7" 18760.8711555484 -0.664037246334803 0.0991791168656775 -6.69533332540294 2.15180579951482e-11 1.10495227805086e-07 "RPS3A" 18548.9009854286 -0.691914863488827 0.105378268654482 -6.56601092733359 5.16810250882598e-11 1.76921375885476e-07 "TPT1" 12666.0153749445 -0.66412280362569 0.104253670496368 -6.3702582409204 1.88710186387301e-10 4.84513403549395e-07 "EEF1B2" 6012.12004960793 -0.533457411938409 0.0859752788772395 -6.2047767556545 5.47746012346194e-10 1.12507030935908e-06
1). Is the analysis correctly done?
2). The log2Foldchange seems to be opposite, which means it shows the log2Foldchange of Control vs Treatment. I want to get the log2Foldchange of Treatment vs Control. How should I revise the code?
3). Also, What does the baseMean mean under this analysis?
Thanks,
Thanks, Mike. I read the "Note on factor levels". I am still a little bit confused. I did not specify the factor levels in my code. Deseq should compare based on the alphabetical order of the levels. That means DEseq should have compared the control and treatment and produced the log2FC of (treatment vs control). But from the results, I can tell the log2FC is control vs treatment, but not treatment vs control. Am I understanding something wrong? My apologize if this is a minor issue.
Yes, it is "treatment" over "control" from alphabetic order, but it is also using "rnaseq" over "ribosomal", whereas you said you want the opposite (ribo / RNA). You need both ratios to be correct or else the sign will be flipped from what you are expecting.
Thanks a lot, Mike. I got it. I changed the code and now the log2FC shows correct comparison. I used padj 0.05 as the cut off for significant genes. I noticed that the gene list number decreased from 600 to 500 using padj 0.05 cutoff compared with my previous code. Is this normal? After I corrected the comparison order, the gene number also changed. Is this reasonable?
Yes, it is expected that there would be less genes at a more stringent FDR threshold.
The number would not change after changing the order of the levels of the factors. You should make certain that there was nothing else in your code that you changed in between analyses.
The code is changed from:
To
dds <- DESeqDataSetFromMatrix(countData = count_table,colData = sample_table,design = ~ sampleType+condition+sampleType:condition)
dds$condition <- factor(dds$condition, levels=c("GL","PM"))
dds$sampleType <- factor(dds$sampleType, levels=c("rnaseq","riboseq"))
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]
write.table(as.data.frame(resOrdered),file="diffTE_DESeq2.txt",row.names = T)
I did not change anything else. But the gene number decreased from 600 to 500 using the same padj 0.05 criterion. Do you know why?
Are you certain that it is different? Can you show the full code and output:
From:
library(DESeq2)
count_table <- read.table("combineCount.txt",sep="\t",header=T,row.names=1)
samplesCondition <- c("GL","GL","GL","PM","PM","PM","GL","GL","GL","PM","PM","PM")
sampleType <- c(rep("riboseq",6),rep("rnaseq",6))
sample_table <- data.frame(sampleName = colnames(count_table),condition=samplesCondition,sampleType=sampleType,row.names=1)
dds <- DESeqDataSetFromMatrix(countData = count_table,colData = sample_table,design = ~ sampleType+condition+sampleType:condition)
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]
write.delim(as.data.frame(resOrdered),file="diffTE_DESeq2.txt",row.names = T)
To
library(DESeq2)
count_table <- read.table("combineCount.txt",sep="\t",header=T,row.names=1)
samplesCondition <- c("GL","GL","GL","PM","PM","PM","GL","GL","GL","PM","PM","PM")
sampleType <- c(rep("riboseq",6),rep("rnaseq",6))
sample_table <- data.frame(sampleName = colnames(count_table),condition=samplesCondition,sampleType=sampleType,row.names=1)
dds <- DESeqDataSetFromMatrix(countData = count_table,colData = sample_table,design = ~ sampleType+condition+sampleType:condition)
dds$condition <- factor(dds$condition, levels=c("GL","PM"))
dds$sampleType <- factor(dds$sampleType, levels=c("rnaseq","riboseq"))
dds <- DESeq(dds)
res <- results(dds)
resOrdered <- res[order(res$padj),]
write.table(as.data.frame(resOrdered),file="diffTE_DESeq2.txt",row.names = T)
table(res1$padj<0.05)
FALSE TRUE
9688 582
table(res2$padj<0.05)
FALSE TRUE
9777 494
Sorry to keep going back and forth, but the point of my question was to see if I can spot a difference (which maybe snuck in accidentally) between your building of res1 and res2. But you copied some code which doesn't have res1 or res2 defined.
If you could, just run this code exactly as I have it:
I ran it and it shows:
> table(res1$padj<0.05)
FALSE TRUE
9688 582
> table(res2$padj<0.05)
FALSE TRUE
9777 494
Do you know what is going on?