We recently commissioned some RNASeq work and I am getting my first taste of using bioconductor. This is probably a bit of a silly question for all you experts, but I just cannot seem to figure it out: Our experiment involved two cell lines that were each treated with four compounds. Each treatment was done in triplicate and there are untreated controls in triplicate as well. We are interested in comparing differential expression between the treatments vs. control in each cell line as well as comparing the same treatment between the two cell lines.
My coldata table looks something like this:
cell compound rep c1c1_1 c1 c1 1 c1c1_2 c1 c1 2 c1c1_3 c1 c1 3 c1c2_1 c1 c2 1 ... c2c3_3 c2 c3 3 c2c4_1 c2 c4 1 c2c4_2 c2 c4 2 c2c4_3 c2 c4 3 c2cc_1 c2 cc 1 c2cc_2 c2 cc 2 c2cc_3 c2 cc 3
Now to my questions:
- Does it make sense to work with both cell lines in the same DESeqDataSet object?
- If yes, would '~cell + cell:compound' be a reasonable design?
- When comparing results between cell lines, I assume I need to compare fold-changes. Are there methods in bioconductor that can compare two DESeqResults objects?
Any feedback, pointers, or advice greatly appreciated!

Can you provide the code you are using, for example the
DESeq()call, and the output ofresultsNames(dds).Here is what I did:
> counts <- read.csv("counts.csv", row.names="ID") > counts <- na.omit(counts[,1:24]) > > coldata <- read.table("sample_data.txt", header=T,row.names=1,sep = ",") > coldata[,2] <- factor(coldata[,2], levels = c("cc", "c1", "c2", "c3")) > coldata[,3] <- factor(coldata[,3]) > > expmat <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~cell + cell:compound) > > counts <- read.csv("counts.csv", row.names="ID") > counts <- na.omit(counts[,1:24]) > > coldata <- read.table("sample_data.txt", header=T,row.names=1,sep = ",") > coldata[,2] <- factor(coldata[,2], levels = c("cc", "c1", "c2", "c3")) > coldata[,3] <- factor(coldata[,3]) > > expmat <- DESeqDataSetFromMatrix(countData=counts, colData=coldata, design=~cell + cell:compound) > > dds <- DESeq(expmat) > > resultsNames(dds) [1] "Intercept" "cellc1" "cellc2" "cellc1.compoundcc" "cellc2.compoundcc" [6] "cellc1.compoundc1" "cellc2.compoundc1" "cellc1.compoundc2" "cellc2.compoundc2" "cellc1.compoundc3" [11] "cellc2.compoundc3" > > res <- results(dds, contrast=list("cellc2.compoundc2", "cellc1.compoundc2")) > > resFilt <- res[!is.na(res$padj) & res$padj < 0.05, ] > o <- order(resFilt$log2FoldChange) > top <- c(row.names(head(resFilt[o,])), row.names(tail(resFilt[o,]))) > > resFilt[o,] log2 fold change (MAP): cellc2.compoundc2 vs cellc1.compoundc2 Wald test p-value: cellc2.compoundc2 vs cellc1.compoundc2 DataFrame with 507 rows and 6 columns baseMean log2FoldChange lfcSE stat pvalue padj GALNT15 22.93248 -1.2025121 0.14530738 -8.275644 1.277629e-16 3.016738e-13 TXNIP 900.18497 -1.0814032 0.06160768 -17.553060 5.635982e-69 6.653840e-65 KYNU 574.17227 -1.0373937 0.15224958 -6.813771 9.507325e-12 7.015217e-09 GJA5 35.18981 -1.0258112 0.12891364 -7.957352 1.757600e-15 2.593778e-12 FILIP1 62.81634 -0.8727636 0.17465356 -4.997113 5.819498e-07 1.073516e-04 ... ... ... ... ... ... ... ZNF831 22.04470 0.7939976 0.16197967 4.901835 9.494557e-07 1.648423e-04 CXCL8 1520.71535 0.7943013 0.09764349 8.134709 4.129301e-16 8.125088e-13 PPAP2B 730.27992 0.8298616 0.07264891 11.422905 3.213093e-30 1.896689e-26 MITF 146.34764 0.8583198 0.11280002 7.609216 2.757634e-14 2.959694e-11 DMBT1 44.09129 0.8944678 0.14110499 6.339023 2.312263e-10 1.137441e-07 > > counts(dds)[top,] c1c1_1 c1c1_2 c1c1_3 c1c2_1 c1c2_2 c1c2_3 c1c3_1 c1c3_2 c1c3_3 c1cc_1 c1cc_2 c1cc_3 c2c1_1 c2c1_2 c2c1_3 c2c2_1 GALNT15 0 0 0 0 1 0 0 0 0 0 0 0 127 110 136 4 TXNIP 266 246 305 352 441 857 316 823 910 169 212 320 1276 1269 1455 891 KYNU 962 833 1102 497 682 1206 342 707 874 717 806 1170 37 27 52 3 GJA5 0 0 0 0 0 1 0 0 0 0 0 0 251 212 263 21 FILIP1 0 1 5 2 4 7 1 0 3 3 1 3 245 309 280 73 DKK2 0 0 2 0 0 0 0 0 0 1 0 1 144 117 140 21 ABCG1 42 33 62 18 26 42 8 28 50 28 42 59 234 207 219 317 ZNF831 45 48 54 10 10 24 5 7 18 50 47 68 1 0 0 2 CXCL8 291 295 342 42 71 95 20 51 50 204 262 372 6338 4755 5603 3917 PPAP2B 444 344 500 143 176 299 50 127 185 349 362 582 1228 1005 1141 2063 MITF 251 220 318 67 110 200 43 88 108 178 216 333 59 54 67 107 DMBT1 80 89 125 19 22 51 13 36 42 82 85 138 1 0 1 0 c2c2_2 c2c2_3 c2c3_1 c2c3_2 c2c3_3 c2cc_1 c2cc_2 c2cc_3 GALNT15 13 17 7 7 5 93 124 164 TXNIP 900 881 3639 3231 3419 988 1327 1503 KYNU 2 0 0 2 3 37 44 61 GJA5 27 25 5 2 1 109 154 179 FILIP1 95 87 47 40 42 208 347 386 DKK2 24 20 16 15 9 85 124 139 ABCG1 282 313 183 168 184 179 214 260 ZNF831 0 1 0 0 0 0 1 1 CXCL8 3160 3404 2221 1759 2031 4384 5955 6028 PPAP2B 1851 1852 2498 2059 2228 787 1037 1010 MITF 98 104 110 91 112 49 43 60 DMBT1 1 1 0 0 1 0 0 0 >I edited the second
results()call above, to directly compare c1 over control. I had previously given the command which would be appropriate if one of these arguments had been used: test="LRT", betaPrior=FALSE or modelMatrixType="standard", in which case the base level of compound (control) would have been absorbed into the intercept. But for the standard DESeq() run, we need to explicitly contrast compound 1 against control, so this is reflected in the new code.What you are getting back with this command is the log fold change between the compound 1 vs control effect in the two cell lines. It's log2( (compound 1 vs control in cell 2) / (compound 1 vs control in cell 1) ). The rearrangement of terms above works because the multiplications and divisions become addition and subtraction on the log scale.
"With the second command, I seem to get genes that are behaving differently in each cell line but in different ways. Some have higher counts for the compound of interest in one cell line compared with the other, others have about the same counts but they markedly differ from the other treatments and the control."
Exactly, it's the difference between compound 1 and control which is being examined. This means that the normalized counts for compound 1 in cell 1 and cell 2 might be similar, but if the control's normalized counts are different, then the ratio of ratios will not be close to 1 (and the log fold change will not be close to 0).
Great! Thank you very much. This makes much more sense now. I did not know you could give two vectors to contrast.