Question

How to setup sample design in DESeq2

0

Entering edit mode

dequattro.concetta ▴ 10

@dequattroconcetta-21510

Last seen 2.2 years ago

Italy

Hi!

I am performing a differential expression analysis comparing cell lines responders to my treatment against cell lines not responders to the treatment (responders vs not resonders).

Here the sampleTable:

Name    Cell_line   condition
cell_line1_rep1 Cell_line_1 RESPONDERS
cell_line2_rep1 Cell_line_2 NOT_responders
cell_line3_rep1 Cell_line_3 NOT_responders
cell_line4_rep1 Cell_line_4 RESPONDERS
cell_line1_rep2 Cell_line_1 RESPONDERS
cell_line2_rep2 Cell_line_2 NOT_responders
cell_line3_rep2 Cell_line_3 NOT_responders
cell_line4_rep2 Cell_line_4 RESPONDERS
cell_line1_rep3 Cell_line_1 RESPONDERS
cell_line2_rep3 Cell_line_2 NOT_responders
cell_line3_rep3 Cell_line_3 NOT_responders
cell_line4_rep3 Cell_line_4 RESPONDERS

In my design I have two cell lines responders and two cell lines not responders. For each cell-line I have three biological replicates.

I construct the dds model considering as covariate both cell_line and condition but I got the error message that the variable are linear.

ddsTxi <- DESeqDataSetFromTximport(txi,
                                   colData = samples,
                                   design = ~ Cell_line + condition)
Error in checkFullRank(modelMatrix) :
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

      Please read the vignette section 'Model matrix not full rank':

I was wondering how I can inform the DESeq function that my replicates come from different cell lines?

Thank you for your help!

Concetta

DESeq2 • 1.9k views

ADD COMMENT • link written 3.2 years ago by dequattro.concetta ▴ 10

0

Entering edit mode

There is not much that you can do here. You want to compare responders to non-responders, but 'response' (condition) is confounded by Cell_line. You can make an assumption that there are no cell-line effects in your data, and remove Cell_line from your design, but this assumption is almost certainly incorrect, and, therefore, the statistical inferences would be incorrect.

Edit: it seems that you did not show the correct sample table in your question. See below

ADD REPLY • link 3.1 years ago Kevin Blighe ★ 3.9k

score 0 · Answer 1 · 2021-02-23

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 14 hours ago

United States

I'd recommend to collaborate with a statistician to pick an appropriate design, and to help with interpretation of results. We have information in the vignette section above, but beyond that, it would be a good idea to discuss with a statistician.

ADD COMMENT • link 3.2 years ago Michael Love 41k

score 0 · Answer 2 · 2021-02-23

0

Entering edit mode

swbarnes2 ★ 1.3k

@swbarnes2-14086

Last seen 14 hours ago

San Diego

I think this part of the vignette addresses your situation

http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#group-specific-condition-effects-individuals-nested-within-groups

ADD COMMENT • link 3.2 years ago swbarnes2 ★ 1.3k

0

Entering edit mode

Many tanks for the help and sorry for my late reply. I have read the vignette and my experiment addresses the part of the vignette related to “Group-specific condition effects, individuals nested within groups”.

I rearranged the sample table including the column of the treatment (A=control and B=treated). In particular, in I assigned Cell_line.n I assigned the same number to biological replicates coming from the same cell lines.

Cell_line   condition   responders  Cell_line.n
cell_line1_rep1 A   RESPONDERS  1
cell_line1_rep1 B   RESPONDERS  1
cell_line1_rep2 A   RESPONDERS  1
cell_line1_rep2 B   RESPONDERS  1
cell_line1_rep5 A   RESPONDERS  1
cell_line1_rep6 B   RESPONDERS  1
cell_line2_rep1 A   RESPONDERS  2
cell_line2_rep1 B   RESPONDERS  2
cell_line2_rep2 A   RESPONDERS  2
cell_line2_rep2 B   RESPONDERS  2
cell_line2_rep3 A   RESPONDERS  2
cell_line2_rep3 B   RESPONDERS  2
cell_line3_rep1 A   NOT_responders  1
cell_line3_rep1 B   NOT_responders  1
cell_line3_rep2 A   NOT_responders  1
cell_line3_rep2 B   NOT_responders  1
cell_line3_rep3 A   NOT_responders  1
cell_line3_rep3 B   NOT_responders  1
cell_line4_rep1 A   NOT_responders  2
cell_line4_rep1 B   NOT_responders  2
cell_line4_rep2 A   NOT_responders  2
cell_line4_rep2 B   NOT_responders  2
cell_line4_rep3 A   NOT_responders  2
cell_line4_rep3 B   NOT_responders  2

Subsequently I ran the analysis with the following command:

ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~ grp + grp:Cell_line.n + grp:condition)
dds <- DESeq(ddsTxi)
resultsNames(dds)
[1] "Intercept"                        "grp_RESPONDERS_vs_NOT_responders" "grpNOT_responders.Cell_line.n2"   "grpRESPONDERS.Cell_line.n2"      
[5] "grpNOT_responders.conditionB"     "grpRESPONDERS.conditionB"

I extracted the results as indicated in the vignette:

resApe <- lfcShrink(dds, coef = "grpNOT_responders.conditionB", type="apeglm") 
resApe <- lfcShrink(dds, coef = "grpResponders.conditionB", type="apeglm")

In order to test if the treatment effect is different across group I should run the following command:

results(dds, contrast=list("grpY.cndB","grpX.cndB"))

However when I tried to shrink the log2FoldChange I got the following error:

res <- lfcShrink(dds, contrast = c("grpRESPONDERS.conditionB", "grpNOT_responders.conditionB"), type="apeglm")
Error in checkContrast(contrast, resNames) : 
'contrast', as a character vector of length 3, should have the form:
contrast = c('factorName','numeratorLevel','denominatorLevel'),
see the manual page of ?results for more information

I have three questions:

Concerning the sampleTable, is it correct to assign the same number to biological replicated coming from the same cell line?
How can I get the log2foldChange shrinked with the method apeglm for the comparison results(dds, contrast=list("grpY.cndB","grpX.cndB")) ?
How can I get the differences between the cell lines at basal level, i.e. responders vs not responders in control condition? Should I put as reference level B condition (the treated one) and re-run the analysis?

Thank you for your help.

ADD REPLY • link 3.1 years ago dequattro.concetta ▴ 10

0

Entering edit mode

Re: 2) you can add in a main effect for condition, which will turn the interaction term into the difference in condition effects across group.

3) is not possible here with fixed effects.

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Entering edit mode

Thank you for your help. As you suggested I included the condition in the main effect as follow:

ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~grp + condition + grp:Cell_line.n + grp:condition)
dds <- DESeq(ddsTxi)
   estimating size factors
   using 'avgTxLength' from assays(dds), correcting for library size
   estimating dispersions
   gene-wise dispersion estimates
   mean-dispersion relationship
   final dispersion estimates
   fitting model and testing

resultsNames(dds)
 [1] "Intercept"                        "grp_RESPONDERS_vs_NOT_responders" "condition_B_vs_A"                
 [4] "grpNOT_responders.Cell_line.n2"   "grpRESPONDERS.Cell_line.n2"       "grpRESPONDERS.conditionB"

I extracted the results with the fcShrink function to compare the condition effect across the two groups:

res <- lfcShrink(dds, coef = "grpRESPONDERS.conditionB", type="apeglm")

I was wondering if my procedure is correct. Thank you!

ADD REPLY • link 3.1 years ago dequattro.concetta ▴ 10

0

Entering edit mode

Yes, that coefficient is now the comparison of the condition effect across groups (RESP vs NOT).

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Entering edit mode

Hi!

Concerning the same experimental design I would like to check the differences of responders vs not-responders at the basal level.

I tested the following code in which I considered the treatment as samples group:

ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~ condition + condition:Cell_line.n + condition:grp)
using counts and average transcript lengths from tximport 

dds <- DESeq(ddsTxi)
  estimating size factors
  using 'avgTxLength' from assays(dds), correcting for library size
  estimating dispersions
  gene-wise dispersion estimates
  mean-dispersion relationship
  final dispersion estimates
  fitting model and testing

resultsNames(dds) [1] "Intercept" "condition_B_vs_A" "conditionA.Cell_line.n2" [4] "conditionB.Cell_line.n2" "conditionA.grpRESPONDERS" "conditionB.grpRESPONDERS"

Then I extracted the results of responders vs NOT-responders at basal level (conditionA=untreated) as follow:

 summary(results(dds, name = "conditionA.grpRESPONDERS"))

   out of 35199 with nonzero total read count
   adjusted p-value < 0.1
   LFC > 0 (up)       : 5974, 17%
   LFC < 0 (down)     : 5723, 16%
   outliers [1]       : 41, 0.12%
   low counts [2]     : 0, 0%
   (mean count < 0)
   [1] see 'cooksCutoff' argument of ?results
   [2] see 'independentFiltering' argument of ?results

Is this approach make sense? If not, could you suggest one possible approach to get this result?

Thank you for your help!

ADD REPLY • link 3.0 years ago dequattro.concetta ▴ 10

0

Entering edit mode

Sorry, I don't have sufficient time to provide statistical design consultation here. At some point I have restrict myself to software questions as I have a limited amount of time to field questions.

ADD REPLY • link 3.0 years ago Michael Love 41k

0

Entering edit mode

Hi! Thank you, I really appreciated the help that you provide.

Reading other posts on Bioconductor support I have another question about my design, that I reattached below:

ddsTxi <- DESeqDataSetFromTximport(txi.rsem, colData = samples, design = ~grp + condition 
+ grp:Cell_line.n + grp:condition)

dds <- DESeq(ddsTxi)

resultsNames(dds)
[1] "Intercept"      "grp_RESPONDERS_vs_NOT_responders"   "condition_B_vs_A"                
[4] "grpNOT_responders.Cell_line.n2"   "grpRESPONDERS.Cell_line.n2"       
"grpRESPONDERS.conditionB"

I was wondering if the comparison "grp_RESPONDERS_vs_NOT_responders" indicates the differences of responders vs NOT-responders for control condition (condition A).

Thank you for your help.

ADD REPLY • link 3.0 years ago dequattro.concetta ▴ 10