DESeq2 design for sgRNA fold-change
XTR5 ▴ 10

I am reanalyzing some CRISPR-Cas9 screening data to look for sgRNAs effective across cell lines.

countData: [table originally posted as an image; not reproduced]

colData: [table originally posted as an image; not reproduced]

Condition here corresponds to time (initial vs. final).

I can set up the DESeqDataSet like this, where the fold-change result reflects the effect of time:

DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ condition)

Or I can try to adjust for differences between cell lines by adding cellType to the design:

DESeq2::DESeqDataSetFromMatrix(countData = counts, colData = colData, design = ~ cellType + condition)
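
The two set-ups above can be compared side by side. The following is only a minimal sketch on simulated data: the counts come from DESeq2's makeExampleDESeqDataSet(), and the sample layout (three cell lines named lineA to lineC, each at an initial and a final time point) and the helper fit_design() are assumptions made for illustration, not the real screen.

```r
library(DESeq2)

set.seed(1)
# Hypothetical toy data: simulated counts standing in for the real sgRNA matrix
sim    <- makeExampleDESeqDataSet(n = 500, m = 6)
counts <- counts(sim)

# Assumed layout: 3 cell lines, each sampled at the initial and final time point
colData <- data.frame(
  cellType  = factor(rep(c("lineA", "lineB", "lineC"), times = 2)),
  condition = factor(rep(c("initial", "final"), each = 3),
                     levels = c("initial", "final")),
  row.names = colnames(counts)
)

# Illustrative helper: fit one design and return the final-vs-initial results
fit_design <- function(design) {
  dds <- DESeqDataSetFromMatrix(countData = counts, colData = colData,
                                design = design)
  dds <- DESeq(dds, quiet = TRUE)
  results(dds, contrast = c("condition", "final", "initial"))
}

res_simple   <- fit_design(~ condition)
res_adjusted <- fit_design(~ cellType + condition)

# How closely do the log2 fold-changes agree between the two designs?
cor(res_simple$log2FoldChange, res_adjusted$log2FoldChange,
    use = "complete.obs")

# Differences in the standard errors show what the extra term changes
summary(res_simple$lfcSE - res_adjusted$lfcSE)
```

Putting condition last in the formula and naming the contrast explicitly keeps the reported log2 fold-change as final over initial in both fits.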

The distribution of fold-changes is similar regardless of design choice, which makes sense as there should be a high degree of overlap across cell lines:

design = ~ condition: [fold-change distribution plot originally posted as an image; not reproduced]

design = ~ cellType + condition: [fold-change distribution plot originally posted as an image; not reproduced]

Is one design strategy recommended over the other? Thanks in advance.

I am thinking about this section in the DESeq2 vignette: http://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs

"Experiments with more than one factor influencing the counts can be analyzed using design formula that include the additional variables...By adding variables to the design, one can control for additional variation in the counts." I think controlling for cell line may be useful here.

Kevin Blighe ★ 4.0k

"Is one design strategy recommended over the other?"

I think that you may want to consult a local statistician about this (this website is more for technical issues relating to Bioconductor packages). The design formula should include each component that you want to compare statistically, so that it can adequately answer the hypothesis being posed. The design formula can also address confounding and allow for covariate adjustment.

Do you have evidence that you need to adjust for cellType? Usually, one would have concrete evidence, such as a PCA bi-plot or an independent laboratory or statistical test that points to cellType-specific effects.
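
As a rough sketch of the PCA check, assuming simulated counts from makeExampleDESeqDataSet() and a made-up layout of three cell lines (lineA to lineC) at two time points, one could transform the counts and look at where the samples fall on the first two components:

```r
library(DESeq2)

set.seed(1)
# Hypothetical toy data; replace with the real counts and colData
sim <- makeExampleDESeqDataSet(n = 500, m = 6)
colData <- data.frame(
  cellType  = factor(rep(c("lineA", "lineB", "lineC"), times = 2)),
  condition = factor(rep(c("initial", "final"), each = 3),
                     levels = c("initial", "final")),
  row.names = colnames(sim)
)
dds <- DESeqDataSetFromMatrix(countData = counts(sim), colData = colData,
                              design = ~ cellType + condition)

# Variance-stabilize without using the design (blind = TRUE) for an
# unbiased exploratory view of the samples
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# returnData = TRUE gives the PC coordinates instead of drawing the plot;
# drop that argument to get the ggplot2 figure directly
pca_df <- plotPCA(vsd, intgroup = c("cellType", "condition"),
                  returnData = TRUE)
pca_df[, c("PC1", "PC2", "cellType", "condition")]

# Fraction of variance explained by PC1 and PC2
attr(pca_df, "percentVar")
```

If the samples separate mainly by cellType rather than by condition along PC1 or PC2, that would be the kind of evidence meant here. With the random toy data no such structure is expected.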

One can also use packages like sva in order to identify 'extraneous' / unknown effects that may exist in your data, and to adjust for these.
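
A minimal sketch of that idea, following the pattern of svaseq() from sva combined with DESeq2. The data are simulated with makeExampleDESeqDataSet() (condition levels A and B), and both the choice of n.sv = 2 and the rowMeans filter of 5 are arbitrary assumptions for illustration, not recommendations:

```r
library(DESeq2)
library(sva)

set.seed(1)
# Hypothetical toy data with a single known factor, condition (levels A/B)
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)
dds <- estimateSizeFactors(dds)

# svaseq works on normalized counts; drop very low-count rows first
dat <- counts(dds, normalized = TRUE)
dat <- dat[rowMeans(dat) > 5, ]

# Full model contains the variable of interest, null model only the intercept
sample_info <- as.data.frame(colData(dds))
mod  <- model.matrix(~ condition, sample_info)
mod0 <- model.matrix(~ 1, sample_info)

# Assumption: ask for 2 surrogate variables
svseq <- svaseq(dat, mod, mod0, n.sv = 2)
sv <- as.matrix(svseq$sv)

# Add the surrogate variables as covariates ahead of the variable of interest
dds$SV1 <- sv[, 1]
dds$SV2 <- sv[, 2]
design(dds) <- ~ SV1 + SV2 + condition
```

After this, DESeq(dds) would fit the model with the surrogate variables included. Whether they correspond to anything meaningful, such as cell line, still has to be checked against the known sample annotations.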

Kevin


Agree with Kevin: if you see samples clustering by cellType in PCA, that would be a good reason to use ~ cellType + condition.
