Question

DESEQ2 - What is the recommended workflow in terms of pre-filtering and counts normalization?

Peter ▴ 20

Portugal

Hello,

We've received the code used to process our RNA-seq data from the microarray facility staff, and I wanted to check some details. As I learn more about the workflow, I realize there's no clear-cut way to perform this analysis. Michael Love, I've been following your comments with great interest and was hoping you could help with your expertise.

(1) Pre-filtering. I've seen arguments for each of the following:

A) no filtering, relying on DESeq2's independent filtering;

B) rowSums(counts) > 0, to reduce the statistical burden;

C) countData.keep <- countData[rowSums(countData >= 10) >= 3,], which appears more robust than (B), as it requires at least 3 samples to have at least 10 counts;

D) CPM > 1 in at least 3 samples (or a lower CPM cutoff depending on library size; from what I've seen, it should correspond to roughly 10 counts).
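The four options can be sketched in base R on a plain counts matrix (genes in rows, samples in columns). The toy matrix and the name `counts_mat` are hypothetical. The thresholds (10 counts, 3 samples) come from the options above. The CPM cutoff is derived from the median library size so that it corresponds to roughly 10 counts; for a library of 10 million reads, 10 counts is exactly 1 CPM.

```r
# Toy counts matrix (hypothetical data): 6 genes x 4 samples
counts_mat <- matrix(
  c( 0,  0,  0,  0,   # gene1: all zero
     1,  0,  2,  0,   # gene2: very low
    12, 15,  9, 11,   # gene3: >= 10 counts in 3 samples
    50, 40, 60, 55,   # gene4: well expressed
     0, 30,  0, 25,   # gene5: high in only 2 samples
    10, 10, 10,  0),  # gene6: exactly 10 counts in 3 samples
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("s", 1:4))
)

# A) no pre-filtering: keep everything, rely on independent filtering in results()
keep_A <- rep(TRUE, nrow(counts_mat))

# B) drop genes with zero counts across all samples
keep_B <- rowSums(counts_mat) > 0

# C) at least 10 counts in at least 3 samples
keep_C <- rowSums(counts_mat >= 10) >= 3          # gene3, gene4, gene6

# D) CPM above a cutoff in at least 3 samples
lib_size <- colSums(counts_mat)
cpm <- sweep(counts_mat, 2, lib_size, "/") * 1e6  # counts per million, per sample
cpm_cutoff <- 10 / median(lib_size) * 1e6         # CPM equivalent of ~10 counts
keep_D <- rowSums(cpm >= cpm_cutoff) >= 3         # gene3, gene4

countData.keep <- counts_mat[keep_C, ]
countData.keep
```

C and D disagree on gene6: its 10 counts clear the raw-count threshold in three samples, but in sample s2, which has the largest library, they fall below the CPM cutoff.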

(2) The staff then ran the following. After going through the vignette, I'm not sure whether this is standard or recommended:

dds <- DESeqDataSetFromMatrix(countData=countData.keep,colData=tcolData, design= ~treatment)
dds <- DESeq(dds)
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
result_table = results(dds)

Could you please comment on this processing? With my very limited experience, this does not look like the standard DESeq2 workflow. What would you recommend for generating the results table?

(3) The staff also suggested that we could run DESeq entirely on the unfiltered raw counts and filter by counts afterwards. This seems counterintuitive to me: shouldn't pre-filtering take place before running DESeq, not afterwards? This was the suggestion:

dds <- DESeqDataSetFromMatrix(countData=countData,colData=tcolData, design= ~treatment)
dds <- DESeq(dds)
dds <- dds[rowSums(counts(dds))>30]
dds <- estimateDispersions(dds)
dds <- nbinomWaldTest(dds)
results_table = results(dds)

I greatly appreciate any help in this context!

Best regards,

Peter

Tags: DESeq2, Transcriptomics

Written 12 months ago by Peter ▴ 20 • updated 12 months ago by Wolfgang Huber ★ 13k

score 1 · Answer 1 · 2023-04-08

ATpoint ★ 4.0k

Germany

The vignette covers pre-filtering and the recommended standard workflow:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#pre-filtering

Comment by ATpoint ★ 4.0k, 12 months ago

Hi ATpoint, thank you for your feedback. Yes, I've gone through the vignette, which is why I'm confused by the recommendation we received. I was hoping to get confirmation from the experienced community here to make sure we're doing everything right.

So, what would you recommend here? I would also appreciate your feedback on the "heavy" processing, where dds is passed through estimateDispersions() and nbinomWaldTest() before calling results(dds). How would you approach this? Would you recommend pre-filtering, followed by DESeqDataSetFromMatrix() -> DESeq() -> results()? I have read the vignette, the user guide, and the help pages for estimateDispersions() and nbinomWaldTest(), and I'm having trouble understanding whether it is advisable to use them here, especially before results().

Reply by Peter ▴ 20, 12 months ago

Pre-filtering comes before running DESeq(), and DESeq() is a wrapper around the steps you mention; see https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#wald-test-individual-steps
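Three runs on simulated data illustrate that DESeq() is such a wrapper, and that the script in (2) therefore repeats work already done. This sketch assumes DESeq2's `makeExampleDESeqDataSet()`, which simulates counts with a two-level factor `condition` and design `~ condition`. The seed and sizes are arbitrary.

```r
library(DESeq2)

set.seed(1)
# Simulated counts: 1000 genes, 12 samples, factor "condition" (A/B), design ~ condition
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# (a) Standard workflow: one call to the wrapper
dds_wrapper <- DESeq(dds, quiet = TRUE)
res_wrapper <- results(dds_wrapper)

# (b) The individual steps that DESeq() runs
dds_steps <- estimateSizeFactors(dds)
dds_steps <- estimateDispersions(dds_steps, quiet = TRUE)
dds_steps <- nbinomWaldTest(dds_steps, quiet = TRUE)
res_steps <- results(dds_steps)

# (c) The script from question (2): DESeq(), then the last two steps again
dds_twice <- estimateDispersions(dds_wrapper, quiet = TRUE)
dds_twice <- nbinomWaldTest(dds_twice, quiet = TRUE)
res_twice <- results(dds_twice)

all.equal(res_wrapper$pvalue, res_steps$pvalue)
all.equal(res_wrapper$pvalue, res_twice$pvalue)
```

With fewer than 7 replicates per group, as in this simulation, DESeq() does not replace outlier counts. Here the explicit steps reproduce the wrapper's results, and calling estimateDispersions() and nbinomWaldTest() again after DESeq() is expected only to recompute the same quantities.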

Reply by ATpoint ★ 4.0k, 12 months ago

Thank you so much for your help!

So since DESeq() is a wrapper that already runs estimateDispersions() and nbinomWaldTest() by default, it seems redundant to follow their suggestion of repeating these two functions after DESeq(). In my case, can I just pre-filter, run DESeq(), and go straight to results(), deleting the estimateDispersions() and nbinomWaldTest() steps?

Going through Michael Love's workflow, the order in the code appears to be: create the DESeqDataSet object -> filter counts -> estimateSizeFactors() -> DESeq(). Does it make any difference whether counts are filtered before or after creating the DESeq object? Also, why is estimateSizeFactors() run before DESeq(), given that DESeq() is a wrapper that already includes estimateSizeFactors()? I realize estimateSizeFactors() was run there to make the "transformplot" before running DESeq() later on, but is there any issue with repeating this step?

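dds <- DESeqDataSet(gse, design = ~ cell + dex)  # gse: SummarizedExperiment of gene counts built earlier in the workflow
keep <- rowSums(counts(dds) >= 10) >= 3          # at least 10 counts in at least 3 samples
dds <- dds[keep, ]                               # apply the filter
dds <- estimateSizeFactors(dds)                  # needed here only for the plots made before DESeq()
dds <- DESeq(dds)                                # reuses the size factors estimated above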
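On the order questions: filtering the count matrix before DESeqDataSetFromMatrix() and subsetting the DESeqDataSet afterwards yield the same counts. DESeq() also does not re-estimate size factors that are already present; it reports "using pre-existing size factors". The sketch below checks both points on simulated data from `makeExampleDESeqDataSet()`. The names `cts` and `coldata` and the filter thresholds are illustrative choices.

```r
library(DESeq2)

set.seed(2)
sim <- makeExampleDESeqDataSet(n = 1000, m = 6)
cts <- counts(sim)                                     # raw integer counts
coldata <- data.frame(condition = sim$condition, row.names = colnames(cts))
keep <- rowSums(cts >= 10) >= 3                        # option C from the question

# (a) Filter the matrix, then create the object and run DESeq()
dds_a <- DESeqDataSetFromMatrix(cts[keep, ], coldata, design = ~ condition)
dds_a <- DESeq(dds_a, quiet = TRUE)

# (b) Create the object, filter it, estimate size factors (e.g. for plots), then DESeq()
dds_b <- DESeqDataSetFromMatrix(cts, coldata, design = ~ condition)
dds_b <- dds_b[keep, ]
dds_b <- estimateSizeFactors(dds_b)
dds_b <- DESeq(dds_b, quiet = TRUE)                    # uses the size factors computed above

all.equal(sizeFactors(dds_a), sizeFactors(dds_b))
all.equal(results(dds_a)$pvalue, results(dds_b)$pvalue)
```

The ordering that does change something is calling estimateSizeFactors() before subsetting. In that case the stored size factors come from all genes, and DESeq() will keep using them.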

I apologize for all of these questions as I'm still inexperienced in R, thank you for your help!

Reply by Peter ▴ 20, 12 months ago

Not much to add to ATpoint's responses. The DESeq function is intended to do everything automagically, and there is no need to call its internal component functions manually—except for creating additional visualizations, trouble-shooting, additional QA/QC, or deviations from the standard workflow. It seems that you are right to reconsider and carefully think through the recommendations you report getting from someone.

A better name for 'pre-filtering' is 'independent filtering', and that name also suggests that the order of the steps (whether you do it before or after) is largely inconsequential. The order does affect computation time, and perhaps some of the empirical-Bayes information sharing of dispersion and effect estimation, but for non-adversarial data, negligibly.
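The point about order can be checked by comparing two pipelines. The first is the recommended order: filter, then DESeq(). The second is the order suggested in question (3): DESeq() on all genes, then subset, then re-run estimateDispersions() and nbinomWaldTest(). This sketch uses simulated data from `makeExampleDESeqDataSet()`. As an assumption, it applies the same filter (option C) in both runs instead of the `> 30` row-sum filter from the question, so that only the order differs.

```r
library(DESeq2)

set.seed(3)
sim <- makeExampleDESeqDataSet(n = 2000, m = 8, betaSD = 1)
keep <- rowSums(counts(sim) >= 10) >= 3      # same filter for both orders

# (a) Recommended: filter first, then DESeq()
res_a <- results(DESeq(sim[keep, ], quiet = TRUE))

# (b) Question (3): DESeq() on all genes, subset, then re-run the last two steps
dds_b <- DESeq(sim, quiet = TRUE)
dds_b <- dds_b[keep, ]                       # size factors from the unfiltered data are kept
dds_b <- estimateDispersions(dds_b, quiet = TRUE)
dds_b <- nbinomWaldTest(dds_b, quiet = TRUE)
res_b <- results(dds_b)

# Compare the two orders gene by gene
cor(res_a$log2FoldChange, res_b$log2FoldChange)
cor(res_a$pvalue, res_b$pvalue, method = "spearman", use = "complete.obs")
```

In (b), the size factors still come from the unfiltered data, and the dispersion trend is fitted on a different set of genes. These two factors account for the small differences between the runs.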

A better alternative to independent filtering, especially in situations where power is limiting, is independent hypothesis weighting, also described in the DESeq2 vignette.
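Independent hypothesis weighting is available through the Bioconductor package IHW. As described in the DESeq2 vignette, it plugs into results() via the `filterFun` argument. This minimal sketch assumes IHW is installed and again uses simulated data from `makeExampleDESeqDataSet()`. The covariate passed to `ihw` is the mean of normalized counts (baseMean), which is the same quantity independent filtering uses. The small simulated data set only illustrates the call.

```r
library(DESeq2)
library(IHW)   # separate Bioconductor package

set.seed(4)
sim <- makeExampleDESeqDataSet(n = 5000, m = 8, betaSD = 1)
dds <- DESeq(sim, quiet = TRUE)

# Default: independent filtering on the mean of normalized counts
res_if <- results(dds, alpha = 0.1)

# Alternative: independent hypothesis weighting with the same covariate
res_ihw <- results(dds, alpha = 0.1, filterFun = ihw)

summary(res_if)
summary(res_ihw)
metadata(res_ihw)$ihwResult   # the underlying IHW fit
```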

Reply by Wolfgang Huber ★ 13k, 12 months ago

That is exactly what the manual recommends https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#quick-start
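Taken together, the thread reduces the script from (2) to four steps: pre-filter, DESeqDataSetFromMatrix(), DESeq(), and results(). The sketch below keeps the question's object names (`countData`, `tcolData`, `treatment`) but fills them with simulated data from `makeExampleDESeqDataSet()`. The level names `control` and `treated` are hypothetical.

```r
library(DESeq2)

# Hypothetical stand-ins for the question's objects, filled with simulated data
set.seed(5)
sim <- makeExampleDESeqDataSet(n = 1000, m = 6, betaSD = 1)
countData <- counts(sim)
tcolData <- data.frame(
  treatment = factor(ifelse(sim$condition == "A", "control", "treated"),
                     levels = c("control", "treated")),
  row.names = colnames(countData)
)

# 1. Pre-filter before building the object: >= 10 counts in >= 3 samples
countData.keep <- countData[rowSums(countData >= 10) >= 3, ]

# 2. Build the DESeqDataSet and run the full pipeline once
dds <- DESeqDataSetFromMatrix(countData = countData.keep, colData = tcolData,
                              design = ~ treatment)
dds <- DESeq(dds)

# 3. Results table; no extra estimateDispersions()/nbinomWaldTest() calls
result_table <- results(dds, contrast = c("treatment", "treated", "control"))
summary(result_table)
```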

Reply by ATpoint ★ 4.0k, 12 months ago

Thank you for the quick response. Sorry for the ninja edit above; I had additional questions. I'm quite inexperienced in this topic, so I greatly appreciate your patience in helping me.

Reply by Peter ▴ 20, 12 months ago

I'll stop here, as the support site is not meant for hands-on guidance. Really, follow the manual: just apply the steps and you'll be fine.

Reply by ATpoint ★ 4.0k, 12 months ago