Difference between DESeqDataSetFromMatrix() function and DESeq() function
Abir.khazaal ▴ 10
@3e9efee3
Last seen 8 months ago
Australia

Hi, I am currently performing differential expression analysis using DESeq2.

I want to filter out lowly expressed genes, although I read in another post here that this may not be necessary because independentFiltering within results() does something similar. However, I am comparing different approaches to differential expression analysis, so I need to apply the same criteria in each.

What I want to know is, what is the difference between



DESeqDataSetFromMatrix()
# and
DESeq()

I have seen some people filter before calling the DESeq() function:


dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition) 

keep <- rowSums(counts(dds) >= 10) >= 10
dds <- dds[keep,]

dds <- DESeq(dds)
normalizedCounts <- counts(dds, normalized=TRUE)

Whilst the developer called the DESeq() function first and then performed the filtering:


dds <- DESeqDataSetFromMatrix(countData = countData,
                              colData = metaData,
                              design = ~ condition) 
dds <- DESeq(dds)
dds <- estimateSizeFactors(dds)

# Apply the filtering criteria
idx <- rowSums(counts(dds, normalized=TRUE) >= 10) >= 10
dds <- dds[idx,]

dds <- DESeq(dds)

So I just want to understand which approach is the right one and why :)

Thanks

DESeq DESeq2 • 871 views
ATpoint ★ 4.0k
@atpoint-13662
Last seen 17 hours ago
Germany

Please follow the manual.

estimateSizeFactors() is part of DESeq(), so skip that. Filter on raw, not normalized, counts; see the vignette.
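The advice above can be sketched roughly as follows. This is a minimal sketch, assuming the prefiltering pattern shown in the DESeq2 vignette; the simulated data from makeExampleDESeqDataSet() and the group size of 6 are placeholders, not values from the question.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in for the real data: 1000 genes, 12 samples, design ~ condition
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Prefilter on RAW counts: keep genes with a count of at least 10
# in at least as many samples as the smallest group (6 here, an assumption)
smallestGroupSize <- 6
keep <- rowSums(counts(dds) >= 10) >= smallestGroupSize
dds <- dds[keep, ]

# DESeq() estimates size factors itself, so no separate
# estimateSizeFactors() call is needed, and DESeq() is run only once
dds <- DESeq(dds)
res <- results(dds)
```

With real data, replace the simulated object with the one built by DESeqDataSetFromMatrix() and set smallestGroupSize to match the experimental design.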


Thank you for that @atpoint.

I have seen the steps above in the vignette, but I got confused when I saw a thread where the developer performed prefiltering using estimateSizeFactors(). The thread is "deseq2 filter the low counts".

One question: I didn't quite understand your last sentence. Why should I filter on raw counts? Wouldn't that ignore differences in library size and sequencing depth? When I performed DE with edgeR, I pre-filtered on CPM values. I have added my edgeR pre-filtering code below.

Your help with this is highly appreciated

Thanks


# Load edgeR, which provides DGEList(), cpm() and estimateDisp()
library(edgeR)

# Prepare raw counts as a DGEList object
dge <- DGEList(counts = countData)

# Obtain CPM values using cpm
cpm_values <- cpm(dge)

# Filter genes that have more than 10 CPM in at least 10 samples
keep <- rowSums(cpm_values > 10) >= 10

# Subset DGEList object to keep only selected genes
dge <- dge[keep, , keep.lib.sizes=FALSE] 

# create a design matrix
design <- model.matrix(~0 +AGE, data=metaData) 

# Estimate common and tagwise dispersions
dge <- estimateDisp(dge, design)

#fit linear model .. etc.
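
To make the library-size concern concrete, here is a toy illustration in base R. The counts are made up for the example (sample s2 is sequenced ten times deeper than s1), and CPM is computed by hand rather than with edgeR's cpm(). It only shows how the two thresholds can disagree, not which one to prefer.

```r
# Made-up counts: 3 genes x 2 samples; column sums are 1e6 and 1e7
counts <- matrix(c(12,     120,
                   5,      50,
                   999983, 9999830),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "rest"), c("s1", "s2")))

# Counts per million, computed by dividing each column by its library size
cpm_manual <- sweep(counts, 2, colSums(counts), "/") * 1e6

# Number of samples in which each gene passes a threshold of 10
pass_raw <- rowSums(counts >= 10)      # threshold on raw counts
pass_cpm <- rowSums(cpm_manual >= 10)  # threshold on CPM

print(pass_raw)
print(pass_cpm)
```

Gene g2 has the same relative abundance in both samples, yet it passes the raw-count threshold only in the deeper sample and passes the CPM threshold in neither.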

My advice is to always follow the manual unless you have the expert knowledge to do something else. The linked thread is 8 years old, and developers' recommendations change over time. The edgeR manual doesn't recommend filtering on CPMs; it uses filterByExpr. It is up to you whether to follow the best practices in the manuals or do something custom. Please see the manuals of both edgeR and DESeq2; they contain code suggestions on prefiltering.
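
For the edgeR side, a minimal sketch of filtering with filterByExpr might look like this. The simulated counts and the two-group layout are assumptions made for illustration only; with real data, pass your own count matrix and grouping (or a design matrix), and check ?filterByExpr for the thresholds it applies.

```r
library(edgeR)

set.seed(1)
# Simulated stand-in for real data: 1000 genes, 6 samples in two groups
sim_counts <- matrix(rnbinom(1000 * 6, mu = 20, size = 2), nrow = 1000,
                     dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))

dge <- DGEList(counts = sim_counts, group = group)

# filterByExpr() returns a logical vector marking genes with enough
# counts to be worth keeping, using the grouping stored in the DGEList
keep <- filterByExpr(dge)

# Subset and recompute library sizes from the retained genes
dge <- dge[keep, , keep.lib.sizes = FALSE]
```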
