What should I do with many zero counts from Salmon quantification?
lkianmehr • 0
@lkianmehr-16873

I have quantified my RNA-seq samples with Salmon. 2 groups are wild-type and 4 groups are Dnmt2 knockouts. I've put them all in one dataset for DE analysis. A box plot of the normalized counts shows that the median of the knockout samples is zero, and that a maximum of 20 reads is assigned to each transcript. Before performing DE with DESeq2, I have 2 questions: 1. Should zero values be removed beforehand? 2. What minimum count should be required for DE?

Salmon DESeq2 normalization r • 717 views
@mikelove

We have some minimal filtering code in the DESeq2 vignette that you can take a look at. There's obviously no point running a DE method on genes where all the counts are 0, and you can additionally filter out genes that have very small counts in all samples, because these don't have enough precision for estimating the LFC. A common rule is, for example, to require a count of 10 or more in at least 3 samples. However, it does depend a bit on the dataset: UMI-deduplicated data, for example, has counts < 10 that nevertheless give some precision for estimating the LFCs.
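As a rough illustration of that rule, here is a minimal sketch in base R. The count matrix is simulated and the object names (`counts`, `keep`, `filtered`) are hypothetical; the thresholds of 10 and 3 are just the example values mentioned above, not a recommendation for every dataset.

```r
set.seed(1)

# Hypothetical count matrix: 1000 transcripts x 6 samples (simulated, not real data)
counts <- matrix(
  rnbinom(6000, mu = 5, size = 0.5),
  nrow = 1000,
  dimnames = list(paste0("tx", 1:1000), paste0("s", 1:6))
)

min_count   <- 10  # example threshold from the rule above
min_samples <- 3   # example threshold from the rule above

# Keep rows with a count of at least min_count in at least min_samples samples
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]

# Rows that are zero in every sample are dropped automatically by this rule
cat("kept", nrow(filtered), "of", nrow(counts), "transcripts\n")
```

With a DESeqDataSet, the same logical vector can be computed on `counts(dds)` and applied as `dds <- dds[keep, ]`, which is the pattern the vignette uses for pre-filtering.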

It is very suspicious that the maximum transcript count for a sample is 20. It is typically in the 100,000+ range for standard bulk RNA-seq of human or mouse. That there are many zeros is typical and expected, because cell types or tissues express only a subset of the transcripts in the genome.


Sorry, I think I made a mistake: I calculated log2(1+counts) and made a box plot of that. Its y-axis ranges from 0 to 20. What does that mean?


A maximum of around 20 on the log2 scale is typical. There is no problem with this data, or with anything you've described above.
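To make the scale concrete, this small sketch converts the top of a log2(1 + counts) axis back to raw counts. The variable names are hypothetical; the arithmetic only assumes the transformation described in the question.

```r
# Upper end of the y-axis reported for the log2(1 + counts) box plot
log_max <- 20

# Invert the transformation: counts = 2^y - 1
max_count <- 2^log_max - 1
print(max_count)  # 1048575

# A transcript with zero reads sits at the bottom of the axis
zero_on_log_scale <- log2(1 + 0)
print(zero_on_log_scale)  # 0

# For comparison, a raw count of 20 would sit much lower on this axis
twenty_on_log_scale <- log2(1 + 20)
```

So an axis that reaches 20 corresponds to transcripts with roughly a million reads, which is in line with the 100,000+ range mentioned earlier for standard bulk RNA-seq.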

Perhaps you can look at the vignette and workflow so you get an idea of what typical RNA-seq count datasets look like.


Excuse me, when I call plotMA on res following the DESeq2 vignette, it makes a plot based on expression log ratio and log expression, not LFC and mean of normalized counts. And when I run

plotMA(ddsTxi, alpha= 0.1, main = "", xlab = "mean of normalized counts", ylim, mle = TRUE)

I get this error:

Error in as.vector(x) : no method for coercing this S4 class to a vector

Should the object be converted to a data.frame?


What is class(ddsTxi)?

If it is a DESeqDataSet or DESeqResults object it should work.


Yes, it's a DESeqDataSet, but it doesn't work!


Can you try DESeq2::plotMA()? Maybe you are using another package that masks our plotMA method.
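Masking of this kind can be reproduced without DESeq2. The sketch below uses a hypothetical user-defined `filter` that masks `stats::filter`, standing in for a second package whose plotMA masks the DESeq2 method (limma, for example, also exports a function named plotMA). It is only an illustration of the mechanism, not a diagnosis of this particular session.

```r
# Hypothetical stand-in: a function that masks stats::filter, the way
# another attached package's plotMA could mask the DESeq2 method
filter <- function(x, ...) "masked version"

# find() lists every attached environment that defines the name,
# in the order R searches them
where <- find("filter")
print(where)

# The unqualified call resolves to the first match on the search path
unqualified <- filter(1:5)

# The namespace-qualified call always reaches the intended package
qualified <- stats::filter(1:5, rep(1 / 3, 3))
```

In the same way, `find("plotMA")` shows which attached packages define plotMA, and `DESeq2::plotMA(...)` bypasses whichever one comes first.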


Exactly, it works. Thanks a lot.


Excuse me, I would appreciate some help with making a heatmap of the count matrix, which I did according to the DESeq2 vignette:

select <- order(rowMeans(counts(ddstxi,normalized=TRUE)), decreasing=TRUE)[1:20]

df <- as.data.frame(colData(ddstxi)[,c("group")])

pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df)

but I get this error:

Error in check.length("fill") : 'gpar' element 'fill' must not be length 0


I'm not sure about that error. It's coming from pheatmap, not DESeq2, so check what you are inputting and check the help files for that package.
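One input worth checking, offered only as a possibility that the thread does not confirm: selecting a single column with `[, c("group")]` drops to a plain vector, so the resulting data frame loses the sample names as row names, and pheatmap matches the row names of annotation_col against the column names of the matrix. The sketch below shows that behaviour with an ordinary data.frame; `coldata` and the sample names are hypothetical.

```r
# Hypothetical sample table standing in for colData(ddstxi)
coldata <- data.frame(
  group = factor(c("WT", "WT", "KO", "KO", "KO", "KO")),
  row.names = paste0("sample", 1:6)
)

# Single-column selection drops to a vector, so sample names are lost
df_bad <- as.data.frame(coldata[, c("group")])
print(rownames(df_bad))   # default row names, not the sample names
print(names(df_bad))      # auto-generated column name, not "group"

# drop = FALSE keeps a one-column data frame with its row names
df_good <- coldata[, "group", drop = FALSE]
print(rownames(df_good))  # sample names preserved
```

If this is the cause, building the annotation with `drop = FALSE` (wrapped in `as.data.frame()` for colData) and confirming that `rownames(df)` matches `colnames(assay(ntd))` would be the first thing to try.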