Question: what should I do with many zero counts from Salmon quantification
12 weeks ago by
lkianmehr0 wrote:

I have quantified my RNA-seq samples with Salmon. Two groups are wild-type and four groups are Dnmt2 knockouts. I have put them all in one dataset for DE analysis. A box plot of the normalized counts shows that the median of the knockout samples is zero, and that at most 20 reads are assigned to each transcript. Before performing DE with DESeq2, I have two questions: 1. Should zero values be removed beforehand? 2. What minimum count should be used as the threshold for DE?

normalization deseq2 R salmon
modified 12 weeks ago by Michael Love22k • written 12 weeks ago by lkianmehr0
Answer: what should I do with many zero counts from Salmon quantification
12 weeks ago by
Michael Love22k
United States
Michael Love22k wrote:

We have some minimal filtering code in the DESeq2 vignette you can take a look at. There's obviously no point running a DE method when all the counts are 0, and you can additionally filter out genes which have very small counts for all samples, because these don't have enough precision for estimating the LFC. A common rule is, for example, a count of 10 or more in at least 3 samples. However, it does depend a bit on the dataset: for example, UMI-deduplicated data has counts < 10 which nevertheless give some precision for estimating the LFCs.
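
The "10 or more in at least 3 samples" rule can be sketched in base R as below. This is only an illustration under assumptions: the matrix `cts`, its values, and the sample names are made up and stand in for the raw count matrix of a real dataset, and the two thresholds are the example values from the paragraph above, not a recommendation for every dataset.

```r
# Toy count matrix (hypothetical values): rows are transcripts, columns are samples.
cts <- matrix(
  c(  0,   0,   0,  0,  0,  0,   # all zeros: nothing to test
      3,   0,   5,  1,  0,  2,   # very low counts in every sample
     12,  15,   0, 11,  4,  0,   # count >= 10 in exactly 3 samples
    250, 310, 190, 40, 55, 62),  # well expressed in all samples
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("tx", 1:4),
                  c("WT1", "WT2", "KO1", "KO2", "KO3", "KO4"))
)

min_count   <- 10  # smallest count that is treated as informative
min_samples <- 3   # number of samples that must reach min_count

# For each transcript, count the samples at or above min_count,
# then keep the transcripts where that number reaches min_samples.
keep <- rowSums(cts >= min_count) >= min_samples
filtered <- cts[keep, , drop = FALSE]

print(keep)
print(filtered)
```

With a DESeqDataSet, the same pattern would presumably be applied to `counts(dds)` and followed by `dds <- dds[keep, ]`; the pre-filtering section of the vignette is the place to check the exact code.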

It is very suspicious that the maximum transcript count for a sample is 20. It is typically in the 100,000+ range for standard bulk RNA-seq of human or mouse. That there are many zeros is typical and expected, because cell types or tissues express only a subset of the transcripts in the genome.

Sorry, I think I made a mistake: I calculated log2(1 + counts) and made a box plot of that. Its y-axis runs from 0 to 20. What does that mean?

A maximum of 20 on the log2 scale is typical (that corresponds to a count of about one million). No problem with this data, or anything you've described above.
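
To make the scale concrete, here is a small arithmetic sketch of the log2(1 + counts) transform. The input counts are invented for illustration and are not taken from the dataset in the question.

```r
# Hypothetical raw counts: zero, a low count, and two large counts.
raw_counts <- c(0, 20, 1e5, 2^20 - 1)

# The transform that was used for the box plot.
log2_counts <- log2(1 + raw_counts)

# Pair each raw count with its value on the log2 scale.
print(data.frame(raw = raw_counts, log2_plus_1 = log2_counts))

# Going back from the log2 scale to a raw count.
back_transformed <- 2^log2_counts - 1
```

So a raw count of 20 sits below 5 on this axis, while a value of 20 on the axis means a raw count of about 2^20, which is roughly one million reads.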

Perhaps you can look at the vignette and workflow so you get an idea of what typical RNA-seq count datasets look like.

Excuse me, when I make an MA plot from res following the DESeq2 vignette, it plots expression log ratio against log expression, not LFC against mean of normalized counts. With plotMA(ddsTxi, alpha = 0.1, main = "", xlab = "mean of normalized counts", ylim, mle = TRUE) I get an error: Error in as.vector(x) : no method for coercing this S4 class to a vector. Should the object be converted to a data.frame?

What is class(ddsTxi)?

If it is a DESeqDataSet or DESeqResults object it should work.

Yes, it's a DESeqDataSet, but it doesn't work!

Can you try DESeq2::plotMA()? Maybe you are using another package that masks our plotMA method.

Exactly, that works. Thanks a lot.
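
For readers who hit the same thing, masking can be reproduced in base R without any Bioconductor packages. The sketch below is an analogy only: it masks `stats::median` with a made-up function in the global environment, which plays the role that another package's plotMA may have played here. Which package actually did the masking in this thread is not stated (limma, for one, also exports a plotMA).

```r
# Define a function with the same name as one in the stats package.
# This stand-in is hypothetical and exists only to create a name clash.
median <- function(x, ...) "masked version"

# An unqualified call finds the global definition first.
unqualified <- median(c(1, 5, 9))

# The pkg:: prefix bypasses the search path and picks the package's function.
qualified <- stats::median(c(1, 5, 9))

# find() lists every place on the search path where the name is defined.
locations <- find("median")

print(unqualified)
print(qualified)
print(locations)

# Remove the masking definition again.
rm(median)
```

Running `find("plotMA")` in the affected session should show, in the same way, which attached packages define a plotMA.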

Excuse me, I would appreciate help with making a heatmap of the count matrix, which I tried following the DESeq2 vignette:

select <- order(rowMeans(counts(ddstxi,normalized=TRUE)), decreasing=TRUE)[1:20]

df <- as.data.frame(colData(ddstxi)[,c("group")])

pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df)

but it fails with this error:

Error in check.length("fill") : 'gpar' element 'fill' must not be length 0
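
One possible cause, offered as a guess since the thread does not include a resolution: selecting a single column with `[, c("group")]` drops the result to a plain vector, so the data frame built from it no longer has the sample names as row names, and pheatmap matches `annotation_col` to the matrix columns by row name. The sketch below uses a base R data.frame with made-up sample names as a stand-in for `colData(ddstxi)`. It only illustrates the drop behaviour and does not call pheatmap.

```r
# Hypothetical sample table standing in for colData(ddstxi).
col_data <- data.frame(
  group = factor(c("WT", "WT", "KO", "KO")),
  batch = c(1, 2, 1, 2),
  row.names = c("s1", "s2", "s3", "s4")
)

# Selecting one column drops to a vector, so the sample names are lost
# and as.data.frame() assigns default row names instead.
dropped <- as.data.frame(col_data[, c("group")])
print(rownames(dropped))

# drop = FALSE keeps the result as a one-column data frame,
# so the row names (sample names) are preserved.
kept <- col_data[, "group", drop = FALSE]
print(rownames(kept))
```

If this is what happened, something like `df <- as.data.frame(colData(ddstxi)[, "group", drop = FALSE])` may keep the sample names; that has not been run against this dataset.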