what should I do with many zero counts from Salmon quantification
lkianmehr • 0

I have quantified my RNA-seq samples with Salmon. Two groups are wild-type and four are Dnmt2 knockouts. I have put them all in one dataset for DE analysis. A box plot of the normalized counts shows that the median of the knockout samples is zero and that at most 20 reads are assigned to any transcript. Before running DE with DESeq2, I have two questions: 1. Should zero values be removed beforehand? 2. What minimum count should be required for DE analysis?

Thanks in advance.

Salmon DESeq2 normalization r • 3.8k views
ADD COMMENT
@mikelove

The DESeq2 vignette includes some minimal filtering code that you can take a look at. There is obviously no point running a DE method when all the counts are 0, and you can additionally filter out genes that have very small counts in all samples, because these do not have enough precision for estimating the LFC. A common rule is to require, for example, a count of 10 or more in at least 3 samples. It does depend a bit on the dataset, though: UMI-deduplicated data, for example, has counts < 10 that nevertheless give some precision for estimating the LFCs.
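As a rough sketch of the "count of 10 or more in at least 3 samples" rule, here is a base R example on a made-up count matrix. The matrix, the transcript and sample names, and the variables `min_count` and `min_samples` are all illustrative assumptions, not values from this dataset.

```r
# Toy count matrix: 5 transcripts (rows) x 6 samples (columns). Values are made up.
cts <- matrix(
  c( 0,  0,  0,  0,  0,  0,   # all zero: nothing to test
     3,  1,  0,  2,  0,  1,   # low counts everywhere
    12, 15,  0,  0, 11,  0,   # >= 10 in exactly 3 samples
    50, 40, 60, 55, 45, 70,   # well expressed
    10,  9,  0,  0,  0, 25),  # >= 10 in only 2 samples
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("tx", 1:5), paste0("s", 1:6))
)

min_count   <- 10  # illustrative threshold, adjust to the dataset
min_samples <- 3   # illustrative number of samples that must pass

# For each row, count the samples at or above the threshold, then keep the
# rows where that number reaches min_samples.
keep <- rowSums(cts >= min_count) >= min_samples
filtered <- cts[keep, , drop = FALSE]
rownames(filtered)  # only tx3 and tx4 pass
```

With a DESeqDataSet, the same idea would apply to `counts(dds)`, followed by `dds <- dds[keep, ]`; the thresholds are a judgment call rather than a fixed requirement.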

It is very suspicious that the maximum transcript count for a sample is 20. It is typically in the 100,000+ range for standard bulk RNA-seq of human or mouse. That there are many zeros is typical and expected, because cell types or tissues express only a subset of the transcripts in the genome.

ADD COMMENT

Sorry, I think I made a mistake: I calculated log2(1 + counts) and made the box plot from that. Its y-axis ranges from 0 to 20. What does that mean?

ADD REPLY

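A maximum of 20 on the log2 scale is typical: it corresponds to a count of about 2^20, or roughly one million. There is no problem with this data, or with anything you've described above.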
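To make the scale concrete, this small base R sketch applies the same log2(1 + counts) transform to a few made-up counts. The input values are illustrative only.

```r
counts_example <- c(0, 1, 1000, 1e5, 1e6)  # made-up raw counts

# The transform used for the box plot: a count of 0 maps to exactly 0,
# so a median of 0 on this scale just means most transcripts have zero counts.
log_counts <- log2(1 + counts_example)
round(log_counts, 2)  # approximately 0, 1, 9.97, 16.61, 19.93

# Going the other way: a y-axis value of 20 corresponds to this raw count.
2^20 - 1  # 1048575
```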

Perhaps you can look at the vignette and workflow so you get an idea of what typical RNA-seq count datasets look like.

ADD REPLY

Excuse me, when I try to make an MA plot from the results following the DESeq2 vignette, I get a plot of expression log ratio against log expression, not LFC against mean of normalized counts. When I run plotMA(ddsTxi, alpha= 0.1, main = "", xlab = "mean of normalized counts", ylim, mle = TRUE) I get this error: Error in as.vector(x) : no method for coercing this S4 class to a vector. Should the object be converted to a data.frame?

ADD REPLY

What is class(ddsTxi)?

If it is a DESeqDataSet or DESeqResults object it should work.

ADD REPLY

Yes, it's a DESeqDataSet, but it doesn't work!

ADD REPLY

Can you try DESeq2::plotMA()? Maybe you are using another package that masks our plotMA method.
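Masking can be illustrated in base R without any Bioconductor packages. In this sketch a global function named `sd` stands in for a second package that defines its own `plotMA` (limma, for example, also exports a function with that name; whether that was the package loaded here is an assumption).

```r
# Define a function in the current environment with the same name as stats::sd.
# This stands in for a second package that exports its own plotMA.
sd <- function(x) "masked version called"

unqualified <- sd(c(1, 2, 3))         # finds the masking definition first
qualified   <- stats::sd(c(1, 2, 3))  # explicitly uses the stats package, gives 1

find("sd")  # lists the attached locations that define 'sd', in search order
rm(sd)      # remove the masking definition again
```

In the same way, `find("plotMA")` shows which attached packages define `plotMA`, and `DESeq2::plotMA(ddsTxi)` calls the DESeq2 version regardless of load order.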

ADD REPLY

Exactly, it works. Thanks a lot.

ADD REPLY

Excuse me, I would appreciate help with making a heatmap of the count matrix, which I tried following the DESeq2 vignette:

select <- order(rowMeans(counts(ddstxi,normalized=TRUE)), decreasing=TRUE)[1:20]

df <- as.data.frame(colData(ddstxi)[,c("group")])

pheatmap(assay(ntd)[select,], cluster_rows=FALSE, show_rownames=FALSE, cluster_cols=FALSE, annotation_col=df)

but it fails with this error:

Error in check.length("fill") : 'gpar' element 'fill' must not be length 0

ADD REPLY

Not sure of that error. It’s coming from pheatmap not DESeq2 so check what you are inputting and check the help files from that package.
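One thing worth checking in the input, offered as a guess rather than a diagnosis: pheatmap matches the row names of the annotation data frame against the column names of the matrix, and selecting a single column with `[, c("group")]` drops to a plain vector, so the sample names may be lost. The base R sketch below shows that behaviour on a made-up sample table; the `samples` data frame and its contents are hypothetical, and a base `data.frame` is used here as a stand-in for `colData()`.

```r
# Made-up sample table standing in for colData(); all names are hypothetical.
samples <- data.frame(
  group = factor(c("WT", "WT", "KO", "KO")),
  batch = c("a", "b", "a", "b"),
  row.names = c("s1", "s2", "s3", "s4")
)

# Selecting one column drops to a vector, so the sample names are gone
# by the time as.data.frame() rebuilds a data frame.
dropped <- as.data.frame(samples[, c("group")])

# drop = FALSE keeps the result a data frame with its row names intact.
kept <- samples[, "group", drop = FALSE]

rownames(dropped)  # "1" "2" "3" "4"
rownames(kept)     # "s1" "s2" "s3" "s4"
```

If that turns out to be the cause, building the annotation with `drop = FALSE`, or setting `rownames(df)` to the column names of the matrix, would be the thing to try.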

ADD REPLY
