Question

What should I do with many zero counts from Salmon quantification?

lkianmehr • 0

I have quantified my RNA-seq samples with Salmon. Two groups are wild type and four groups are Dnmt2 knockout, and I've put all of them in one dataset for DE analysis. A box plot of their normalized counts shows that the median of the knockout samples is zero, and at most 20 reads are assigned to each transcript. Now, to perform DE with DESeq2, I have two questions: 1) Should zero values be removed beforehand? 2) What minimum count threshold should be used for DE?

Thanks in advance.

Tags: Salmon, DESeq2, normalization, R

Written 5.3 years ago by lkianmehr; last updated 5.3 years ago by Michael Love.

score 2 · Accepted Answer · 2019-01-23

Michael Love 42k

We have some minimal filtering code in the DESeq2 vignette you can take a look at. Obviously there's no point running a DE method on genes whose counts are all 0, and you can additionally filter out genes that have very small counts in all samples, because these don't have enough precision for estimating the LFC. A common rule, for example, is to require a count of 10 or more in at least 3 samples. However, it depends a bit on the dataset: UMI-deduplicated data, for example, has counts below 10 that nevertheless give some precision for estimating the LFCs.
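A minimal base-R sketch of the "count of 10 or more in at least 3 samples" rule of thumb, applied to a plain counts matrix (rows = genes or transcripts, columns = samples). The helper name `filter_low_counts` and the toy matrix are hypothetical, for illustration only; the thresholds are the example values from the answer, not fixed recommendations.

```r
# Hypothetical helper (not a DESeq2 function): keep rows that have at least
# `min_count` reads in at least `min_samples` samples.
filter_low_counts <- function(counts, min_count = 10, min_samples = 3) {
  stopifnot(is.matrix(counts))
  # For each row, count how many samples reach min_count, then compare
  keep <- rowSums(counts >= min_count) >= min_samples
  counts[keep, , drop = FALSE]
}

# Toy matrix: 4 features x 6 samples (values invented for illustration)
toy <- rbind(
  all_zero = c(0, 0, 0, 0, 0, 0),
  low      = c(1, 2, 0, 3, 1, 0),
  one_grp  = c(0, 0, 15, 12, 30, 0),
  high     = c(100, 90, 120, 80, 95, 110)
)
colnames(toy) <- paste0("s", 1:6)

filtered <- filter_low_counts(toy)
rownames(filtered)  # "one_grp" "high"
```

With a DESeqDataSet the same logical vector can be built from `counts(dds)` and used to subset the object, e.g. `keep <- rowSums(counts(dds) >= 10) >= 3; dds <- dds[keep, ]`, which is similar in spirit to the vignette's pre-filtering step.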

It is very suspicious that the maximum transcript count for a sample is 20. It is typically in the 100,000+ range for standard bulk RNA-seq of human or mouse. That there are many zeros is typical and expected, because cell types or tissues express only a subset of the transcripts in the genome.
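To check whether a sample's maximum count really is that low, and how many zeros each sample has, a per-sample summary of the raw counts matrix can help. This is a sketch under assumptions: `summarize_counts` is a hypothetical helper, the input is assumed to be a numeric matrix such as `txi$counts` from tximport or `counts(dds)` from a DESeqDataSet, and the toy values are invented.

```r
# Hypothetical helper: one row of summary statistics per sample.
# `counts` is a numeric matrix with transcripts in rows and samples in columns.
summarize_counts <- function(counts) {
  data.frame(
    max_count     = apply(counts, 2, max),
    median_count  = apply(counts, 2, median),
    fraction_zero = colMeans(counts == 0),  # share of transcripts with 0 reads
    library_size  = colSums(counts)
  )
}

# Toy counts (invented): 4 transcripts x 2 samples
toy <- matrix(
  c(0, 0, 5, 20,
    0, 3, 0, 1e5),
  ncol = 2,
  dimnames = list(paste0("tx", 1:4), c("wt1", "ko1"))
)

summarize_counts(toy)
```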

Answer by Michael Love, 5.3 years ago.

Sorry, I think I made a mistake, because I calculated log2(1 + counts) and made a box plot of that. Its y-axis runs from 0 to 20. What does that mean?

Reply by lkianmehr, 5.3 years ago.

A maximum of 20 on the log2 scale is typical (2^20 is roughly one million counts). There is no problem with this data, or with anything you've described above.

Perhaps you can look at the vignette and workflow to get an idea of what typical RNA-seq count datasets look like.
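Because the box plot was drawn on log2(1 + counts), a value y on its axis corresponds to a raw count of 2^y - 1. A tiny sketch of that conversion; `log2p1_to_count` is a hypothetical helper name, not part of DESeq2.

```r
# Hypothetical helper: invert the log2(1 + count) transform used for the box plot.
log2p1_to_count <- function(y) 2^y - 1

# Axis values 0, 10 and 20 correspond to raw counts of 0, 1023 and 1048575.
log2p1_to_count(c(0, 10, 20))

# Round trip: re-applying log2(1 + x) recovers the axis values.
log2(1 + log2p1_to_count(c(0, 10, 20)))
```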

Reply by Michael Love, 5.3 years ago.

Excuse me: when I call plotMA with res, following the DESeq2 vignette, it draws a plot of expression log-ratio against log-expression rather than LFC against the mean of normalized counts. And when I run plotMA(ddsTxi, alpha = 0.1, main = "", xlab = "mean of normalized counts", ylim, mle = TRUE) I get this error: Error in as.vector(x) : no method for coercing this S4 class to a vector. Should the object be converted to a data.frame first?

Reply by lkianmehr, 5.3 years ago.

What is class(ddsTxi)?

If it is a DESeqDataSet or DESeqResults object it should work.

Reply by Michael Love, 5.3 years ago.

Yes, it's a DESeqDataSet, but it doesn't work!

Reply by lkianmehr, 5.3 years ago.

Can you try DESeq2::plotMA()? Maybe you are using another package that masks our plotMA method.
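When two attached packages export a function with the same name, an unqualified call from the console uses whichever comes first on the search path. limma, for example, also exports a `plotMA`, and as far as I know its default axis labels are log-expression based, which would fit the plot described above; the thread does not confirm which package was involved. A small sketch for checking this, with `who_defines` as a hypothetical helper built on `utils::find`:

```r
# Hypothetical helper: list the search-path entries (attached packages or
# .GlobalEnv) that contain a function with this name, in search-path order.
# An unqualified call from the console uses the first entry returned.
who_defines <- function(fun_name) {
  utils::find(fun_name, mode = "function")
}

who_defines("mean")   # typically just "package:base"

# In a session with both DESeq2 and limma attached, one would expect
# who_defines("plotMA") to list more than one package. Calling through the
# namespace sidesteps the masking (res is the DESeqResults from the thread):
# DESeq2::plotMA(res, alpha = 0.1)
```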

Reply by Michael Love, 5.3 years ago.

Exactly, that works. Thanks a lot!

Reply by lkianmehr, 5.3 years ago.

Excuse me, I would appreciate help with making a heatmap of the count matrix, which I did following the DESeq2 vignette:

select <- order(rowMeans(counts(ddstxi, normalized = TRUE)), decreasing = TRUE)[1:20]

df <- as.data.frame(colData(ddstxi)[, c("group")])

pheatmap(assay(ntd)[select, ], cluster_rows = FALSE, show_rownames = FALSE, cluster_cols = FALSE, annotation_col = df)

but it fails with this error:

Error in check.length("fill") : 'gpar' element 'fill' must not be length 0

Reply by lkianmehr, 5.3 years ago.

Not sure about that error. It's coming from pheatmap, not DESeq2, so check what you are passing in and check the help files for that package.
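One likely cause, offered as an assumption since the thread does not confirm it: `colData(ddstxi)[, c("group")]` selects a single column, which (as with a base data.frame) is returned as a bare vector, so `as.data.frame()` builds a data frame whose row names are 1, 2, ... instead of the sample names. pheatmap matches `annotation_col` to the matrix columns by row name, and such a mismatch is a commonly reported trigger for this `gpar` 'fill' error. The vignette selects two columns, which avoids the issue. The sketch below uses a plain data.frame as a stand-in for colData (sample names and group sizes are invented) to show how `drop = FALSE` keeps the one-column table and its row names.

```r
# Stand-in for colData(ddstxi): a base data.frame with sample names as row
# names and one "group" column (names and group sizes invented).
coldata <- data.frame(
  group = factor(c("WT", "WT", "KO", "KO", "KO", "KO")),
  row.names = paste0("sample", 1:6)
)

# Selecting a single column drops it to a bare factor, so the sample names
# are lost and as.data.frame() falls back to row names "1", ..., "6".
lost <- as.data.frame(coldata[, c("group")])
rownames(lost)

# drop = FALSE keeps a one-column data.frame whose row names are the sample
# names, which is what annotation_col needs to line up with the columns of
# the heatmap matrix.
annot <- as.data.frame(coldata[, "group", drop = FALSE])
rownames(annot)

# In the original code this would become (untested here):
# df <- as.data.frame(colData(ddstxi)[, "group", drop = FALSE])
```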

Reply by Michael Love, 5.3 years ago.