Question

Problem with low reads count

francesco.biagi • 0

@francescobiagi-17032

Hello,

I'm a master's student in a biology-related field, and I'm using DESeq2 for differential gene expression analysis. In the results, 3 genes are reported as strongly down-regulated. However, when I look at the read counts for one of them, there are 4 reads in just one of the 8 untreated samples and no reads in any of the 26 treated samples. A similar thing happens for the other two genes. Reading the guide that you provide, I found that no pre-filtering of the read counts is needed, and that it is better to give the reads to DESeq2 as they are obtained; however, given the results above, the software seems to have a bias. I have not investigated the problem further; I am just reporting it.

The data I used are freely available from the CommonMind Consortium: https://www.synapse.org/#!Synapse:syn11617751; the related metadata are at https://www.synapse.org/#!Synapse:syn11638462. In the model I used Dx (the treatment received by the monkeys) as the variable of interest, with Sex and DLPFC_RNA_isolation_Batch as covariates, as reported in https://www.ncbi.nlm.nih.gov/pubmed/27668389 (online methods related to monkeys).

Regards,

Francesco Biagi

deseq2 • 679 views

written 7.3 years ago by francesco.biagi • updated 7.3 years ago by Michael Love

score 0 · Answer 1 · 2018-08-23

Michael Love 43k

@mikelove

It’s surprising that these pass through as significant (are they significant? You didn’t mention this) with such low read counts.

If they are significant, you can use a pre-filter, for example requiring a count of at least 5 (or 10) in at least 3 samples:

keep <- rowSums(counts(dds) >= 5) >= 3

It’s not “better” to give all the genes, it’s just that filtering is not a requirement for the method to work.
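As a rough illustration of how such a pre-filter behaves, here is a minimal base R sketch on a made-up count matrix. The matrix `toy_counts`, its gene names, and the thresholds `min_count` and `min_samples` are assumptions for this example only; they are not taken from the CommonMind data. The first row mimics the gene described in the question (4 reads in one of 8 untreated samples, none in the 26 treated samples).

```r
# Toy count matrix: 3 genes x 34 samples (8 untreated, then 26 treated).
# All values are invented for illustration.
n_samples <- 34L
toy_counts <- rbind(
  low_gene  = c(4L, rep(0L, n_samples - 1L)),       # 4 reads in a single sample
  mid_gene  = c(rep(6L, 3L), rep(0L, n_samples - 3L)), # count >= 5 in exactly 3 samples
  high_gene = rep(50L, n_samples)                    # well expressed everywhere
)
colnames(toy_counts) <- paste0("sample", seq_len(n_samples))

# Hypothetical thresholds, matching the rule of thumb above.
min_count   <- 5
min_samples <- 3

# For each gene, count the samples reaching min_count, then keep the genes
# where at least min_samples samples do so.
keep <- rowSums(toy_counts >= min_count) >= min_samples
filtered <- toy_counts[keep, , drop = FALSE]

print(keep)
print(rownames(filtered))
```

In this toy example the gene with a single count of 4 is dropped, while the other two rows are kept. With a DESeqDataSet, the same logical vector would be computed from `counts(dds)` as in the line above and then used to subset the object by row (`dds[keep,]`) before running `DESeq()`.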

Answer by Michael Love, 7.3 years ago

Hello Michael,

Sorry for the delay. Yes, they are all significant, and I have already fixed the problem. This post was meant to make the maintainers aware of it because, as the manual says, this pre-filtering step should not be needed at all. As you said: "...filtering is not a requirement for the method to work."

In any case, thank you very much for your help.

Best Regards,

Francesco

Reply by francesco.biagi, 7.2 years ago

Thanks for the post. Could you also say what version of DESeq2 you are using? Are you using a Wald test (the default)? Is it just a simple two-group comparison? I wouldn’t think that these would have very small p-values at all.

Reply by Michael Love, 7.2 years ago

Is this similar to the setup you describe? I'm trying to figure out what range of fitted parameters would give a small p-value for such a gene.

dds <- makeExampleDESeqDataSet(m=34)
dds$condition <- factor(rep(1:2,c(8,26)))
counts(dds)[1,] <- rep(c(4L,0L),c(1,33))

> counts(dds)[1,]
 sample1  sample2  sample3  sample4  sample5  sample6  sample7  sample8  sample9 sample10 sample11
       4        0        0        0        0        0        0        0        0        0        0
sample12 sample13 sample14 sample15 sample16 sample17 sample18 sample19 sample20 sample21 sample22
       0        0        0        0        0        0        0        0        0        0        0
sample23 sample24 sample25 sample26 sample27 sample28 sample29 sample30 sample31 sample32 sample33
       0        0        0        0        0        0        0        0        0        0        0
sample34
       0

dds <- DESeq(dds)
res <- results(dds)

> res[1,]
log2 fold change (MLE): condition 2 vs 1
Wald test p-value: condition 2 vs 1
DataFrame with 1 row and 6 columns
               baseMean    log2FoldChange            lfcSE               stat            pvalue
              <numeric>         <numeric>        <numeric>          <numeric>         <numeric>
gene1 0.109431114293273 -1.42793623765064 3.49906906619771 -0.408090326494275 0.683207361697624
                   padj
              <numeric>
gene1 0.975948164280249

> packageVersion("DESeq2")
[1] ‘1.20.0’

Reply by Michael Love, 7.2 years ago