suggest to report differential count estimate in DESeq2
1
0
Entering edit mode
Yongqing • 0
@b01bffdc
Last seen 14 days ago
Hong Kong

Hi,

I am writing a post because I encountered an interesting question when using DESeq2. I want to analyze RNA-seq data gained from samples of two groups: control and treatment (inhibitor of a kinase). I want to perform differential expression analysis and find out what the main role of this kinase is. I know many people would do it based on fold change. However, I noticed that some genes have a very small baseMean, though they have a large fold change. I am more interested in the differential counts of genes than the fold change because some genes that have a large baseMean might have a huge influence on the cell even without a large fold change. Also, in cells, living actually means a lot of chemical reactions going on. Suppose we have a molecule with a very large number in a cell, its number might be prevented from continuing to grow because of the regulatory networks within the cell - or if its number dropped, say four-fold in the cell, the cell would have died, which would make it harder to achieve as large a fold change as a molecule with a very small BaseMean, but this molecule is important. I tried to use baseMean * (2^(log2FoldChange) - 1) as an approximation of the count change, however, the baseMean would be the same for a gene even if the reference group changes, which would make my approximation less accurate when the baseMean becomes larger. But if there is a fold change, this software has estimated the mean of the control group, the mean of the treatment group, and the differential count. So I am writing to ask about the possibility of reporting these numbers in DESeq2. I believe it would help a lot of downstream lab work and benefit future biomedical discoveries. I have also attached key points about my current work in case you are interested and want to understand more about my question.

Best regards,

Yongqing Miao

key points of my work - Understanding the Role of Dual Leucine Zipper Kinase (DLK) in Neurotrophin Deprivation

• Analyzed RNA-seq data (30 million of single end 50 base pair reads per library) of interest (14 samples) from GSE95672 on the GEO webpage (cell: mice dorsal root ganglions, control: nerve growth factor (NGF) + dimethyl sulfoxide (DMSO), treatment 1: anti-NGF + DMSO, treatment 2: anti-NGF + DLK inhibitor (DLKi), culture time: 4 days)
• Used FastQC to perform quality control for fastq files, found the sequence duplication levels were high, and deduplicated fastq files with czid-dedup
• Used HISAT2 for allignment with GRCm38 as the reference genome and used FeatureCounts to summarize counts for genes
• Used Bayesian method based on mRNA counts provided by DESeq2 for differential expression analysis
• Set gene set 1 as genes whose expressions changed in the control group compared to treatment group 1, set gene set 2 as genes whose expressions changed in treatment group 2 compared to treatment group 1, and took the intersection of the two gene sets for pathway analysis with 0.025 as the false discovery rate for both gene sets
• Used baseMean * (2^{log2FoldChange} - 1) as an approximation of expression level change for significant genes and included genes that had more than 4000 approximated differential expression counts between treatment 1 and treatment 2 for annotation and pathway analysis
• Found that in neurotrophin deprivation, DLK may mainly lead to reduced mRNA levels of enzymes in lipid synthesis as well as reduced mRNA levels of tubulin (especially class 3), NGF receptors, and proteins that relate to axon and synapse (all genes mentioned above have more than 10000 approximated count change)
• Found that in neurotrophin deprivation, DLK may mainly lead to increased mRNA levels of stathmin-like 4, collagens, heat shock protein 5, transcription factor Jun, caspase 3, BCL2 binding component 3, and chloride channel voltage-sensitive 3 (all genes mentioned above have more than 4500 approximated count change)
• Performed pathway analysis in DAVID and STRING and indicated a dual role of DLK - it might help the neurons survive in short-term neurotrophin deprivation, but also make some preparation for the neurons' apoptosis
DESeq2 • 225 views
0
Entering edit mode
ATpoint ★ 2.0k
@atpoint-13662
Last seen 8 hours ago
Germany

Hi, it is unclear to me what you think that fold change is compared to "count change", "differential counts" and "expression level change". In any case, the supported and recommended ways of using DESeq2 for differential testing (that is the Wald test and the LRT) are outlined in detail in the manual. What DESeq2 tests are the counts you give it, any information what is going on in the cell that might bias the counts are outside of the scope of a statistical framework. This has to be accounted for by the preprocessing or by adding appropriate covariates to the design in case that applies. If results make sense or not in your particular situation must be worked out using your scientific expertise.

The issue of having large fold changes with large standard errors due to small counts is well-known. DESeq2 will take that into account when calculating significances. In fact, estimating expected dispersions across the full range of the baseMean as a basis for differential analysis is a main point of the method DESeq2 implements, in other words, the software takes care of different expression levels between genes.

Explicit shrinkage of the fold changes for downstream analysis ranking purposes is already supported, see lfcShrink() in the manual "get rid of" large fold changes that are noisy while preserving those fold changes with good data-driven support.

0
Entering edit mode

Hi, thank you very much for your reply. I understand that DESeq2 uses proper statistical tests to explore differential expression data. My point is that if DESeq2 can report a "fold change", it should have estimated two values for counts (for example, control: 20 counts, treatment: 80 counts, so that fold change would be: 4, log2FC: 2). I am here to suggest to report these 2 numbers (i.e. 20 and 80 in my example, or it can also report 80-20 = 60, which is what I refer as "differential counts" between the 2 groups).

0
Entering edit mode

suggest to report differential estimate in edgeR

Not all linear models have group means that are used to derive the log fold change.

For example, even ~batch + condition, you cannot compute the LFC just by looking at two average counts.

0
Entering edit mode

Hi Mike, thank you very much for your reply. I have one more question: when doing DE with DESeq2, would p-value be preferable to LFC?