Remove low abundant orfs:

Question

small log2foldChange value in the most significant genes

0

Entering edit mode

RitaB • 0

@30c7c498

Last seen 4.0 years ago

Belgium

Hello,

I am using DESeq2 to process the bacterial (metatranscriptome) response between 2 conditions. Each condition includes 5 biological replicates. After finishing the DESeq2 analysis I generated a volcano plot and saw that almost all the genes with the lowest padj actually have very low log2foldChange values. All replicates in the DPS condition have zero counts for those genes while DVM replicates mostly show quite high values. I would think that comparing between zeros in the DPS condition should produce a higher log2fold change in the DVM condition. Could you please share your opinion why would this happen?

Thank you!

Rita

enter image description here

the volcano plot

DESeq2 • 1.7k views

ADD COMMENT • link updated 4.1 years ago by Michael Love 43k • written 4.1 years ago by RitaB • 0

score 0 · Answer 1 · 2021-10-13

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

Without seeing any code it's hard to know what happened.

I don't recommend DESeq2 for data that doesn't clearly match the RNA-seq assumptions (e.g. housekeeping genes exist and can be relied upon, data is roughly NB distributed, etc.), and so I don't recommend it to be applied to metatranscriptomic data myself.

ADD COMMENT • link 4.1 years ago Michael Love 43k

0

Entering edit mode

thank you, Michael, for your reply.

I understand the issue with the metatranscriptomic dataset and the DE analysis, however, with the complex metatranscriptome experiments, there are also not that many alternative methods to use. I did perform as many checks of the counts as possible to see if the data can fit the DESeq model and the results were not that bad.

My main issue with this is that I struggle to understand how such a difference in gene count between 2 treatments can cause such a low log2foldchange value (from my example above).

The code that I used to generate the results for the plot above:

dds <- DESeqDataSetFromMatrix(countData = data.matrix(bacterial_counts_no_outliers_bacteria_frac), colData = annotation_no_outliers_bacteria_frac, design = ~ condition)

Remove low abundant orfs:

keep = rowSums(counts(dds)) >= 10 dds_filt = dds[keep,]

Choose factor levels

dds_filt$condition=factor(dds_filt$condition, levels=c('DPS', "DVM"))

Run DESeq2

dds2=DESeq(dds_filt) res_table_DPS_DVM_shrink <- lfcShrink(dds2, coef="condition_DVM_vs_DPS", res=res_table_DVM_DPS, type="apeglm") res_table_DPS_DVM_shrink_tb <- res_table_DPS_DVM_shrink %>% data.frame() %>% rownames_to_column(var="Gene") %>% as_tibble()

res_table_DPS_DVM_shrink_tb_for_volcano_plot <- res_table_DPS_DVM_shrink_tb %>% mutate(threshold = padj < padj.cutoff & abs(log2FoldChange) >= 0.58)

please let me know if I missed something or made a mistake that could have led to such results of the log2foldchange values and such a shape of the volcano plot.

Thank you once again!

Rita

ADD REPLY • link 4.1 years ago RitaB • 0

0

Entering edit mode

Apeglm shrinkage assumes NB distributed data. That would be my guess. Maybe try ashr which doesn’t use the data likelihood, but assumes the estimated coefficients are disturbed normal.

ADD REPLY • link 4.1 years ago Michael Love 43k