filtering out lowly expressed genes by groups
Entering edit mode
sohail.mbio ▴ 10
Last seen 5 weeks ago
United States

Hello DESeq2 community,

I'm working on a complex RNA-seq experiment with a large design, involving 5 different timepoints, 2 different stimulation conditions, and 2 disease states, resulting in a total of 96 samples. In my initial preprocessing steps, I've filtered out genes with low expression (<100 counts).

However, as I delve into differential expression analysis, I've encountered a situation where I'm observing many differentially expressed genes (DEGs) with adjusted p-values (padj) < 0.1 in certain comparisons. What's intriguing is that most of these genes have zero transcript counts in the majority of samples within those specific comparison groups. These genes were not filtered out initially because they exhibit expression in other groups. For example (plot below), Deseq2 DE show significance in first two groups, even there was only 1 sample with few transcripts. enter image description here

Now my question is how to get rid of these DEGs? As these are not really DEGs.

RNASeqData DESeq2 • 329 views
Entering edit mode

Is there a question?

Entering edit mode

My question is how to avoid these DEGs? There are so many DEGs that are not actually expressed at T0. Is there any filter ? my current code is `dds <- DESeqDataSetFromMatrix(countData = cts, colData = coldata, design = ~ Disease + Treatment + Timepoint)

dds <- dds[ rowSums(counts(dds)) > 100, ] dds <- DESeq(dds) dds$group <- factor(paste0(dds$Treatment, dds$Timepoint, dds$Disease)) design(dds) <- ~ group dds <- DESeq(dds)


res<- results(dds, contrast = c("group","unstimT0CRSwNP", "unstimT0HC"), independentFiltering=TRUE)


Entering edit mode
Last seen 15 hours ago
United States

Something like

results(dds, c("condition","T0_unstim_CRWnP","T0_unstim_HC"))[rowMeans(assays(dds, "counts")[,colData(dds)$condition %in% c("T0_unstim_CRWnP","T0_unstim_HC")] > 10,]

Should do the trick, I imagine. You could even make a function

fun <- function(deseqobj, contvec, cutoff) 
        results(deseqobj, contvec)[rowMeans(assays(deseqobj, "counts")[,colData(dds)[,contvec[1]] %in% contvec[-1])] > cutoff,]

## and run it

fun(dds, c("condition","T0_unstim_CRWnP","T0_unstim_HC"))

None of this is tested, btw, so you might need to adjust.

Entering edit mode

You might want to filter first though.

fun <- function(deseqobj, contvec, cutoff) {
            lildds <- dds[rowMeans(assays(deseqobj, "counts")[,colData(dds)[,contvec[1]] %in% contvec[-1])] > cutoff,], 
             results(lildds, contvec)

Which won't invalidate your FDR values.


Login before adding your answer.

Traffic: 668 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6