Exporting DESeq2 filtered results
0
Entering edit mode
Laura • 0
@laura-24858
Last seen 12 hours ago

Hello,

I'm performing differential gene expression analysis on my bacterial RNA samples and I'm having some troubles exporting the results. I'm using DESeq2 version 1.30.0 and R 4.0.3.

I've generated my results following the vignette's workflow. Briefly put, this is my last chunk of code:

> dds <- DESeq(dds)
> res <- results(dds, alpha = 0.05, lfcThreshold = 1)
> summary(res)
out of 5546 with nonzero total read count
LFC > 1.00 (up)    : 29, 0.52%
LFC < -1.00 (down) : 7, 0.13%
outliers [1]       : 0, 0%
low counts [2]     : 2258, 41%
(mean count < 38)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


So with the filters I've set (padj < 0.05 and lfcThreshold = 1) 36 genes in total are differentially expressed. I want to export in a CSV file the results data frame of only these 36 DE genes, but after running:

> resFilt <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ]
> write.csv(resFilt, file="DE_results_filtered.csv")


the resulting csv has 202 rows. Is this because creating a subset of the results doesn't account for the independent filtering? How can I export the results of only those 36 DE genes?

DESeq2 • 135 views
0
Entering edit mode
ATpoint • 480
@atpoint-13662
Last seen just now
Germany

You have to remove the NAs, use na.omit.

0
Entering edit mode

Thanks! Although I don't have NAs in the output CSV. Please, could you specify in which step should I remove NAs?

0
Entering edit mode

I'd recommend to work in R and figuring things out there before the CSV.

You should have concordance between the summary table and these sums, is that the case?

summary(res)
sum(res$padj < 0.05 & res$log2FoldChange > 0, na.rm=TRUE)
sum(res$padj < 0.05 & res$log2FoldChange < 0, na.rm=TRUE)

0
Entering edit mode

Sorry for the late reply. The results of the summary table and the sums after removing NAs are indeed the same:

> summary(res)
out of 5546 with nonzero total read count
LFC > 0 (up)       : 189, 3.4%
LFC < 0 (down)     : 64, 1.2%
outliers [1]       : 0, 0%
low counts [2]     : 1291, 23%
(mean count < 15)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> sum(res$padj < 0.1 & res$log2FoldChange > 0, na.rm=TRUE)
189
> sum(res$padj < 0.1 & res$log2FoldChange < 0, na.rm=TRUE)
64


To simplify things I'm trying to first filter by padj to see if I can subset only those genes without the NAs. So with a padj < 0.05 there are 200 DE genes in total:

> res1 <- results(dds, alpha=0.05)
> summary(res1)
out of 5546 with nonzero total read count
LFC > 0 (up)       : 163, 2.9%
LFC < 0 (down)     : 37, 0.67%
outliers [1]       : 0, 0%
low counts [2]     : 1076, 19%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> sum(res1$padj < 0.05 & res1$log2FoldChange > 0, na.rm=TRUE)
163
> sum(res1$padj < 0.05 & res1$log2FoldChange < 0, na.rm=TRUE)
37


If I subset without removing NAs I obtain 203 DE genes. If I try to remove NAs with res = na.omit(res) and then subsetting with resFilt <- subset(res, padj < 0.05) I also obtain 203 genes, so NAs are still in my output.

I have tried other ways of removing NAs inside the subset function, such as resFilt <- subset(res, !is.na(padj < 0.05)), but I always end up with this output in the summary table:

out of 4255 with nonzero total read count
LFC > 0 (up)       : 189, 4.4%
LFC < 0 (down)     : 64, 1.5%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 15)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results


With the same number of DE genes as in the res unfiltered object, but with less rows.

I'm sure there is a correct way of removing the NAs for subsetting the filtered results, but I can't seem to find it. Any help will be greatly appreciated :)

0
Entering edit mode

I'm a little lost, so from the top, you have the right numbers by summary and sum. What about:

resFilt <- res[ which(res$padj < 0.05), ] nrow(resFilt)  ADD REPLYlink 0 Entering edit mode That gives me 203 rows, but as I showed in the last post, there should be 200 DEGs using a padj < 0.05, so I guess that the result of resFilt <- res[ which(res$padj < 0.05), ] still includes NAs (although I don't see any in the CSV).

0
Entering edit mode

What about length(which(res$padj < 0.05)) and sum(res$padj < 0.05, na.rm=TRUE)? I'm confused why you're getting extra rows.

0
Entering edit mode

I'm also very confused. I'll leave you here the same code as above so you can see again the results easier:

> res.05 <- results(dds, alpha = 0.05)
> summary(res.05)
out of 5546 with nonzero total read count
LFC > 0 (up)       : 163, 2.9%
LFC < 0 (down)     : 37, 0.67%
outliers [1]       : 0, 0%
low counts [2]     : 1076, 19%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> length(which(res$padj < 0.05)) 203 > sum(res$padj < 0.05, na.rm=TRUE)
203


And also including the log2FC cutoff:

> resLFC <- results(dds, alpha = 0.05, lfcThreshold = 1)
> summary(resLFC)
out of 5546 with nonzero total read count
LFC > 1.00 (up)    : 29, 0.52%
LFC < -1.00 (down) : 7, 0.13%
outliers [1]       : 0, 0%
low counts [2]     : 2258, 41%
(mean count < 38)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
# There are 35 DEGs with padj <0.05 and log2Threshold = 1
# I'm note sure if the syntax of the next lines is correct, but the output is even more confusing
> length(which(res$padj < 0.05 & abs(res$log2FoldChange > 1)))
165
> sum(res$padj < 0.05 & abs(res$log2FoldChange > 1), na.rm=TRUE)
165

0
Entering edit mode

The last output is explained because you didn't use resLFC.

I suppose 3 of the LFC could be set to 0, depending on the design and the test (I can't see this information above). Can you show the design, how you ran DESeq() and how you ran results()?

0
Entering edit mode

Finally got it! Turns out I was trying to subset using the res unfiltered object like subset(res, padj < 0.05) instead of the filtered object (such as resLFC in the last chunk of code) like subset(resLFC, padj < 0.05). Sorry for the confusion and thank you so much for the help, your last comment was what clicked for me.