Question

Exporting DESeq2 filtered results

0

Entering edit mode

Laura ▴ 10

@laura-24858

Last seen 16 days ago

Spain

Hello,

I'm performing differential gene expression analysis on my bacterial RNA samples and I'm having some troubles exporting the results. I'm using DESeq2 version 1.30.0 and R 4.0.3.

I've generated my results following the vignette's workflow. Briefly put, this is my last chunk of code:

> dds <- DESeq(dds)
> res <- results(dds, alpha = 0.05, lfcThreshold = 1)
> summary(res)
out of 5546 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up)    : 29, 0.52%
LFC < -1.00 (down) : 7, 0.13%
outliers [1]       : 0, 0%
low counts [2]     : 2258, 41%
(mean count < 38)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

So with the filters I've set (padj < 0.05 and lfcThreshold = 1) 36 genes in total are differentially expressed. I want to export in a CSV file the results data frame of only these 36 DE genes, but after running:

> resFilt <- res[which(res$padj < 0.05 & abs(res$log2FoldChange) > 1), ]
> write.csv(resFilt, file="DE_results_filtered.csv")

the resulting csv has 202 rows. Is this because creating a subset of the results doesn't account for the independent filtering? How can I export the results of only those 36 DE genes?

Thank you in advance!

DESeq2 • 8.0k views

ADD COMMENT • link updated 6 months ago by James W. MacDonald 65k • written 3.1 years ago by Laura ▴ 10

score 0 · Answer 1 · 2021-02-23

0

Entering edit mode

ATpoint ★ 4.0k

@atpoint-13662

Last seen 2 hours ago

Germany

You have to remove the NAs, use na.omit.

ADD COMMENT • link 3.1 years ago ATpoint ★ 4.0k

0

Entering edit mode

Thanks! Although I don't have NAs in the output CSV. Please, could you specify in which step should I remove NAs?

ADD REPLY • link 3.1 years ago Laura ▴ 10

0

Entering edit mode

I'd recommend to work in R and figuring things out there before the CSV.

You should have concordance between the summary table and these sums, is that the case?

summary(res)
sum(res$padj < 0.05 & res$log2FoldChange > 0, na.rm=TRUE)
sum(res$padj < 0.05 & res$log2FoldChange < 0, na.rm=TRUE)

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Entering edit mode

Sorry for the late reply. The results of the summary table and the sums after removing NAs are indeed the same:

> summary(res)
out of 5546 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 189, 3.4%
LFC < 0 (down)     : 64, 1.2%
outliers [1]       : 0, 0%
low counts [2]     : 1291, 23%
(mean count < 15)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> sum(res$padj < 0.1 & res$log2FoldChange > 0, na.rm=TRUE)
189
> sum(res$padj < 0.1 & res$log2FoldChange < 0, na.rm=TRUE)
64

To simplify things I'm trying to first filter by padj to see if I can subset only those genes without the NAs. So with a padj < 0.05 there are 200 DE genes in total:

> res1 <- results(dds, alpha=0.05)
> summary(res1)
out of 5546 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 163, 2.9%
LFC < 0 (down)     : 37, 0.67%
outliers [1]       : 0, 0%
low counts [2]     : 1076, 19%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> sum(res1$padj < 0.05 & res1$log2FoldChange > 0, na.rm=TRUE)
163
> sum(res1$padj < 0.05 & res1$log2FoldChange < 0, na.rm=TRUE)
37

If I subset without removing NAs I obtain 203 DE genes. If I try to remove NAs with res = na.omit(res) and then subsetting with resFilt <- subset(res, padj < 0.05) I also obtain 203 genes, so NAs are still in my output.

I have tried other ways of removing NAs inside the subset function, such as resFilt <- subset(res, !is.na(padj < 0.05)), but I always end up with this output in the summary table:

out of 4255 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)       : 189, 4.4%
LFC < 0 (down)     : 64, 1.5%
outliers [1]       : 0, 0%
low counts [2]     : 0, 0%
(mean count < 15)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

With the same number of DE genes as in the res unfiltered object, but with less rows.

I'm sure there is a correct way of removing the NAs for subsetting the filtered results, but I can't seem to find it. Any help will be greatly appreciated :)

ADD REPLY • link 3.1 years ago Laura ▴ 10

0

Entering edit mode

I'm a little lost, so from the top, you have the right numbers by summary and sum. What about:

resFilt <- res[ which(res$padj < 0.05), ]
nrow(resFilt)

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Entering edit mode

That gives me 203 rows, but as I showed in the last post, there should be 200 DEGs using a padj < 0.05, so I guess that the result of resFilt <- res[ which(res$padj < 0.05), ] still includes NAs (although I don't see any in the CSV).

ADD REPLY • link 3.1 years ago Laura ▴ 10

0

Entering edit mode

What about length(which(res$padj < 0.05)) and sum(res$padj < 0.05, na.rm=TRUE)? I'm confused why you're getting extra rows.

ADD REPLY • link 3.1 years ago Michael Love 41k

0

Entering edit mode

I'm also very confused. I'll leave you here the same code as above so you can see again the results easier:

> res.05 <- results(dds, alpha = 0.05)
> summary(res.05)
out of 5546 with nonzero total read count
adjusted p-value < 0.05
LFC > 0 (up)       : 163, 2.9%
LFC < 0 (down)     : 37, 0.67%
outliers [1]       : 0, 0%
low counts [2]     : 1076, 19%
(mean count < 11)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
> length(which(res$padj < 0.05))
203
> sum(res$padj < 0.05, na.rm=TRUE)
203

And also including the log2FC cutoff:

> resLFC <- results(dds, alpha = 0.05, lfcThreshold = 1)
> summary(resLFC)
out of 5546 with nonzero total read count
adjusted p-value < 0.05
LFC > 1.00 (up)    : 29, 0.52%
LFC < -1.00 (down) : 7, 0.13%
outliers [1]       : 0, 0%
low counts [2]     : 2258, 41%
(mean count < 38)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results
# There are 35 DEGs with padj <0.05 and log2Threshold = 1
# I'm note sure if the syntax of the next lines is correct, but the output is even more confusing
> length(which(res$padj < 0.05 & abs(res$log2FoldChange > 1)))
165
> sum(res$padj < 0.05 & abs(res$log2FoldChange > 1), na.rm=TRUE)
165

ADD REPLY • link 3.1 years ago Laura ▴ 10

0

Entering edit mode

The last output is explained because you didn't use resLFC.

I suppose 3 of the LFC could be set to 0, depending on the design and the test (I can't see this information above). Can you show the design, how you ran DESeq() and how you ran results()?

ADD REPLY • link 3.1 years ago Michael Love 41k

1

Entering edit mode

Finally got it! Turns out I was trying to subset using the res unfiltered object like subset(res, padj < 0.05) instead of the filtered object (such as resLFC in the last chunk of code) like subset(resLFC, padj < 0.05). Sorry for the confusion and thank you so much for the help, your last comment was what clicked for me.

ADD REPLY • link 3.1 years ago Laura ▴ 10