Question

need help regarding differential expression analysis

0

Entering edit mode

Rishav ▴ 20

@rishav-25031

Last seen 14 months ago

Patna

Hi, everyone I am doing differential expression analysis using DESeq2. total genes that are differentially expressed: 8126 out of 60663 genes. Now, I want to lower these numbers by filtering out more genes. I want to find out about 2000 genes that are statistically more differentially expressed. Please help me regarding this concern. Below are the codes that I used


# hbv <- read.csv("hbv.csv")
> dim(hbv)
[1] 60664    43
> library(DESeq2)
> set.seed(1)
> colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))
> countDataMatrix <- as.matrix(hbv[ , -1])
> rownames(countDataMatrix) <- hbv[ , 1]
> dds <- DESeqDataSetFromMatrix(countDataMatrix, colData,formula(~condition))
> dds <- DESeq(dds)
> res <- results(dds)
> resOrdered <- res[order(res$padj),]
> sig <- resOrdered[!is.na(resOrdered$padj) & resOrdered$padj<0.10 & abs(resOrdered$log2FoldChange)>=1,]
> dim(sig)
[1] 8126    6

rna-seq • 1.0k views

ADD COMMENT • link updated 2.9 years ago by Kevin Blighe ★ 3.9k • written 2.9 years ago by Rishav ▴ 20

score 1 · Answer 1 · 2021-05-22

1

Entering edit mode

Kevin Blighe ★ 3.9k

@kevin

Last seen 4 days ago

Republic of Ireland

Hi,

This question is more about basic / general usage in R, and is not related to any technical issue of the package being used (DESeq2). In any case, permit that I respond.

Firstly, lines like this can be somewhat 'risky':

colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))

Risky, why? - it is risky because it is assumed that this is aligned to the columns of your expression data, hbv, but you have no checks implemented to guarantee this. Please verify that it is aligned.

Secondly, please apply log fold-change shrinkage. Take a look at Quick start in the DESeq2 vignette.

Thirdly, your current filter is resOrdered$padj<0.10 & abs(resOrdered$log2FoldChange)>=1. If you want less genes, then simply modify these values to, for example 0.05 and 2, respectively, or, use 0.05 and 1, and then take the top 2000 genes ranked by absolute shrunken fold change, if that is all that you want.

Kevin

ADD COMMENT • link 2.9 years ago Kevin Blighe ★ 3.9k

1

Entering edit mode

Thank you very much

ADD REPLY • link 2.9 years ago Rishav ▴ 20

0

Entering edit mode

But one thing I want to ask, why I can't set my experimental labels like this :

colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))

If I don't have a meta file, then is it wrong to create experimental labels comparing two conditions like this?

ADD REPLY • link 2.9 years ago Rishav ▴ 20

1

Entering edit mode

You can do it this way; however, you have to be 100% certain that this metadata is aligned perfectly to the expression data.

ADD REPLY • link 2.9 years ago Kevin Blighe ★ 3.9k