need help regarding differential expression analysis
1
0
Entering edit mode
Rishav ▴ 20
@rishav-25031
Last seen 14 months ago
Patna

Hi, everyone I am doing differential expression analysis using DESeq2. total genes that are differentially expressed: 8126 out of 60663 genes. Now, I want to lower these numbers by filtering out more genes. I want to find out about 2000 genes that are statistically more differentially expressed. Please help me regarding this concern. Below are the codes that I used


# hbv <- read.csv("hbv.csv")
> dim(hbv)
[1] 60664    43
> library(DESeq2)
> set.seed(1)
> colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))
> countDataMatrix <- as.matrix(hbv[ , -1])
> rownames(countDataMatrix) <- hbv[ , 1]
> dds <- DESeqDataSetFromMatrix(countDataMatrix, colData,formula(~condition))
> dds <- DESeq(dds)
> res <- results(dds)
> resOrdered <- res[order(res$padj),]
> sig <- resOrdered[!is.na(resOrdered$padj) & resOrdered$padj<0.10 & abs(resOrdered$log2FoldChange)>=1,]
> dim(sig)
[1] 8126    6
rna-seq • 1.0k views
ADD COMMENT
1
Entering edit mode
Kevin Blighe ★ 3.9k
@kevin
Last seen 4 days ago
Republic of Ireland

Hi,

This question is more about basic / general usage in R, and is not related to any technical issue of the package being used (DESeq2). In any case, permit that I respond.

Firstly, lines like this can be somewhat 'risky':

colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))

Risky, why? - it is risky because it is assumed that this is aligned to the columns of your expression data, hbv, but you have no checks implemented to guarantee this. Please verify that it is aligned.

Secondly, please apply log fold-change shrinkage. Take a look at Quick start in the DESeq2 vignette.

Thirdly, your current filter is resOrdered$padj<0.10 & abs(resOrdered$log2FoldChange)>=1. If you want less genes, then simply modify these values to, for example 0.05 and 2, respectively, or, use 0.05 and 1, and then take the top 2000 genes ranked by absolute shrunken fold change, if that is all that you want.

Kevin

ADD COMMENT
1
Entering edit mode

Thank you very much

ADD REPLY
0
Entering edit mode

But one thing I want to ask, why I can't set my experimental labels like this :

colData <- DataFrame(condition=factor(c("ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat", "ctrl", "treat", "ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat","ctrl", "treat")))

If I don't have a meta file, then is it wrong to create experimental labels comparing two conditions like this?

ADD REPLY
1
Entering edit mode

You can do it this way; however, you have to be 100% certain that this metadata is aligned perfectly to the expression data.

ADD REPLY

Login before adding your answer.

Traffic: 596 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6