Specificity of gene expression
2
0
Entering edit mode
@arrigonialberto86-7608
Last seen 8.8 years ago
Italy

I was wondering if it is possible to use the 'contrast' keyword of the 'results' function of DESeq2 in order to select genes that are specifically expressed in a single biological class in cases where we have more than 2 conditions.

In this case I would just calculate contrasts for all the condition and e.g. select genes that are 'upregulated' in a single condition vs all the rest. Would this work? Would I have to correct the results for multiple comparisons again?

e.g., in this experimental design with a factor of three values (conditions):

samples <- read.table("samples.txt",sep="\t",as.is=T)
colnames(samples)<-c("sampleName","fileName","condition")
dds1 <- DESeqDataSetFromHTSeqCount(sampleTable=samples,directory="./",design= ~ condition)
dds1 <- DESeq(dds1) res <- results(dds1)

then I do pairwise comparison:

res.A.B <- results(dds1, contrast=c("condition","A","B"))
res.A.C <- results(dds1, contrast=c("condition","A","C"))
res.B.C <- results(dds1, contrast=c("condition","B","C"))

At this point can I select differentially expresses genes that are specifically expressed in e.g. 'A' only by intersecting the results? I do not need to correct for multiple comparisons? Can this same principle be extended to case where I have e.g. 10 levels to find genes that are specifically expressed in one condition (computationally it is not a great burden, since the 'parallel' option comes to the rescue)?

Thank you in advance

Alberto

differential gene expression specific expression contrast deseq2 • 1.5k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 2 days ago
United States

You can just use

results(dds, name="conditionC")

which can be interpreted as the conditionC effect over a middle point between all groups. (Note that statement is only true when the resultsNames look like: [Intercept, conditionA, conditionB, conditionC], which is what we call in the paper and documentation an "expanded model matrix".)

When there are only two groups (A and B), using name="conditionB" gives you 1/2 the log fold change between B and A (the middle point), so it's not really what anyone would want in that case. But with more groups, using 'name' gets the effect I think you are describing.

ADD COMMENT
0
Entering edit mode

Or you can do:

results(dds, contrast=list("conditionC", c("conditionA","conditionB")), listValues=c(1, -1/2))

which gives you the C vs the average of A and B.

ADD REPLY
0
Entering edit mode
@arrigonialberto86-7608
Last seen 8.8 years ago
Italy

Dear Mike, sorry to get back on that. Just to clarify (let's say we have 5 conditions) you are suggesting to find genes that are specifically expressed in conditionC by doing (using listValues as follows):

results(dds, contrast=list("conditionC", c("conditionA","conditionB","conditionD","conditionE")), listValues=c(1, -1/4))

I was wondering if this code instead could be used to obtain a more 'stringent' set of specifically expressed genes from my dataset (without listValues)

results(dds, contrast=list("conditionC", c("conditionA","conditionB","conditionD","conditionE")))

Thank you very much for your answer,

Alberto

ADD COMMENT
1
Entering edit mode

hi Alberto,

You can do the one with listValues or also the one in my first post. I would try both and look at plotMA and plotCounts for examples of the most significant genes. You will see how different patterns will show up for the different tests.

The second line you posted doesn't really make sense: this is comparing C vs the sum of A,B,D,E. So you will get rejections of the null which you are not interested in, e.g. if all groups are equal, because in this case C =/= sum(A,B,D,E).

ADD REPLY

Login before adding your answer.

Traffic: 820 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6