Question: Specificity of gene expression
0
4.1 years ago by
Italy
arrigonialberto860 wrote:

I was wondering if it is possible to use the 'contrast' keyword of the 'results' function of DESeq2 in order to select genes that are specifically expressed in a single biological class in cases where we have more than 2 conditions.

In this case I would just calculate contrasts for all the condition and e.g. select genes that are 'upregulated' in a single condition vs all the rest. Would this work? Would I have to correct the results for multiple comparisons again?

e.g., in this experimental design with a factor of three values (conditions):

samples <- read.table("samples.txt",sep="\t",as.is=T)
colnames(samples)<-c("sampleName","fileName","condition")
dds1 <- DESeqDataSetFromHTSeqCount(sampleTable=samples,directory="./",design= ~ condition)
dds1 <- DESeq(dds1) res <- results(dds1)

then I do pairwise comparison:

res.A.B <- results(dds1, contrast=c("condition","A","B"))
res.A.C <- results(dds1, contrast=c("condition","A","C"))
res.B.C <- results(dds1, contrast=c("condition","B","C"))

At this point can I select differentially expresses genes that are specifically expressed in e.g. 'A' only by intersecting the results? I do not need to correct for multiple comparisons? Can this same principle be extended to case where I have e.g. 10 levels to find genes that are specifically expressed in one condition (computationally it is not a great burden, since the 'parallel' option comes to the rescue)?

Alberto

modified 4.1 years ago • written 4.1 years ago by arrigonialberto860
0
4.1 years ago by
Michael Love26k
United States
Michael Love26k wrote:

You can just use

results(dds, name="conditionC")

which can be interpreted as the conditionC effect over a middle point between all groups. (Note that statement is only true when the resultsNames look like: [Intercept, conditionA, conditionB, conditionC], which is what we call in the paper and documentation an "expanded model matrix".)

When there are only two groups (A and B), using name="conditionB" gives you 1/2 the log fold change between B and A (the middle point), so it's not really what anyone would want in that case. But with more groups, using 'name' gets the effect I think you are describing.

Or you can do:

results(dds, contrast=list("conditionC", c("conditionA","conditionB")), listValues=c(1, -1/2))

which gives you the C vs the average of A and B.

0
4.1 years ago by
Italy
arrigonialberto860 wrote:

Dear Mike, sorry to get back on that. Just to clarify (let's say we have 5 conditions) you are suggesting to find genes that are specifically expressed in conditionC by doing (using listValues as follows):

results(dds, contrast=list("conditionC", c("conditionA","conditionB","conditionD","conditionE")), listValues=c(1, -1/4))

I was wondering if this code instead could be used to obtain a more 'stringent' set of specifically expressed genes from my dataset (without listValues)

results(dds, contrast=list("conditionC", c("conditionA","conditionB","conditionD","conditionE")))

Alberto

1

hi Alberto,

You can do the one with listValues or also the one in my first post. I would try both and look at plotMA and plotCounts for examples of the most significant genes. You will see how different patterns will show up for the different tests.

The second line you posted doesn't really make sense: this is comparing C vs the sum of A,B,D,E. So you will get rejections of the null which you are not interested in, e.g. if all groups are equal, because in this case C =/= sum(A,B,D,E).