Question

Specifically expressed genes

0

Entering edit mode

arrigonialberto86 • 0

@arrigonialberto86-7608

Last seen 8.2 years ago

Italy

I have a brief question regarding some analyses we are performing using DESeq2, and I was hoping you could give me an opinion or some suggestions in order to improve our procedure.

Our experimental design features ~80 samples, divided in 19 classes (which correspond to different cellular populations).

Unfortunately, our design is uneven, so that for one condition we have as much as 6 samples (max), and for two conditions there is one sample each.

Our goal is to select genes which are specifically expressed in some of these classes, so that the expression profiles of selected genes are similar to the following distribution density vectors

for selection 1)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1" (dimensionality is 19 as the number of classes)

and for selection 2)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0"

selection 3)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1"

and selection 4)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1"

In the procedure we used for this selection process (in DESeq2) we created a sampletable where we set as TREATED the samples for which the genes had to be 'specifically expressed' and the rest as 'CONTROL'. Of course, if we are selecting for samples in two classes (one constituted by 6 samples and one by 1) the design for the comparison may be unbalanced.

Once we obtained the results (a list of padj-ordered genes for each of the 4 comparison), we calculated Pearson correlation values for each gene and a class-specific reference model (the binary vectors reported above).

We obtained input values for the Pearson correlation calculations by calculating mean values for each one of the 19 cellular populations (from a different DESeq2 session), like this:

normalized.counts <- as.data.frame(counts( dds, normalized=TRUE))

rowMeans(counts(dds,normalized=TRUE)[,dds$condition == populations[1]]

We found an overall good correlation between the pearson coefficients calculated on the samples means of classes and the pvalues assigned to the genes by the DESeq2 for the different (4) classes. We used a threshold of 0.9 for the correlation coefficients in order to identify only genes which are 'specifically' expressed in the classes of our choice.

NOTE: while genes with very low padj (e.g. 10^-20) also have high correlation coefficients with our model distributions, genes with padj of the order 10(^-3/-4) may have non-specific expression (with Pearson correlation values as low as 0.3)

Thank you very much

DESeq2 specific expression rnaseq clustering differential expression • 1.2k views

ADD COMMENT • link updated 9.0 years ago by Michael Love 41k • written 9.0 years ago by arrigonialberto86 • 0

score 1 · Answer 1 · 2015-05-07

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 15 hours ago

United States

hi Alberto,

rather than dealing with the correlation, I may have a more straightforward suggestion. Say, for selection (3) the p-value from the test is against the null that the expected normalized mean for group [1-17] vs group [18,19] are equal. This is not true when, for example 1-18 is different than 19, although this is not the set of genes you are interested in. Why not test 1-17 vs 18 and 1-17 vs 19, and then take the intersection of the FDR-thresholded sets from both tests?

ADD COMMENT • link 9.0 years ago Michael Love 41k

0

Entering edit mode

thank you Mike!

ADD REPLY • link 9.0 years ago arrigonialberto86 • 0

0

Entering edit mode

thank you Mike!

ADD REPLY • link 9.0 years ago arrigonialberto86 • 0

0

Entering edit mode

thank you Mike!

ADD REPLY • link 9.0 years ago arrigonialberto86 • 0