Specifically expressed genes
1
0
Entering edit mode
@arrigonialberto86-7608
Last seen 8.2 years ago
Italy

I have a brief question regarding some analyses we are performing using DESeq2, and I was hoping you could give me an opinion or some suggestions in order to improve our procedure.

 

Our experimental design features ~80 samples, divided in 19 classes (which correspond to different cellular populations). 

Unfortunately, our design is uneven, so that for one condition we have as much as 6 samples (max), and for two conditions there is one sample each.

 

Our goal is to select genes which are specifically expressed in some of these classes, so that the expression profiles of selected genes are similar to the following distribution density vectors 

for selection 1)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1" (dimensionality is 19 as the number of classes)

 

and for selection 2)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0"

 

selection 3)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1"

 

and selection 4)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1"

 

In the procedure we used for this selection process (in DESeq2) we created a sampletable where we set as TREATED the samples for which the genes had to be 'specifically expressed' and the rest as 'CONTROL'. Of course, if we are selecting for samples in two classes (one constituted by 6 samples and one by 1) the design for the comparison may be unbalanced.

 

Once we obtained the results (a list of padj-ordered genes for each of the 4 comparison), we  calculated Pearson correlation values for each gene and a class-specific reference model (the binary vectors reported above).  

We obtained input values for the Pearson correlation calculations by calculating mean values for each one of the 19 cellular populations (from a different DESeq2 session), like this:

 

normalized.counts <- as.data.frame(counts( dds, normalized=TRUE))

rowMeans(counts(dds,normalized=TRUE)[,dds$condition == populations[1]]

 

We found an overall good correlation between the pearson coefficients calculated on the samples means of classes and the pvalues assigned to the genes by the DESeq2 for the different (4) classes. We used a threshold of 0.9 for the correlation coefficients in order to identify only genes which are 'specifically' expressed in the classes of our choice. 

 

NOTE: while genes with very low padj (e.g. 10^-20) also have high correlation coefficients with our model distributions, genes with padj of the order 10(^-3/-4) may have non-specific expression (with Pearson correlation values as low as 0.3) 

 

Thank you very much

DESeq2 specific expression rnaseq clustering differential expression • 1.2k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 15 hours ago
United States

hi Alberto,

rather than dealing with the correlation, I may have a more straightforward suggestion. Say, for selection (3) the p-value from the test is against the null that the expected normalized mean for group [1-17] vs group [18,19] are equal. This is not true when, for example 1-18 is different than 19, although this is not the set of genes you are interested in. Why not test 1-17 vs 18 and 1-17 vs 19, and then take the intersection of the FDR-thresholded sets from both tests?

ADD COMMENT
0
Entering edit mode

thank you Mike!

ADD REPLY
0
Entering edit mode

thank you Mike!

ADD REPLY
0
Entering edit mode

thank you Mike!

ADD REPLY

Login before adding your answer.

Traffic: 471 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6