Question: Specifically expressed genes
gravatar for arrigonialberto86
4.5 years ago by
arrigonialberto860 wrote:

I have a brief question regarding some analyses we are performing using DESeq2, and I was hoping you could give me an opinion or some suggestions in order to improve our procedure.


Our experimental design features ~80 samples, divided in 19 classes (which correspond to different cellular populations). 

Unfortunately, our design is uneven, so that for one condition we have as much as 6 samples (max), and for two conditions there is one sample each.


Our goal is to select genes which are specifically expressed in some of these classes, so that the expression profiles of selected genes are similar to the following distribution density vectors 

for selection 1)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1" (dimensionality is 19 as the number of classes)


and for selection 2)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0"


selection 3)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1"


and selection 4)

"0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1"


In the procedure we used for this selection process (in DESeq2) we created a sampletable where we set as TREATED the samples for which the genes had to be 'specifically expressed' and the rest as 'CONTROL'. Of course, if we are selecting for samples in two classes (one constituted by 6 samples and one by 1) the design for the comparison may be unbalanced.


Once we obtained the results (a list of padj-ordered genes for each of the 4 comparison), we  calculated Pearson correlation values for each gene and a class-specific reference model (the binary vectors reported above).  

We obtained input values for the Pearson correlation calculations by calculating mean values for each one of the 19 cellular populations (from a different DESeq2 session), like this:


normalized.counts <- dds, normalized=TRUE))

rowMeans(counts(dds,normalized=TRUE)[,dds$condition == populations[1]]


We found an overall good correlation between the pearson coefficients calculated on the samples means of classes and the pvalues assigned to the genes by the DESeq2 for the different (4) classes. We used a threshold of 0.9 for the correlation coefficients in order to identify only genes which are 'specifically' expressed in the classes of our choice. 


NOTE: while genes with very low padj (e.g. 10^-20) also have high correlation coefficients with our model distributions, genes with padj of the order 10(^-3/-4) may have non-specific expression (with Pearson correlation values as low as 0.3) 


Thank you very much

ADD COMMENTlink modified 4.5 years ago by Michael Love26k • written 4.5 years ago by arrigonialberto860
Answer: Specifically expressed genes
gravatar for Michael Love
4.5 years ago by
Michael Love26k
United States
Michael Love26k wrote:

hi Alberto,

rather than dealing with the correlation, I may have a more straightforward suggestion. Say, for selection (3) the p-value from the test is against the null that the expected normalized mean for group [1-17] vs group [18,19] are equal. This is not true when, for example 1-18 is different than 19, although this is not the set of genes you are interested in. Why not test 1-17 vs 18 and 1-17 vs 19, and then take the intersection of the FDR-thresholded sets from both tests?

ADD COMMENTlink written 4.5 years ago by Michael Love26k

thank you Mike!

ADD REPLYlink written 4.5 years ago by arrigonialberto860

thank you Mike!

ADD REPLYlink written 4.5 years ago by arrigonialberto860

thank you Mike!

ADD REPLYlink written 4.5 years ago by arrigonialberto860
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 227 users visited in the last hour