Question: log2 fold change calculation for DE gene analysis
11 months ago, sup230 wrote:


I have questions that are not part of a typical RNA-seq workflow, so I would appreciate your input and help!

I used DESeq2 to get the full list of DE genes with their respective log2 fold changes (log2fc) between the two phenotype groups in my data. The samples are from TCGA-LGG, and the phenotype I am interested in is seizure history. The two groups are 177 samples with no seizure history and 298 samples with a seizure history. After getting the log2fc and converting it to a ranking metric, I ran the GSEA Preranked tool to get a list of gene sets enriched between the groups.

Now, based on these enriched gene sets, I am trying to assign each individual a numeric value for a particular gene set. For example, if the gene set 'calcium signaling pathway' is enriched in the seizure-history-yes group, I would like to take one individual from that group and calculate the log2 fold change of gene expression between that one sample and all samples in the no-seizure-history group. Then I would like to get the mean log2fc among only the genes in the gene set 'calcium signaling pathway' (m), get the mean and standard deviation of log2fc over all detected genes (M and S, respectively), and compute a z-score as (m - M)/(S/sqrt(n)), where n is the number of genes in the gene set. A high z-score would mean that the particular sample is highly enriched for that gene set. I would like to do this for all individuals in the sz-yes group and look for correlations with other clinical features.
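
A minimal sketch of the z-score described above, in plain Python. The function name `gene_set_z`, the gene names, and the log2fc values are all hypothetical, and the sample standard deviation is assumed for S since the post does not say which form is meant. It only illustrates the arithmetic; it does not produce the log2fc values themselves.

```python
import math
import statistics

def gene_set_z(log2fc, gene_set):
    """Return (m - M) / (S / sqrt(n)) for one sample's per-gene log2fc values."""
    all_values = list(log2fc.values())
    in_set = [value for gene, value in log2fc.items() if gene in gene_set]
    n = len(in_set)                      # number of gene-set genes that were detected
    m = statistics.mean(in_set)          # mean log2fc within the gene set
    M = statistics.mean(all_values)      # mean log2fc over all detected genes
    S = statistics.stdev(all_values)     # sample SD over all genes (assumption)
    return (m - M) / (S / math.sqrt(n))

# Made-up log2fc values for one sample versus the other group.
log2fc = {"A": 2.0, "B": 1.0, "C": 0.0, "D": -1.0, "E": -2.0}
gene_set = {"A", "B"}
z = gene_set_z(log2fc, gene_set)
```

Here m = 1.5, M = 0 and n = 2, so z is positive: the gene-set genes sit above the genome-wide average for this sample.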

My main question is whether I can calculate log2fc using one sample from one group and all samples in the other group without seriously distorting the previous analysis/workflow (initial DESeq2 to GSEA). I understand that the log2fc from DESeq2 is not the simple ratio of normalized counts, but I am not sure whether there is a closed-form equation that gives the shrunken log2fc estimate reported in the DESeq2 results. One way I can think of is to make separate count data and metadata that include one sample from one group and all samples from the other group, and run DESeq2 on them to get log2fc, but I am not sure whether this would interfere with the previous analysis.

Looking forward to any inputs!

Thank you!

modified 11 months ago by Michael Love • written 11 months ago by sup230
11 months ago, Michael Love (United States) wrote:

What I would suggest is something like the microarray barcode idea. 

First, perform a VST or rlog transformation of the counts. Then take the row subset of the transformed data based on the gene set. Then you could just compute the column sums of this sub-matrix to give an overall score for the gene set (you could also take a weighted sum, if you have per-gene weights for the gene set). Now you have a score for each sample for the gene set. You can calculate the mean and standard deviation of the scores in the control group, and use these values to standardize all the scores. So z_i = (score_i - mean(score_j : j in control)) / sd(score_j : j in control). Does this sound like what you are after?
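
The steps above can be sketched in plain Python on a toy matrix. This is an illustration under stated assumptions, not the DESeq2 workflow itself: the values in `expr` are made up and stand in for a VST- or rlog-transformed matrix (genes by samples), the function names are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def gene_set_scores(expr, samples, gene_set, weights=None):
    """Per-sample (optionally weighted) sum of transformed expression over the gene set."""
    weights = weights or {}
    genes = [g for g in expr if g in gene_set]   # row subset for the gene set
    return {
        s: sum(weights.get(g, 1.0) * expr[g][i] for g in genes)
        for i, s in enumerate(samples)
    }

def standardize_to_control(scores, control):
    """z_i = (score_i - mean(control scores)) / sd(control scores)."""
    ctrl = [scores[s] for s in control]
    mu = statistics.mean(ctrl)
    sd = statistics.stdev(ctrl)                  # sample SD (assumption)
    return {s: (v - mu) / sd for s, v in scores.items()}

# Toy transformed values: rows are genes, columns follow `samples`.
samples = ["ctrl1", "ctrl2", "ctrl3", "case1"]
expr = {
    "G1": [5.0, 6.0, 7.0, 9.0],
    "G2": [4.0, 4.0, 4.0, 6.0],
    "G3": [10.0, 10.0, 10.0, 10.0],   # not in the gene set, so ignored
}
gene_set = {"G1", "G2"}

scores = gene_set_scores(expr, samples, gene_set)
z = standardize_to_control(scores, control=["ctrl1", "ctrl2", "ctrl3"])
```

In this toy example the control scores are 9, 10 and 11, so the case sample's score of 15 lies five control standard deviations above the control mean.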

written 11 months ago by Michael Love

Yes! This sounds like a much more efficient and appropriate approach. I have a couple of follow-up questions, though.

When I calculate the mean and standard deviation for the control group, I would only include the genes in a particular gene set, because a higher z-score would still mean that the gene set is more enriched in that sample compared to the controls, right? But in this case, would the z-scores from two gene sets be comparable?

I am still not completely clear on what VST does. Can I think of it as a form of normalized estimate with stabilized variance? What are the intermediate steps between VST and getting log2fc?

Thank you so much!

written 11 months ago by sup230

Right (only mean and sd from that gene set).

z-scores are always comparable. So I'd say yes to comparing across gene sets.
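
A small illustration of that comparability, in plain Python with made-up numbers: raw gene-set sums grow with the size of the gene set, but after standardizing each set against its own control mean and standard deviation, both are expressed in control-SD units. The function name `control_z` and all values are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def control_z(scores, control_idx):
    """Standardize per-sample scores by the control group's mean and SD."""
    ctrl = [scores[i] for i in control_idx]
    mu = statistics.mean(ctrl)
    sd = statistics.stdev(ctrl)          # sample SD (assumption)
    return [(s - mu) / sd for s in scores]

# Hypothetical per-sample gene-set sums; the first three samples are controls.
small_set = [9.0, 10.0, 11.0, 13.0]        # e.g. a gene set with few genes
large_set = [90.0, 100.0, 110.0, 130.0]    # e.g. a gene set with many genes
control_idx = [0, 1, 2]

z_small = control_z(small_set, control_idx)
z_large = control_z(large_set, control_idx)
```

The raw sums differ by a factor of ten, yet the last sample ends up three control standard deviations above the control mean in both gene sets.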

For more details on VST, take a look at the transformation section in the vignette or the workflow (linked from the top of the vignette).


written 11 months ago by Michael Love