Question

log2 fold change calculation for DE gene analysis


sup230 ▴ 30

@sup230-13286

Last seen 8.3 years ago

Hi,

I have some questions that are not part of a typical RNA-seq workflow, so I would appreciate your input and help!

I used DESeq2 to get the full list of DE genes with their respective log2 fold changes (log2fc) between the two phenotype groups in my data. The samples are from TCGA-LGG, and the phenotype I am interested in is seizure history. The two groups are 177 samples with no seizure history and 298 samples with a seizure history. After getting the log2fc values and converting them to a ranking metric, I ran the GSEA Preranked tool to get a list of gene sets enriched between the groups.

Now, based on these enriched gene sets, I am trying to assign each individual a numeric value for a particular gene set. For example, if the gene set 'calcium signaling pathway' is enriched in the seizure-history-yes group, I would like to take one individual from that group and calculate the log2 fold change of gene expression between that one sample and all samples in the no-seizure-history group. Then I would like to take the mean of log2fc over only the genes found in the gene set 'calcium signaling pathway' (m), take the mean and standard deviation of log2fc over all detected genes (M and S respectively), and compute a z-score as (m-M)/(S/sqrt(n)), where n denotes the number of genes in the gene set. A high z-score would mean that the particular sample is highly enriched for that gene set. I would like to do this for all individuals in the sz-yes group and look for any correlation with other clinical features.
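To make the proposed score concrete, here is a minimal sketch of the formula (m-M)/(S/sqrt(n)) for a single sample. The function name `gene_set_z`, the gene labels, and the log2fc values are hypothetical and chosen only for illustration. The sketch also assumes S is the sample standard deviation, which the description above does not specify.

```python
import math
import statistics


def gene_set_z(log2fc, gene_set):
    """Return (m - M) / (S / sqrt(n)) for one sample.

    log2fc   : dict mapping gene name -> log2 fold change for this sample
    gene_set : set of gene names in the gene set of interest
    """
    in_set = [value for gene, value in log2fc.items() if gene in gene_set]
    n = len(in_set)
    if n == 0:
        raise ValueError("none of the gene-set genes were detected")
    m = statistics.mean(in_set)       # mean log2fc within the gene set
    all_values = list(log2fc.values())
    M = statistics.mean(all_values)   # mean log2fc over all detected genes
    S = statistics.stdev(all_values)  # assumption: sample standard deviation
    return (m - M) / (S / math.sqrt(n))


# Made-up log2fc values for five genes; "A" and "B" form the gene set.
example_log2fc = {"A": 2.0, "B": 1.0, "C": 0.0, "D": -1.0, "E": -2.0}
z = gene_set_z(example_log2fc, {"A", "B"})
print(round(z, 4))  # 1.3416
```

Here m = 1.5, M = 0, S = sqrt(2.5) and n = 2, so z = 1.5 / sqrt(1.25).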

My main question is whether I can calculate log2fc using one sample from one group and all samples in the other group without seriously distorting the previous analysis/workflow (initial DESeq2 to GSEA). I understand that the log2fc from DESeq2 is not the simple ratio of normalized counts, but I am not sure whether there is a closed-form equation that gives a shrunken estimate of log2fc as in the DESeq2 result. One way I can think of is to make separate countdata and metadata that include one sample from one group and all samples in the other group, and run DESeq2 to get log2fc, but I am not sure whether this would interfere with the previous analysis.

Looking forward to any inputs!

Thank you!

deseq2 L2FC log2fc • 7.6k views

ADD COMMENT • link updated 8.5 years ago by Michael Love 43k • written 8.5 years ago by sup230 ▴ 30

score 1 · Answer 1 · 2017-08-10


Michael Love 43k

@mikelove

Last seen 1 day ago

United States

What I would suggest is something like the microarray barcode idea.

First, perform a VST or rlog transformation of the counts. Then take the row subset of the transformed data based on the gene set. Then you could just compute the column sums of this sub-matrix to give an overall score for the gene set (you could also take a weighted sum, if you have per-gene weights for the gene set). Now you have a score for each sample for the gene set. You can calculate the mean and standard deviation of the scores in the control group, and use these values to standardize all the scores. So z_i = (score_i - mean(score_j : j in control)) / sd(score_j : j in control). Does this sound like what you are after?
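The scoring steps can be sketched as follows, assuming the VST or rlog values have already been computed and exported from DESeq2. This is only an illustration in plain Python: the function names `gene_set_scores` and `standardize`, the gene labels, and the numbers are all hypothetical. Each score is standardized as (score_i - mean of control scores) / (sd of control scores), using the sample standard deviation.

```python
import statistics


def gene_set_scores(expr, gene_set):
    """Column sums over the gene-set rows.

    expr     : dict mapping gene name -> list of transformed values,
               one value per sample, in the same sample order for every gene
    gene_set : set of gene names
    """
    rows = [values for gene, values in expr.items() if gene in gene_set]
    if not rows:
        raise ValueError("none of the gene-set genes are in the matrix")
    return [sum(column) for column in zip(*rows)]


def standardize(scores, is_control):
    """Standardize all scores by the control group's mean and sd."""
    control = [s for s, flag in zip(scores, is_control) if flag]
    mu = statistics.mean(control)
    sd = statistics.stdev(control)  # needs at least two control samples
    return [(s - mu) / sd for s in scores]


# Made-up transformed values: three genes (rows) by four samples (columns).
expr = {
    "G1": [1.0, 2.0, 3.0, 5.0],
    "G2": [1.0, 1.0, 1.0, 3.0],
    "G3": [9.0, 9.0, 9.0, 9.0],  # not in the gene set, so ignored
}
is_control = [True, True, True, False]

scores = gene_set_scores(expr, {"G1", "G2"})  # [2.0, 3.0, 4.0, 8.0]
z_scores = standardize(scores, is_control)    # [-1.0, 0.0, 1.0, 5.0]
print(z_scores)
```

A weighted sum would only change `gene_set_scores`, multiplying each gene's row by its weight before summing.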

ADD COMMENT • link 8.5 years ago Michael Love 43k


Yes! This sounds like a much more efficient and appropriate approach. I have a couple of follow-up questions, though.

When I calculate the mean and standard deviation for the control group, I would include only the genes in a particular gene set, because a higher z-score would still mean that the gene set is more enriched in that sample compared to the control, right? But in this case, would the z-scores from two gene sets be comparable?

I am still not completely clear what VST does. Can I consider this as a form of normalized estimate with stabilized variance? What are the intermediate steps between VST and getting log2fc?

Thank you so much!

ADD REPLY • link 8.5 years ago sup230 ▴ 30


Right (only mean and sd from that gene set).

z-scores are always comparable. So I'd say yes to comparing across gene sets.

For more details on the VST, take a look at the transformation section in the vignette or the workflow (linked from the top of the vignette).

ADD REPLY • link 8.5 years ago Michael Love 43k