Correlation between CpG methylation and gene expression from DESeq2: normalize counts using VST + FPKM/RPKM?

As the title states, I've got some RRBS data and am looking to evaluate a correlation between gene expression and CpG island methylation.

In prepping the expression data (from a completed DESeq2 run), I'm planning to normalize the counts with the variance stabilizing transformation (VST) prior to exporting and moving forward. However, as I understand it, the VST accounts for the library's size factors & inter-sample count variance BUT does not normalize for feature length. Is this indeed the case? And if so, am I right to conclude that the counts should also be normalized using FPKM or an analogous method before being used for downstream analysis?

I've been reading the documentation + source code as well as previous related answers but I'm still not 100% sure. The question seems silly as the only recommendations prior to exporting these data made in the vignettes and in previous questions allude to the rlog or VST transformations, but I wanted to be certain and don't have a good bioinformatics mentor to ask.

I sincerely appreciate your input.


In this case, can you say why you want to normalize for gene length? In our tximport/tximeta pipeline, the analysis corrects for differential gene length (e.g. if the effective gene length changes across samples), but we don't need to divide out a common gene length factor from the entire row. Doing so wouldn't affect a correlation anyway, since correlation is scale invariant.
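
To make the scale-invariance point concrete, here is a minimal sketch in plain Python. All numbers are made up for illustration, `gene_length_kb` is a hypothetical constant length for one gene, and `log2` is used only as a rough stand-in for a log-like transform such as the VST (a simplification, not what DESeq2 computes). It checks that dividing one gene's values by a constant length leaves the Pearson correlation with methylation unchanged, both on the original scale (a rescaling) and on the log scale (a constant shift).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical values for ONE gene across six samples (illustration only).
expression = [120.0, 340.0, 95.0, 410.0, 220.0, 150.0]  # size-factor-normalized counts
methylation = [0.82, 0.35, 0.90, 0.20, 0.55, 0.74]      # CpG island methylation fraction
gene_length_kb = 2.5                                    # assumed identical in every sample

# Original scale: dividing by a constant length rescales the whole row.
r_raw = pearson(expression, methylation)
r_per_kb = pearson([e / gene_length_kb for e in expression], methylation)

# Log scale: the same division becomes subtracting a constant from the row.
r_log = pearson([math.log2(e) for e in expression], methylation)
r_log_per_kb = pearson([math.log2(e / gene_length_kb) for e in expression], methylation)

print(math.isclose(r_raw, r_per_kb, rel_tol=1e-9))
print(math.isclose(r_log, r_log_per_kb, rel_tol=1e-9))
```

This only holds when the length factor is the same for every sample in the row. If the effective length differs between samples, the division is no longer a constant rescaling and the correlation can change, which is the case the tximport/tximeta correction addresses.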

