Question

Normalize for changes in one direction

1

Entering edit mode

Jake ▴ 90

@jake-7236

Last seen 2.3 years ago

United States

Hi,

I'm trying to measure RNA stability changes after knocking out a repressor RNA binding protein. I'm using 4sU to label nascent RNA and also collecting total RNA as per this paper https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3114636/. I can then calculate RNA stability by the ratio of nascent RNA/total RNA. I'm using the interaction in limma to look at changes in RNA stability between WT and KO cells. The issue that I have run into is that it looks like some genes are stabilized and some genes are destabilized. However, I think this is just due to the relative nature of RNA-seq where once some genes are stabilized, other genes now take up less of the sequencing pool so look destabilized. I have validated with an alternative low throughput method based on qPCR that only the stabilized gene changes are real. I was wondering if people had suggestions for alternative ways to normalize the data so that the sequencing would only show the stabilized genes? Unfortunately external ERCC spikes have been noisy and not worked well for normalization.

Thanks

d <- DGEList(count_mat_filtered, genes = rownames(count_mat_filtered))
d <- calcNormFactors(d)
v <- voom(d, design, plot = T)

# Differential expression
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(
  mRNA = KO.total - wt.total,
  su = KO.su - wt.su,
  interaction = (KO.total - KO.su) - (wt.total - wt.su),
  stability_wt = wt.total - wt.su,
  stability_KO = KO.total - KO.su,
  levels = design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
all_fc <- topTable(fit2, number = nrow(count_mat_filtered))
mRNA_results <- topTable(fit2, coef = 1, number = nrow(count_mat_filtered), confint = T)
su_results <- topTable(fit2, coef = 2, number = nrow(count_mat_filtered), confint = T)
interaction_results <- topTable(fit2, coef = 3, number = nrow(count_mat_filtered), confint = T)
stability_wt <- topTable(fit2, coef = 4, number = nrow(count_mat_filtered), confint = T)
stability_KO <- topTable(fit2, coef = 5, number = nrow(count_mat_filtered), confint = T)

limma voom rnaseq • 829 views

ADD COMMENT • link updated 6.8 years ago by Aaron Lun ★ 28k • written 6.8 years ago by Jake ▴ 90

0

Entering edit mode

I would say the first thing to do is to make an MA plot of your data. If it shows a bimodal distribution along the vertical (logFC) axis, then you might have some issues with the normalization, especially if neither mode is centered on zero.

ADD REPLY • link 6.8 years ago Ryan C. Thompson ★ 7.9k

score 1 · Answer 1 · 2018-02-20

As you already may be aware, the TMM normalization in calcNormFactors will correct for composition effects, assuming that most genes are not DE. For TMM normalization to be inappropriate, you'd have to have DE in most genes after knockdown of your protein. This tends to be a rather extreme situation, but I'm not familiar enough with RNA binding proteins to tell.

Moreover, I am not convinced by the use of qPCR to show that the DE genes from sequencing are not "real" (at least, in the context of improper normalization). Every qPCR protocol I've seen depends on normalization with a presumed-constant house-keeping gene like GADPH or TUBA. If your knockdown is having a global effect on the stability of the transcriptome, who's to say that your house-keeping genes are not also affected?

That being said, if you are able to define a set of genes - say, 100-1000 - that you know are not affected by your knockdown, then you have solved your own problem. Just run calcNormFactors on that subset of genes and use the resulting normalization factors with the full set of genes. (Do not set keep.lib.sizes=FALSE during subsetting, though, as this will render the normalization factors incompatible with the DGEList containing the full set of genes.)

If you can't define a set of constant genes... well, use of external spike-ins is the standard approach to this problem, provided the ratio of spike-in RNA to cells is the same across all samples. It's unfortunate that they haven't worked in your case; we find that ERCCs work pretty well for normalizing single-cell RNA-seq data, and it's hard to imagine a noisier (transcriptomic) assay than that.