Hi,
I performed polysome profiling to measure translation rate in wildtype and knockout cells. For the polysome samples I collected a monosome fraction, a low polysome fraction, and a high polysome fraction. I also collected mRNA for both wildtype and knockout. I'm interested in comparing translation changes between knockout and wildtype (and most interested in the change in high polysome relative to changes in mRNA).
I'm using limma to find differences in the ratio (ko high / ko mRNA) / (wt high / wt mRNA). I initially put all of my samples (mRNA, monosome, low, high) into limma and only pulled out the contrast that I was interested in. However, I noticed that which samples I include changes the number of significant genes a lot, even though I only ever pull out the ko vs wt high vs mRNA ratio (a toy check of what I mean by this ratio is sketched below the table):
| input samples | # significant genes for (ko high / ko mRNA) / (wt high / wt mRNA) |
| --- | --- |
| mRNA/mono/low/high | 643 |
| mRNA/mono/high | 1176 |
| mRNA/low/high | 1186 |
| mRNA/high | 1228 |
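For reference, here is a toy numeric check (made-up numbers, not my data) of the identity I'm relying on: on the log2 scale, the ratio of ratios is the same thing as the difference-of-differences interaction contrast fitted in the code further down.

# Toy check: log2 of the ratio of ratios equals the interaction contrast
ko_high <- 8; ko_mRNA <- 2; wt_high <- 4; wt_mRNA <- 2
log2((ko_high / ko_mRNA) / (wt_high / wt_mRNA))                      # 1
(log2(ko_high) - log2(ko_mRNA)) - (log2(wt_high) - log2(wt_mRNA))    # 1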
I know that voom takes the variation across samples into account; is that the reason that I'm losing significance the more sample types I put in? Looking at the samples, I don't think there are any outliers that are inflating the overall variance.
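In case it is useful, this is one way I could compare the gene-wise variance estimates between the different inputs (a sketch; fit2_full and fit2_reduced are hypothetical names for the eBayes fits from the mRNA/mono/low/high and mRNA/high runs):

# Hypothetical fits from the full and reduced inputs; s2.post holds the
# posterior (moderated) residual variances from eBayes
summary(fit2_full$s2.post)
summary(fit2_reduced$s2.post)
boxplot(list(full = sqrt(fit2_full$s2.post),
             reduced = sqrt(fit2_reduced$s2.post)),
        ylab = 'posterior residual SD')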
This is the code that I'm using:
library(edgeR)
library(limma)

# Build sample annotation (fraction type and condition) from the column names
layout <- data.frame(row.names = colnames(count_mat),
                     type = colnames(count_mat),
                     condition = colnames(count_mat),
                     stringsAsFactors = FALSE)
layout[grep('high', colnames(count_mat)), ]$type <- 'high'
layout[grep('low', colnames(count_mat)), ]$type <- 'low'
layout[grep('mono', colnames(count_mat)), ]$type <- 'mono'
layout[grep('mRNA', colnames(count_mat)), ]$type <- 'mRNA'
layout[grep('ko', colnames(count_mat)), ]$condition <- 'ko'
layout[grep('wt', colnames(count_mat)), ]$condition <- 'wt'
print(layout)

# One group per condition.type combination, no intercept
limmaModel <- factor(paste(layout$condition, layout$type, sep = '.'))
design <- model.matrix(~0 + limmaModel)
colnames(design) <- levels(limmaModel)
rownames(design) <- rownames(layout)

# Filter low read counts
count_mat_filtered <- count_mat[rowSums(cpm(count_mat) > 5) >= 3, ]

# Normalize to log2(CPM + 0.5) with voom
d <- DGEList(count_mat_filtered, genes = rownames(count_mat_filtered))
d <- calcNormFactors(d)
v <- voom(d, design, plot = TRUE)

# Differential expression: interaction contrast (ratio of ratios on the log2 scale)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(
  interaction_high_mrna = (ko.high - ko.mRNA) - (wt.high - wt.mRNA),
  levels = design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
interaction_high_mrna <- topTable(fit2, coef = 'interaction_high_mrna',
                                  number = nrow(count_mat_filtered))
sum(interaction_high_mrna$adj.P.Val < 0.05)
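For the smaller inputs in the table above, I drop the unwanted fractions before running the same code, roughly like this (a sketch, assuming the fraction name appears in each column name):

# Example: keep only the mRNA and high samples, then rebuild layout, design,
# and the voom/limma fit on the subset exactly as above
keep_cols <- grepl('mRNA|high', colnames(count_mat))
count_mat_sub <- count_mat[, keep_cols]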
R version 3.4 edgeR_3.18.1 limma_3.32.9
I tried it with both voomWithQualityWeights and voomWithQualityWeights(var.design = design). This is what the plot looks like for the second one: https://imgur.com/HDhOUhQ. The order is:
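For completeness, these are roughly the two quality-weight calls I compared (a sketch; d and design are the objects from the code above):

# One quality weight per sample under the default variance model
vq1 <- voomWithQualityWeights(d, design, plot = TRUE)
# Allow the sample-weight variance model to differ between the design groups
vq2 <- voomWithQualityWeights(d, design, var.design = design, plot = TRUE)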
In this case, would you recommend just inputting the samples that I am interested in directly comparing?
Thanks
This plot doesn't seem to show the low or mono samples as having any worse quality than the high samples, although the mRNA samples seem to have a higher average quality than almost anything else. And even if they did, I would expect the quality weights to account for that. I'm afraid I don't know enough to diagnose the problem based on this plot.
However, taking a closer look at the counts of significant genes in the table you gave, the only real difference is in the top row. The difference between 1176 and 1228 genes is negligible. So adding in either the mono or the low samples has a minimal effect, but adding both of them to the same model has a drastic effect. This seems very counterintuitive, and if I saw this effect in my data, I would suspect that I had made a mistake coding the model for the first row and would double-check my code. For example, you could have accidentally treated low and mono as a single group instead of two separate groups in the design, thus inflating the variance estimates. I doubt that's the exact answer, though, since your quality weight plot clearly shows different weights for those two groups.
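If you want a quick sanity check, something like this (a sketch using your object names) would show whether mono and low really ended up as two separate groups in the design:

# Each condition.type combination should appear as its own factor level and
# design column, with the expected number of samples in each
table(limmaModel)
colnames(design)
colSums(design)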