Hi,
I performed polysome profiling to measure translation rate in wildtype and knockout cells. For the polysome samples I collected a monosome fraction, a low polysome fraction, and a high polysome fraction. I also collected mRNA for both wildtype and knockout. I'm interested in comparing translation changes between knockout and wildtype (and most interested in the change in high polysome relative to changes in mRNA).
I'm using limma to find differences in the ratio (ko high / ko mRNA) / (wt high / wt mRNA). I initially put all of my samples (mRNA, monosome, low, high) into limma and only pulled out the contrast that I was interested in. However, I noticed that which samples I include changes the number of significant genes a lot, even though I only ever pull out the ko vs wt high vs mRNA ratio (a toy check of what I mean by this ratio is sketched below the table):
| input samples | # significant genes for (ko high / ko mRNA) / (wt high / wt mRNA) |
| --- | --- |
| mRNA/mono/low/high | 643 |
| mRNA/mono/high | 1176 |
| mRNA/low/high | 1186 |
| mRNA/high | 1228 |
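For reference, here is a toy numeric check (made-up numbers, not my data) of the identity I'm relying on: on the log2 scale, the ratio of ratios is the same thing as the difference-of-differences interaction contrast fitted in the code further down.

# Toy check: log2 of the ratio of ratios equals the interaction contrast
ko_high <- 8; ko_mRNA <- 2; wt_high <- 4; wt_mRNA <- 2
log2((ko_high / ko_mRNA) / (wt_high / wt_mRNA))                      # 1
(log2(ko_high) - log2(ko_mRNA)) - (log2(wt_high) - log2(wt_mRNA))    # 1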
I know that voom takes the variation across samples into account; is that the reason that I'm losing significance the more sample types I put in? Looking at the samples, I don't think there are any outliers that are inflating the overall variance.
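In case it is useful, this is one way I could compare the gene-wise variance estimates between the different inputs (a sketch; fit2_full and fit2_reduced are hypothetical names for the eBayes fits from the mRNA/mono/low/high and mRNA/high runs):

# Hypothetical fits from the full and reduced inputs; s2.post holds the
# posterior (moderated) residual variances from eBayes
summary(fit2_full$s2.post)
summary(fit2_reduced$s2.post)
boxplot(list(full = sqrt(fit2_full$s2.post),
             reduced = sqrt(fit2_reduced$s2.post)),
        ylab = 'posterior residual SD')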
This is the code that I'm using:
library(edgeR)
library(limma)

# Build sample annotation (fraction type and condition) from the column names
layout <- data.frame(row.names = colnames(count_mat),
                     type = colnames(count_mat),
                     condition = colnames(count_mat),
                     stringsAsFactors = FALSE)
layout[grep('high', colnames(count_mat)), ]$type <- 'high'
layout[grep('low', colnames(count_mat)), ]$type <- 'low'
layout[grep('mono', colnames(count_mat)), ]$type <- 'mono'
layout[grep('mRNA', colnames(count_mat)), ]$type <- 'mRNA'
layout[grep('ko', colnames(count_mat)), ]$condition <- 'ko'
layout[grep('wt', colnames(count_mat)), ]$condition <- 'wt'
print(layout)

# One group per condition.type combination, no intercept
limmaModel <- factor(paste(layout$condition, layout$type, sep = '.'))
design <- model.matrix(~0 + limmaModel)
colnames(design) <- levels(limmaModel)
rownames(design) <- rownames(layout)

# Filter low read counts
count_mat_filtered <- count_mat[rowSums(cpm(count_mat) > 5) >= 3, ]

# Normalize to log2(CPM + 0.5) with voom
d <- DGEList(count_mat_filtered, genes = rownames(count_mat_filtered))
d <- calcNormFactors(d)
v <- voom(d, design, plot = TRUE)

# Differential expression: interaction contrast (ratio of ratios on the log2 scale)
fit <- lmFit(v, design)
cont.matrix <- makeContrasts(
  interaction_high_mrna = (ko.high - ko.mRNA) - (wt.high - wt.mRNA),
  levels = design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
interaction_high_mrna <- topTable(fit2, coef = 'interaction_high_mrna',
                                  number = nrow(count_mat_filtered))
sum(interaction_high_mrna$adj.P.Val < 0.05)
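For the smaller inputs in the table above, I drop the unwanted fractions before running the same code, roughly like this (a sketch, assuming the fraction name appears in each column name):

# Example: keep only the mRNA and high samples, then rebuild layout, design,
# and the voom/limma fit on the subset exactly as above
keep_cols <- grepl('mRNA|high', colnames(count_mat))
count_mat_sub <- count_mat[, keep_cols]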
R version 3.4 edgeR_3.18.1 limma_3.32.9
I tried it with both voomWithQualityWeights and voomWithQualityWeights(var.design = design). This is what the plot looks like for the second one: https://imgur.com/HDhOUhQ. The order is:
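For completeness, these are roughly the two quality-weight calls I compared (a sketch; d and design are the objects from the code above):

# One quality weight per sample under the default variance model
vq1 <- voomWithQualityWeights(d, design, plot = TRUE)
# Allow the sample-weight variance model to differ between the design groups
vq2 <- voomWithQualityWeights(d, design, var.design = design, plot = TRUE)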
In this case, would you recommend just inputting the samples that I am interested in directly comparing?
Thanks
This plot doesn't seem to show the low or mono samples as having any worse quality than the high samples, although the mRNA samples seem to have a higher average quality than almost anything else. And even if they did, I would expect the quality weights to account for that. I'm afraid I don't know enough to diagnose the problem based on this plot.
However, taking a closer look at the counts of significant genes in the table you gave, the only real difference is in the top row. The difference between 1176 and 1228 genes is negligible. So adding in either the mono or the low samples has a minimal effect, but adding both of them to the same model has a drastic effect. This seems very counterintuitive, and if I saw this effect in my data, I would suspect that I had made a mistake coding the model for the first row and would double-check my code. For example, you could have accidentally treated low and mono as a single group instead of two separate groups in the design, thus inflating the variance estimates. I doubt that's the exact answer, though, since your quality weight plot clearly shows different weights for those two groups.
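If you want a quick sanity check, something like this (a sketch using your object names) would show whether mono and low really ended up as two separate groups in the design:

# Each condition.type combination should appear as its own factor level and
# design column, with the expected number of samples in each
table(limmaModel)
colnames(design)
colSums(design)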