Question

How do I fix an odd voom plot in a combined dataset?

1sunmic2 • 0

Hi everyone, I'm having a bit of trouble with my voom normalization: the mean-variance plot looks extremely off. For reference, here is an image of my voom plot, which, as you can see, looks like a fish.

For context, my dataset is a merged dataset. Here is the code I used to build it:

    # Copy each dataset's gene identifier into a common "gene" column
    pan_gene_reads$gene <- pan_gene_reads$Name
    STARcounts$gene <- STARcounts$Ensembl_ID
    Target_gene_exp_count$gene <- Target_gene_exp_count$sample

    # Remove version numbers so identifiers match across datasets
    pan_gene_reads$gene <- gsub("\\.\\d+$", "", pan_gene_reads$gene)
    STARcounts$gene <- gsub("\\.\\d+$", "", STARcounts$gene)
    Target_gene_exp_count$gene <- gsub("\\.\\d+$", "", Target_gene_exp_count$gene)

    # Inner joins: keep only genes present in all three datasets
    combined_data_counts <- merge(pan_gene_reads, STARcounts, by = "gene", all = FALSE)
    combined_data_counts <- merge(combined_data_counts, Target_gene_exp_count, by = "gene", all = FALSE)

    # Set the first three columns aside, then drop them from the count table
    gene_names_counts <- combined_data_counts[, 1:3]
    combined_data_counts$gene
    combined_data_counts <- combined_data_counts[, -c(1:3)]

    # Drop two further columns by position (the second index applies after the first removal)
    combined_data_counts <- combined_data_counts[, -363]
    combined_data_counts <- combined_data_counts[, -546]

And here is the code for my voom normalization:

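    library(edgeR)  # also attaches limma

    dge1 <- DGEList(counts = combined_data_counts)

    # Keep genes with CPM > 1 in at least 2 samples
    keep <- rowSums(cpm(dge1) > 1) >= 2
    d1 <- dge1[keep, , keep.lib.sizes = FALSE]
    dim(d1)

    # Note: normalization factors are computed on dge1, not on the filtered d1
    dge1 <- calcNormFactors(dge1)
    dge1$samples$norm.factors

    # Intercept-only design
    design1 <- model.matrix(~1, data = dge1$samples)

    voom_data1 <- voom(d1, design = design1, plot = TRUE)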

Is something wrong with this plot or with my code? If so, how can I fix it? I've tried removing batch effects first, which doesn't work because voom doesn't accept the negative values that result, and I've tried further filtering, which doesn't change the plot.

limma Normalization voom • 125 views

updated 2 days ago by Gordon Smyth 52k • written 2 days ago by 1sunmic2 • 0

score 0 · Answer 1 · 2025-01-01

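voom is a DE analysis method rather than a normalization method. voom expects the design matrix to be the same complete design matrix that you will use for the DE analysis, including all your experimental factors as well as any important covariates and batch effects. voom estimates the mean-variance trend from the residuals of the linear model fit, so any systematic differences left out of the design are counted as variance and distort the plot. It is not correct to simply replace the design matrix with an intercept column.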

Batch effects are handled by including the batch variables in the design matrix, not by changing the counts.
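
As a rough illustration, here is a minimal sketch on simulated counts. The `batch` and `group` factors are hypothetical stand-ins for dataset of origin and the condition of interest; neither appears in the original post, so substitute whatever sample annotation you actually have.

```r
library(edgeR)  # also attaches limma

set.seed(1)
n_genes <- 2000
n_samples <- 12

# Simulated raw counts; in practice this would be the merged count matrix
counts <- matrix(rnbinom(n_genes * n_samples, mu = 50, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))

# Hypothetical sample annotation (assumed, not from the original post)
batch <- factor(rep(c("A", "B", "C"), each = 4))    # dataset of origin
group <- factor(rep(c("ctrl", "trt"), times = 6))   # condition of interest

# Full design: batch enters as a covariate and the counts are left untouched
design <- model.matrix(~ batch + group)

dge <- DGEList(counts = counts)
keep <- filterByExpr(dge, design)
dge <- dge[keep, , keep.lib.sizes = FALSE]
dge <- calcNormFactors(dge)  # computed on the same object that goes into voom

# Use plot = TRUE interactively to inspect the mean-variance trend
v <- voom(dge, design, plot = FALSE)
fit <- eBayes(lmFit(v, design))
topTable(fit, coef = "grouptrt")
```

This only works if the condition of interest is not completely confounded with batch, i.e. at least some datasets contain more than one condition.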

Having said all that, it is very difficult to handle merged datasets. Have you tried doing a limma-voom analysis on the individual datasets before merging?
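
One possible way to set that up is sketched below, assuming each dataset contains both conditions on its own. The helper `run_voom`, the simulated datasets and the `ctrl`/`trt` labels are all hypothetical and only serve to show the shape of a per-dataset analysis.

```r
library(edgeR)  # also attaches limma

# Hypothetical helper: one complete limma-voom analysis for a single dataset
run_voom <- function(counts, group) {
  design <- model.matrix(~ group)
  dge <- DGEList(counts = counts)
  dge <- dge[filterByExpr(dge, design), , keep.lib.sizes = FALSE]
  dge <- calcNormFactors(dge)
  v <- voom(dge, design, plot = FALSE)
  fit <- eBayes(lmFit(v, design))
  # One row per retained gene, in the original gene order
  topTable(fit, coef = 2, number = Inf, sort.by = "none")
}

# Simulated stand-ins for the individual (unmerged) count matrices
set.seed(2)
simulate_counts <- function(n_genes, n_samples) {
  matrix(rnbinom(n_genes * n_samples, mu = 50, size = 5),
         nrow = n_genes,
         dimnames = list(paste0("gene", seq_len(n_genes)),
                         paste0("s", seq_len(n_samples))))
}
datasets <- list(study1 = simulate_counts(1000, 6),
                 study2 = simulate_counts(1000, 8))
groups <- list(study1 = factor(rep(c("ctrl", "trt"), each = 3)),
               study2 = factor(rep(c("ctrl", "trt"), each = 4)))

# Analyse each dataset separately
results <- Map(run_voom, datasets, groups)

# Compare log fold changes for genes retained in both analyses
shared <- intersect(rownames(results$study1), rownames(results$study2))
cor(results$study1[shared, "logFC"], results$study2[shared, "logFC"])
```

If each dataset gives a sensible voom plot and broadly consistent results on its own, that narrows the problem down to the merging step.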