Question

How do I fix an odd voom plot in a combined dataset?

1

Entering edit mode

1sunmic2 • 0

@4b974839

Last seen 29 days ago

Canada

Hi Everyone, I'm having a bit of trouble with my voom normalization as the mean-varience plot looks extremely off. As reference, here is an image: my voom plot which as you can see, looks like a fish

For context, my dataset is a merged dataset, here is my code for my dataset:

    pan_gene_reads$gene <- pan_gene_reads$Name
STARcounts$gene <- STARcounts$Ensembl_ID
Target_gene_exp_count$gene <- Target_gene_exp_count$sample

pan_gene_reads$gene <- gsub("\\.\\d+$", "", pan_gene_reads$gene)  # Remove version numbers
STARcounts$gene <- gsub("\\.\\d+$", "", STARcounts$gene)
Target_gene_exp_count$gene <- gsub("\\.\\d+$", "", Target_gene_exp_count$gene)

combined_data_counts <- merge(pan_gene_reads, STARcounts, by = "gene", all = FALSE)
combined_data_counts <- merge(combined_data_counts, Target_gene_exp_count, by = "gene", all = FALSE)

gene_names_counts <- combined_data_counts[,1:3]
combined_data_counts$gene
combined_data_counts <- combined_data_counts[, -c(1:3)]
combined_data_counts <- combined_data_counts[, -363]
combined_data_counts <- combined_data_counts[, -546]

And here is the code for my voom normalization:

    dge1 <- DGEList(counts = combined_data_counts)

 keep <- rowSums(cpm(dge1) > 1) >= 2
 d1 <- dge1[keep, , keep.lib.sizes = FALSE]
dim(d1)

dge1 <- calcNormFactors(dge1)

dge1$samples$norm.factors


design1 <- model.matrix(~1, data = dge1$samples)


voom_data1 <- voom(d1, design = design1, plot = TRUE)

Is this plot or code bad? If so, how can I fix it? I've tried removing batch effects, which doesn't work since voom doesn't accept negative values, and further filtering, which doesn't change the plot.

limma Normalization voom • 664 views

ADD COMMENT • link updated 9 weeks ago by James W. MacDonald 68k • written 10 weeks ago by 1sunmic2 • 0

score 2 · Answer 1 · 2025-01-01

2

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 5 hours ago

WEHI, Melbourne, Australia

voom is a DE analysis method rather than a normalization method. voom expects the design matrix to be the same complete design matrix that you will use for the DE analysis, including all your experimental factors as well as any important covariates and batch effects. It is not correct to simply replace the design matrix with an intercept column.

Batch effects are handled by including the batch variables in the design matrix, not by changing the counts.

Having said all that, it is very difficult to handle merged datasets. Have you tried doing a limma-voom analysis on the individual datasets before merging?

ADD COMMENT • link 10 weeks ago Gordon Smyth 52k

1

Entering edit mode

Thanks for responding! I did change the code slightly:

dge1 <- DGEList(counts = combined_data_counts)


keep <- rowSums(cpm(dge1) > 3) >= 5
 d1 <- dge1[keep, , keep.lib.sizes = FALSE]
dim(d1)

dge1 <- calcNormFactors(dge1)

dge1$samples$norm.factors

group <- factor(c(rep("healthy", 362), rep("cancer", 917)))
batch <- factor(c(rep("healthy", 362), rep("CancerTCGA", 183), rep("CancerTARGET", 734)))

design1 <- model.matrix(~0 + group)

voom_data1 <- voom(d1, design = design1, plot = TRUE)

And ended up with a slightly better plot: enter image description here

When I include batch variables in the design matrix, A) It doesn't change the plot B) I receive a warning saying partial NA coefficients for 52764 probes Any ideas how to fix this? Any help would be appreciated, thanks!

ADD REPLY • link 9 weeks ago 1sunmic2 • 0

1

Entering edit mode

You will get NA coefficients from ~Group+Batch because the Group factor is completely confounded with the batch factor.

ADD REPLY • link 9 weeks ago Gordon Smyth 52k

1

Entering edit mode

So I changed it to ~0 + group (with the same plot), but I still don't know if it's good or what to fix if not.

ADD REPLY • link 9 weeks ago 1sunmic2 • 0

0

Entering edit mode

It looks more as you might expect, although it doesn't appear that you are filtering many of the low expressing genes out. Normally you should expect some truncation of the genes with a logCPM < 1 or so.

ADD REPLY • link 9 weeks ago James W. MacDonald 68k