Question

Weird distribution of p-values, with no significant adjusted p-values?

0

Entering edit mode

seahorsebox828 • 0

@9a6181c0

Last seen 2.4 years ago

Australia

Hello all, I am currently performing analysis in R to find whether there are differentially expressed genes between samples with cancer, and without cancer (benign), as a student project. I am using the HuProt array, and thus have around 48,000 observations per sample. Before using limma, I used t.tests to see whether there were any significant genes. This is how I've done the t.tests:

p.val <- c()
for(i in 1:length(total_635[,1])){
  p.val[i] <- t.test(total_635[i,c(4:22)],total_635[i,c(23:40)])$p.val
}

Each row corresponds to a gene, and columns 4:22 represent the log2-transformed, normalised (normalised between arrays) intensities for the benign samples, and the columns 23:40 represent the log2-transformed, normalised intensities for the cancer samples.

This is the output:

enter image description here

This is a histogram of the p-values:

enter image description here

I then adjusted the p-values using this method:

df_pval635_800$p_value_adj <- p.adjust(df_pval635_800$p_value, "fdr")

This is a summary of the adjusted p-value column:

enter image description here

I'm not sure why there are no significant adjusted p-values, plus the distribution of the unadjusted p-values seems slightly off! Is there any reason why this may be so?

These are the p-values and adjusted p-values after conducting lmFit():

enter image description here

This is the design matrix used, etc.:

Group <- factor(targets$Type, levels=c("Benign","Cancer"))
design <- model.matrix(~0+Group)
colnames(design) <- c("Benign", "Cancer")
total_635sorted <- total_635[order(total_635$ID), ]

total_635n <- total_635sorted[,-c(1,2,3)] #removing Name column, ID column, etc.
correlation <- duplicateCorrelation(total_635n, design, ndups=2, spacing=1)
correlation <- correlation$consensus.correlation

fit <- lmFit(total_635n, design, correlation=correlation, ndups=2, spacing=1)
cont.matrix <- makeContrasts(Cancer-Benign, levels=design)
fit <- contrasts.fit(fit, cont.matrix)
lmfit_800_635 <- eBayes(fit)

Limma doesn't seem to identify any significant adjust p-values either. I'm not sure if this is representative of the data. Any advice would be greatly appreciated!

Thank you.

fdr limmaGUI p-values limma • 1.1k views

ADD COMMENT • link updated 2.5 years ago by James W. MacDonald 68k • written 2.5 years ago by seahorsebox828 • 0

score 1 · Answer 1 · 2022-10-11

1

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 13 hours ago

United States

Deflation of your p-values is usually due to missing covariates in your design. My go-to in this situation is to use the sva package to generate surrogate variables to soak up the extra variability.

Also, don't use regular t-tests. Just use limma.

ADD COMMENT • link 2.5 years ago James W. MacDonald 68k