Weird distribution of p-values, with no significant adjusted p-values?
1
0
Entering edit mode
@9a6181c0
Last seen 2.1 years ago
Australia

Hello all, I am currently performing analysis in R to find whether there are differentially expressed genes between samples with cancer, and without cancer (benign), as a student project. I am using the HuProt array, and thus have around 48,000 observations per sample. Before using limma, I used t.tests to see whether there were any significant genes. This is how I've done the t.tests:

p.val <- c()
for(i in 1:length(total_635[,1])){
  p.val[i] <- t.test(total_635[i,c(4:22)],total_635[i,c(23:40)])$p.val
}

Each row corresponds to a gene, and columns 4:22 represent the log2-transformed, normalised (normalised between arrays) intensities for the benign samples, and the columns 23:40 represent the log2-transformed, normalised intensities for the cancer samples.

This is the output:

enter image description here

This is a histogram of the p-values:

enter image description here

I then adjusted the p-values using this method:

df_pval635_800$p_value_adj <- p.adjust(df_pval635_800$p_value, "fdr")

This is a summary of the adjusted p-value column:

enter image description here

I'm not sure why there are no significant adjusted p-values, plus the distribution of the unadjusted p-values seems slightly off! Is there any reason why this may be so?

These are the p-values and adjusted p-values after conducting lmFit():

enter image description here

This is the design matrix used, etc.:

Group <- factor(targets$Type, levels=c("Benign","Cancer"))
design <- model.matrix(~0+Group)
colnames(design) <- c("Benign", "Cancer")
total_635sorted <- total_635[order(total_635$ID), ]

total_635n <- total_635sorted[,-c(1,2,3)] #removing Name column, ID column, etc.
correlation <- duplicateCorrelation(total_635n, design, ndups=2, spacing=1)
correlation <- correlation$consensus.correlation

fit <- lmFit(total_635n, design, correlation=correlation, ndups=2, spacing=1)
cont.matrix <- makeContrasts(Cancer-Benign, levels=design)
fit <- contrasts.fit(fit, cont.matrix)
lmfit_800_635 <- eBayes(fit)

Limma doesn't seem to identify any significant adjust p-values either. I'm not sure if this is representative of the data. Any advice would be greatly appreciated!

Thank you.

fdr limmaGUI p-values limma • 991 views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 15 hours ago
United States

Deflation of your p-values is usually due to missing covariates in your design. My go-to in this situation is to use the sva package to generate surrogate variables to soak up the extra variability.

Also, don't use regular t-tests. Just use limma.

ADD COMMENT

Login before adding your answer.

Traffic: 721 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6