Hello, Would someone please help me verify if i'm on the right track in correcting for technical replicates using limma and also if there is a better approach to use (duplicateCorrelation() vs lmer()) for my protein expression data.
I've got 8 disease patients per group (DiseaseA, DiseaseB). There are 2 biological samples(_1,_2) with 4 technical replicates each (rep1-4).
Quite confused on what to include in the block when handling technical replicates - is it the column - patientID or replicates_1 or replicates_2 below. Please note I have manually created replicate_2 column
> sample.sheet
group patientID replicates_1 replicates_2
258736519273_1 Disease_A Disease_A_1 1 1
258736519272_4 Disease_A Disease_A_1 2 2
258736519276_1 Disease_A Disease_A_1 3 3
258736519273_6 Disease_A Disease_A_1 4 4
258736519273_2 Disease_A Disease_A_2 1 1
258736519272_1 Disease_A Disease_A_2 2 2
258736519275_3 Disease_A Disease_A_2 3 3
258736519272_8 Disease_A Disease_A_2 4 4
258736519273_5 Disease_B Disease_B_1 1 5
258736519273_3 Disease_B Disease_B_1 2 6
258736519274_2 Disease_B Disease_B_1 3 7
258736519275_7 Disease_B Disease_B_1 4 8
258736519274_3 Disease_B Disease_B_2 1 5
258736519274_7 Disease_B Disease_B_2 2 6
258736519275_1 Disease_B Disease_B_2 3 7
258736519272_2 Disease_B Disease_B_2 4 8
258736519273_8 Healthy Healthy_1 1 9
258736519273_4 Healthy Healthy_1 2 10
258736519272_5 Healthy Healthy_1 3 11
258736519274_4 Healthy Healthy_1 4 12
258736519274_8 Healthy Healthy_2 1 9
258736519276_3 Healthy Healthy_2 2 10
258736519272_7 Healthy Healthy_2 3 11
258736519274_5 Healthy Healthy_2 4 12
group <- factor(sample.sheet$group);
patient <- factor(sample.sheet$patientID); **# unsure whether to use patientID or replicates_1 or replicates_2**
#> table(sample.sheet$group)
#Disease_A Disease_B Healthy
# 8 8 8
#> table(sample.sheet$patientID)
#Disease_A_1 Disease_A_2 Disease_B_1 Disease_B_2 Healthy_1 Healthy_2
# 4 4 4 4 4 4
# Create a design matrix for your comparison (e.g., control vs treatment)
design <- model.matrix(~0 + group);
colnames(design) <- gsub('group', '', colnames(design));
# Use duplicateCorrelation to estimate within-patient correlation
corfit <- duplicateCorrelation(exp.data.subset, design, block = patient);
fit <- lmFit(
exp.data.subset,
design,
block = patient,
correlation = corfit$consensus.correlation
);
# create contrast matrix
contrast.matrix <- makeContrasts(
'Disease_AvsHealthy' = Disease_A-Healthy,
'Disease_BvsHealthy' = Disease_B-Healthy,
'Disease_A-HealthyvsDisease_B-Healthy' = (Disease_A-Healthy) - (Disease_B-Healthy),
levels = design
);
fit2 <- contrasts.fit(fit, contrast.matrix);
fit2 <- eBayes(fit2, trend = TRUE); **# robust = TRUE, trend = TRUE? what is more appropriate for proteomics **
Hi Gordon, Thank you very much for the prompt response.
If i understand correctly, wouldnt this remove the need to use duplicateCorrelation(). Since averaging condenses these replicates into a single value per biological replicate, would that mean we no longer need to account for correlations between them? Thanks for clarifying!