Dear Bioc Community,
good morning and I hope my message finds everyone well !! I wanted to ask one specific question regarding possible analysis of a phosphoproteomics dataset of a breast cancer cell line with multiple conditions (WT, LTED (estrogen deprived) & TAMR(tamoxifen resistant) with multiple timepoints (0m, 2m, 5m, 10m, 30m, 60m & 120m), comprised of 3 biological replicates each; The phosphosite ratios have been processed and log2 normalized using the vsn pipeline. Ultimately, the goal is to find specific differentially abundant phosphosites that differentiate between either LTED or TAMR versus the WT samples at any time point, but also any noticeable differences between the LTED and TAMR conditions;
On this premise, one way would be to directly construct direct comparisons for each time point, like this:
head(colData(ppe))
DataFrame with 6 rows and 3 columns
Condition Timepoint group
<character> <numeric> <character>
LTED_0_r1 LTED_0 0 LTED
LTED_0_r2 LTED_0 0 LTED
LTED_0_r3 LTED_0 0 LTED
LTED_10_r1 LTED_10 10 LTED
LTED_10_r2 LTED_10 10 LTED
LTED_10_r3 LTED_10 10 LTED
cond.cell <- as.factor(ppe@colData$Condition)
design <- model.matrix(~ 0 + cond.cell)
colnames(design) <- gsub("cond.cell", "", colnames(design))
fit <- lmFit(ppe@assays@data$Normalization, design)
# Warning message: Partial NA coefficients for 9910 probe(s)
contrast.matrix <- makeContrasts(TAMR_0_comp=WT_0-TAMR_0,
LTED_0_comp=WT_0-LTED_0, TAMR_2_comp=WT_2-TAMR_2....)
However, In section 9.6.2 of the limma vignette, there is also an example with multiple timepoints mentioned, and tried to reproduce based on my experimental design similarity:
X <- ns(ppe$Timepoint, df=5)
Group <- factor(ppe$group,levels=c("WT","LTED","TAMR")) # to keep the WT condition as the "baseline" level
design <- model.matrix(~Group*X)
head(design)
(Intercept) GroupLTED GroupTAMR X1 X2 X3 X4 X5 GroupLTED:X1
1 1 1 0 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000 0.0000000
2 1 1 0 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000 0.0000000
3 1 1 0 0.0000000 0.00000000 0.0000000 0.0000000 0.0000000 0.0000000
4 1 1 0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874 0.5442603
5 1 1 0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874 0.5442603
6 1 1 0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874 0.5442603
GroupTAMR:X1 GroupLTED:X2 GroupTAMR:X2 GroupLTED:X3 GroupTAMR:X3 GroupLTED:X4 GroupTAMR:X4
1 0 0.00000000 0 0.0000000 0 0.0000000 0
2 0 0.00000000 0 0.0000000 0 0.0000000 0
3 0 0.00000000 0 0.0000000 0 0.0000000 0
4 0 0.07399677 0 -0.1181344 0 0.2973124 0
5 0 0.07399677 0 -0.1181344 0 0.2973124 0
6 0 0.07399677 0 -0.1181344 0 0.2973124 0
GroupLTED:X5 GroupTAMR:X5
1 0.0000000 0
2 0.0000000 0
3 0.0000000 0
4 -0.1783874 0
5 -0.1783874 0
6 -0.1783874 0
fit1 <- lmFit(ppe@assays@data$Normalization, design)
fit2 <- eBayes(fit1, trend = TRUE, robust=TRUE)
Hence, my critical follow-up questions are the following:
1) Based on my overall description, the putative implementation with a regression spline would be more optimal, as there are multiple timepoints covered (7), whereas the presense of also multiple conditions (3), probably would suggest constant protein phoshorylation events that do not alter directly in specific timepoints, but rather are altered smoothly on the overall profiled profiled experiment?
2) If my notion is correct, could I leave the number of degrees of freedom to 5?
3) Regarding the interpretation of design matrix, as also important comparisons to evaluate:
A) After column X5, each follow-up column like GroupLTED:X1, essentially denotes the differential phosphorylation abundance alteration of LTED vs WT, but on a specific timepoint axis? and which is the interpretation of X1 to X5 ? These are the 5 "different" curves implemented?
B) hence based on this implementation, I could not focus on a specific timepoint, rather perhaps with an "ANOVA" like procedure, check for phosphosites with different time-trends overall in either LTED vs WT or TAMR vs WT?
Thank you in advance;
Efstathios