Question

Differential abundance analysis of phosphoproteomics data with multiple time points and conditions

1

Entering edit mode

svlachavas ▴ 830

@svlachavas-7225

Last seen 6 months ago

Germany/Heidelberg/German Cancer Resear…

Dear Bioc Community,

good morning and I hope my message finds everyone well !! I wanted to ask one specific question regarding possible analysis of a phosphoproteomics dataset of a breast cancer cell line with multiple conditions (WT, LTED (estrogen deprived) & TAMR(tamoxifen resistant) with multiple timepoints (0m, 2m, 5m, 10m, 30m, 60m & 120m), comprised of 3 biological replicates each; The phosphosite ratios have been processed and log2 normalized using the vsn pipeline. Ultimately, the goal is to find specific differentially abundant phosphosites that differentiate between either LTED or TAMR versus the WT samples at any time point, but also any noticeable differences between the LTED and TAMR conditions;

On this premise, one way would be to directly construct direct comparisons for each time point, like this:

head(colData(ppe))
DataFrame with 6 rows and 3 columns
             Condition Timepoint       group
           <character> <numeric> <character>
LTED_0_r1       LTED_0         0        LTED
LTED_0_r2       LTED_0         0        LTED
LTED_0_r3       LTED_0         0        LTED
LTED_10_r1     LTED_10        10        LTED
LTED_10_r2     LTED_10        10        LTED
LTED_10_r3     LTED_10        10        LTED

cond.cell <- as.factor(ppe@colData$Condition)
design <- model.matrix(~ 0 + cond.cell)
colnames(design) <- gsub("cond.cell", "", colnames(design))

fit <- lmFit(ppe@assays@data$Normalization, design)

# Warning message: Partial NA coefficients for 9910 probe(s) 
contrast.matrix <- makeContrasts(TAMR_0_comp=WT_0-TAMR_0,
LTED_0_comp=WT_0-LTED_0, TAMR_2_comp=WT_2-TAMR_2....)

However, In section 9.6.2 of the limma vignette, there is also an example with multiple timepoints mentioned, and tried to reproduce based on my experimental design similarity:

X <- ns(ppe$Timepoint, df=5) 
Group <- factor(ppe$group,levels=c("WT","LTED","TAMR")) # to keep the WT condition as the "baseline" level
design <- model.matrix(~Group*X)

head(design)
  (Intercept) GroupLTED GroupTAMR        X1         X2         X3        X4         X5 GroupLTED:X1
1           1         1         0 0.0000000 0.00000000  0.0000000 0.0000000  0.0000000    0.0000000
2           1         1         0 0.0000000 0.00000000  0.0000000 0.0000000  0.0000000    0.0000000
3           1         1         0 0.0000000 0.00000000  0.0000000 0.0000000  0.0000000    0.0000000
4           1         1         0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874    0.5442603
5           1         1         0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874    0.5442603
6           1         1         0 0.5442603 0.07399677 -0.1181344 0.2973124 -0.1783874    0.5442603
  GroupTAMR:X1 GroupLTED:X2 GroupTAMR:X2 GroupLTED:X3 GroupTAMR:X3 GroupLTED:X4 GroupTAMR:X4
1            0   0.00000000            0    0.0000000            0    0.0000000            0
2            0   0.00000000            0    0.0000000            0    0.0000000            0
3            0   0.00000000            0    0.0000000            0    0.0000000            0
4            0   0.07399677            0   -0.1181344            0    0.2973124            0
5            0   0.07399677            0   -0.1181344            0    0.2973124            0
6            0   0.07399677            0   -0.1181344            0    0.2973124            0
  GroupLTED:X5 GroupTAMR:X5
1    0.0000000            0
2    0.0000000            0
3    0.0000000            0
4   -0.1783874            0
5   -0.1783874            0
6   -0.1783874            0

fit1 <- lmFit(ppe@assays@data$Normalization, design)
fit2 <- eBayes(fit1, trend = TRUE, robust=TRUE)

Hence, my critical follow-up questions are the following:

1) Based on my overall description, the putative implementation with a regression spline would be more optimal, as there are multiple timepoints covered (7), whereas the presense of also multiple conditions (3), probably would suggest constant protein phoshorylation events that do not alter directly in specific timepoints, but rather are altered smoothly on the overall profiled profiled experiment?

2) If my notion is correct, could I leave the number of degrees of freedom to 5?

3) Regarding the interpretation of design matrix, as also important comparisons to evaluate:

A) After column X5, each follow-up column like GroupLTED:X1, essentially denotes the differential phosphorylation abundance alteration of LTED vs WT, but on a specific timepoint axis? and which is the interpretation of X1 to X5 ? These are the 5 "different" curves implemented?

B) hence based on this implementation, I could not focus on a specific timepoint, rather perhaps with an "ANOVA" like procedure, check for phosphosites with different time-trends overall in either LTED vs WT or TAMR vs WT?

Thank you in advance;

Efstathios

Proteomics DifferentialRegulation limma • 446 views

ADD COMMENT • link 8 months ago svlachavas ▴ 830