Dear Bioconductor community,
Based on a current project that aims to identify prognostic biomarkers driving the response to chemotherapy in breast cancer, we are analyzing external public datasets to validate our in-house findings. Specifically, we are working with an RNA-seq gene expression dataset of neoadjuvant-treated patients sampled at multiple time points:
T1 is the pre-treatment time point, T2 is after two weeks of treatment, T3 is the middle time point of chemotherapy, and T4 is the time point at which surgical resection was performed.
Our main goal is to find differences between T1 and the available post-treatment time points in order to identify interesting markers, possibly focusing on specific comparisons. The main issue is that not every patient has every time point available:
head(dd, 8)
# A tibble: 8 x 2
  SampleID      Biopsy_Time
  <chr>         <chr>
1 Patient_11_T1 biopsy time: T1
2 Patient_11_T2 biopsy time: T2
3 Patient_11_T3 biopsy time: T3
4 Patient_11_T4 biopsy time: T4
5 Patient_15_T1 biopsy time: T1
6 Patient_15_T2 biopsy time: T2
7 Patient_15_T3 biopsy time: T3
8 Patient_15_T4 biopsy time: T4

table(dd$Biopsy_Time)
biopsy time: T1 biopsy time: T2 biopsy time: T3 biopsy time: T4
             21               9              19              20
As shown above, the time points are not uniformly covered:
6 patients have all four time points (T1-T2-T3-T4);
13 patients have three time points: 2 have T1-T2-T4 and the remaining 11 have T1-T3-T4.
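Since SampleID also encodes the time point, patient and time point first need to be split into separate factors before selecting complete patients or building a design. A minimal base-R sketch, assuming SampleID always ends in "_T<n>" and Biopsy_Time always starts with "biopsy time: "; the toy dd below reproduces the eight rows shown above plus a hypothetical Patient_99 sampled only at T1, T3 and T4.

```r
# Toy stand-in for dd: the first 8 rows mirror head(dd, 8) above;
# Patient_99 is a hypothetical patient with T1, T3 and T4 only.
dd <- data.frame(
  SampleID = c(paste0("Patient_11_T", 1:4), paste0("Patient_15_T", 1:4),
               paste0("Patient_99_T", c(1, 3, 4))),
  stringsAsFactors = FALSE
)
dd$Biopsy_Time <- paste0("biopsy time: T", sub(".*_T", "", dd$SampleID))

# Patient identifier: drop the trailing "_T<n>" suffix from SampleID
dd$Patient <- factor(sub("_T[0-9]+$", "", dd$SampleID))

# Time point: strip the "biopsy time: " prefix and make T1 the reference level
dd$Time <- relevel(factor(sub("^biopsy time: ", "", dd$Biopsy_Time)), ref = "T1")

# Patient x time point table (1 = sample available, 0 = missing)
avail <- table(dd$Patient, dd$Time)
print(avail)

# Patients sampled at every time point
complete <- rownames(avail)[rowSums(avail > 0) == nlevels(dd$Time)]
print(complete)

# Subset to complete patients only (as in question 1)
dd_complete <- droplevels(dd[dd$Patient %in% complete, ])
```

The same avail table also makes it easy to see which comparisons each patient can contribute to.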
Thus, my major questions are the following:
1) If I consider only the 6 patients with all time points available, could I use a "paired" design? For example, if SampleID denotes only the patient and Biopsy_Time the treatment time point (condition), with T1 as the reference level:
design <- model.matrix(~SampleID + Biopsy_Time)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)
qlf <- glmQLFTest(fit)  # with no coef argument, tests only the last column of design
Also, by setting the appropriate arguments of glmQLFTest, can I perform an ANOVA-like test that detects DE genes between any of the chemotherapy time points and the treatment-naïve (T1) samples?
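For the 6 complete patients, the "paired" design in question 1 corresponds to an additive model with patient as a blocking factor. The sketch below (hypothetical patient IDs P1 to P6, base R only) builds that design and locates the three time-point coefficients. In edgeR, passing several coefficients to the coef argument of glmQLFTest gives an F-test of whether any of them is non-zero, i.e. whether any post-treatment time point differs from T1; the default tests only the last column (T4 vs T1). The edgeR calls are left as comments because they need the actual DGEList y.

```r
# Hypothetical layout: 6 patients (P1..P6), each sampled at T1, T2, T3 and T4
samples <- expand.grid(Time = c("T1", "T2", "T3", "T4"),
                       Patient = paste0("P", 1:6))
samples$Patient <- factor(samples$Patient)
samples$Time <- relevel(factor(samples$Time), ref = "T1")

# Additive model: Patient absorbs each patient's baseline,
# Time gives the within-patient change relative to T1
design <- model.matrix(~ Patient + Time, data = samples)
print(colnames(design))

# Columns holding the T2-T1, T3-T1 and T4-T1 effects
time_coefs <- grep("^Time", colnames(design))
print(time_coefs)

# With a real DGEList y whose columns follow the row order of 'samples':
# y <- edgeR::estimateDisp(y, design)
# fit <- edgeR::glmQLFit(y, design)
# anova_like <- edgeR::glmQLFTest(fit, coef = time_coefs)  # any time point vs T1
# t4_vs_t1 <- edgeR::glmQLFTest(fit, coef = "TimeT4")      # single comparison
```

Selecting coefficients by name rather than by position avoids mistakes if the number of patients (and hence of Patient columns) changes.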
2) In addition, is there a way to also incorporate the samples from patients who lack one or more of these time points? Or would the missing levels for some patients (e.g. T2) hamper the analysis?
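Regarding question 2, one option is to fit the same additive Patient + Time model to all patients at once: a patient missing a time point simply contributes fewer rows, and every sample still contributes to dispersion estimation and to the coefficients it informs. The hedged sketch below uses hypothetical patient IDs and only the patient counts listed above (6 complete, 2 with T1-T2-T4, 11 with T1-T3-T4; it does not model any additional samples in the table() output) to check that the unbalanced design remains of full rank, so all three time effects stay estimable.

```r
# Hypothetical patients, following the availability pattern described above
pat_times <- c(
  setNames(rep(list(c("T1", "T2", "T3", "T4")), 6), sprintf("P%02d", 1:6)),
  setNames(rep(list(c("T1", "T2", "T4")), 2), sprintf("P%02d", 7:8)),
  setNames(rep(list(c("T1", "T3", "T4")), 11), sprintf("P%02d", 9:19))
)

# One row per available sample
samples <- data.frame(
  Patient = rep(names(pat_times), lengths(pat_times)),
  Time = unlist(pat_times, use.names = FALSE),
  stringsAsFactors = FALSE
)
samples$Patient <- factor(samples$Patient)
samples$Time <- relevel(factor(samples$Time), ref = "T1")

design <- model.matrix(~ Patient + Time, data = samples)

print(table(samples$Time))                 # samples per time point
print(qr(design)$rank == ncol(design))     # TRUE: all coefficients are estimable
print(nrow(design) - ncol(design))         # residual df for dispersion estimation
```

Because every patient has T1 plus at least one later time point, each time effect can still be estimated within patients; the T2 comparison simply rests on fewer patients than T3 or T4.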
Thank you in advance,