Dear readers,
I would greatly appreciate some help regarding analysis of a multi-factorial timecourse experiment, particularly regarding model selection and comparisons (either in edgeR or in DESeq2).
My dataset consists of three closely related species sampled at three timepoints (3 reps for each timepoint:species). At the first timepoint they are all susceptible, at the second timepoint two of the species have gained resistance, and at the third timepoint they behave the same as at timepoint two. The third species remains susceptible during all three timepoints. Our hypothesis is that in the resistant species pathways are up- or down-regulated between the first two stages, but in the susceptible species the regulation is opposite (or consistently too low/too high).
I have tried models with the factors as main effects (~stage + species), with interactions (~species + species:stage, ~species*stage), and also nested (~species + species/stage). Using these models I am not able to carry out the comparisons I would like because often the terms of interest are missing from the resultsNames. I have also tried adding extra factors to take into account the behaviour of the species:stage but this does not differentiate between the first stage and the other two stages in the susceptible species. Do you think I'm using the wrong models? If so is there a model that you think is better suited for this analysis?
I have also tried to group each species:stage according to the DESeq2 vignette (section 3.3) and according to the timecourse analysis guidelines in the DESeq2 vignette (section 3.4). I am convinved that the "true" results are contained in the results of the "group" analysis but there are soo many results not conforming to our expression profile hypothesis that I can't sort through them all one by one.
I must also mention that about half the genes are globally differentially expressed between two or three of the stages (>10 000 genes).
Ideally I want to compare tolerant-species-at-tolerant-stages2+3 with tolerant-species-at-sensitive-stage1 and within those DE results find the intersection with the DEG of sensitive-species-at-stage1 compared to tolerant-species-at-sensitive-stage1 and sensitive-species-at-stage2+3 compared to tolerant-species-at-tolerant-stages2+3. Do you think this is even possible to do in one comparison or are individual comparisons and then combination through venn diagrams the only way?
I would appreciate any suggestions you have.
Thank you,
Anna
PS: I am using R version 3.3.2 (2016-10-31), edgeR_3.16.5, DESeq2_1.14.1 on windows 32-bit. I am an R newbie and have been working with DESeq2 for the past two months.
Hello again,
Just an update on this question in case other readers find it helpful:
I tried different methods for analysing my data, both grouping the samples in interesting ways, and also doing complicated contrasts to get the DEG in one go, but they all produced too many false positives. We noticed these false positives through visual inspection of the candidates and it was apparent that the sensitive samples were being compared to the average of the resistant species, which is not exactly biologically relevant in our case... We would rather compare the sensitive species to each resistant species separately and then find the common DEG.
I ended up using a "group" analysis, as suggested in the DESeq2 manual (page 30, section 3.3) and in the EdgeR manual (page 34, section 3.3.1). Then all comparisons of interest (both intra and inter-species) are carried out and common genes which pass our p-value cutoff in all comparisons of interest are considered true positives. DESeq2 gave more results than EdgeR which I think is to be expected.
Thank you again for your suggestions Aaron Lun and Gavin Kelly!
-Anna