Dear community,
I would like to ask a question about a GEO time series dataset, GSE21059, which I preprocessed with the limma package.
Specifically, my question concerns which contrasts I should use. The relevant part of my code (after preprocessing, normalization, etc.):
> grouping <- factor(paste0(final$targets$Sample.and.Data.Relationship.Format, ".", final$targets$time))
> head(grouping)
[1] irradiated.0.5h irradiated.0.5h irradiated.0.5h irradiated.0.5h bystander.0.5h bystander.0.5h
18 Levels: bystander.0.5h bystander.1.0h bystander.2.0h bystander.24.0h bystander.4.0h ... irradiated.6.0h
> batch <- factor(final$targets$Batch) # batch encodes the 4 different biological replicates
> head(batch)
[1] 1 2 3 4 1 2
Levels: 1 2 3 4
> design <- model.matrix(~0 + grouping + batch)
> colnames(design)
 [1] "groupingbystander.0.5h"   "groupingbystander.1.0h"   "groupingbystander.2.0h"   "groupingbystander.24.0h"
 [5] "groupingbystander.4.0h"   "groupingbystander.6.0h"   "groupingcontrol.0.5h"     "groupingcontrol.1.0h"
 [9] "groupingcontrol.2.0h"     "groupingcontrol.24.0h"    "groupingcontrol.4.0h"     "groupingcontrol.6.0h"
[13] "groupingirradiated.0.5h"  "groupingirradiated.1.0h"  "groupingirradiated.2.0h"  "groupingirradiated.24.0h"
[17] "groupingirradiated.4.0h"  "groupingirradiated.6.0h"  "batch2"                   "batch3"
[21] "batch4"
Also, it is important to mention that there are separate control samples (groups) for each time point, rather than a single "universal" control.
My main goal is to identify any putative DE genes that discriminate bystander samples from irradiated ones overall, and not only at one specific time point. I therefore tried a naive contrast for contrasts.fit along the following lines:
con <- makeContrasts(
  total.comparison =
    (groupingirradiated.0.5h + groupingirradiated.1.0h + groupingirradiated.2.0h +
     groupingirradiated.4.0h + groupingirradiated.6.0h + groupingirradiated.24.0h)/6 -
    (groupingbystander.0.5h + groupingbystander.1.0h + groupingbystander.2.0h +
     groupingbystander.4.0h + groupingbystander.6.0h + groupingbystander.24.0h)/6,
  levels = design).......
With this approach I obtained 20 DE genes at an FDR cutoff of 0.05. Despite the relatively small number, they are implicated in interesting biological processes relevant to the phenomenon we study. However, in a subsequent heatmap that included only the bystander and irradiated samples, there was no clear separation.
Thus, in your opinion:
1) Is there an "improved" formulation of my contrast above for identifying DE genes across all time points that directly discriminate bystander from irradiated samples? Or is my idea incorrect, given the different time points, so that I should follow a different approach?
2) Assuming my methodology above is valid, could the genes identified above still "be valid" (apart from their biological relevance), and should I perhaps try a different approach before constructing the heatmap? For instance, compute the average of each of these genes across the batches for each condition, and then draw the heatmap? Could the different time points and/or the batches be affecting how these genes cluster in the heatmap?
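To make the idea in 2) concrete, here is a rough sketch of averaging replicates per group before drawing the heatmap. It runs on simulated data, not on GSE21059: the matrix `expr`, the three time points, and all object names are hypothetical stand-ins. It also assumes that removing the batch effect with limma's removeBatchEffect (for visualization only, not for the linear model) is acceptable here.

```r
library(limma)

set.seed(1)
# Simulated stand-in data (hypothetical): 50 genes x 24 samples,
# 2 conditions x 3 time points x 4 batches (biological replicates)
condition <- rep(c("bystander", "irradiated"), each = 12)
time      <- rep(rep(c("0.5h", "4.0h", "24.0h"), each = 4), times = 2)
batch     <- factor(rep(1:4, times = 6))
grouping  <- factor(paste0(condition, ".", time))
expr <- matrix(rnorm(50 * 24), nrow = 50,
               dimnames = list(paste0("gene", 1:50), paste0("s", 1:24)))

# Remove the batch effect for plotting only, while preserving group differences
design_groups <- model.matrix(~0 + grouping)
expr_nobatch  <- removeBatchEffect(expr, batch = batch, design = design_groups)

# Average the replicates within each condition/time group (genes x groups)
group_means <- sapply(levels(grouping), function(g)
  rowMeans(expr_nobatch[, grouping == g, drop = FALSE]))

# Scale each gene (row) so the heatmap shows relative patterns across groups
scaled <- t(scale(t(group_means)))
heatmap(scaled, scale = "none")
```

In practice `expr` would be restricted to the rows of the DE genes before plotting. Whether the averaged heatmap separates the two conditions any better is not something this sketch can answer.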
Thank you in advance,
Konstantinos
Thank you very much for your answer!
We did not design this experiment, and we did not ignore the time points either.
This dataset is part of a larger collection of datasets we have analyzed on a common biological phenomenon, namely bystander effects. That is why I created this post: to get the specialists' opinions on our basic aim for this dataset,
which is to identify any DE genes that directly separate bystander from irradiated samples (regardless of the time point).
Thus, do you agree with the above approach?
Or
would it be better to use makeContrasts to compare bystander vs irradiated samples at each time point, and then perhaps use a Venn diagram to intersect the DE genes common to all time points?
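The second option could be sketched roughly as follows. This uses simulated data rather than GSE21059, with only three time points for brevity. The names `expr`, `times`, `calls`, and `common` are hypothetical, and the "grouping" prefix is stripped from the design column names purely for readability.

```r
library(limma)

set.seed(2)
# Simulated stand-in data (hypothetical): 200 genes x 24 samples
times     <- c("0.5h", "4.0h", "24.0h")
condition <- rep(c("bystander", "irradiated"), each = 12)
time      <- rep(rep(times, each = 4), times = 2)
batch     <- factor(rep(1:4, times = 6))
grouping  <- factor(paste0(condition, ".", time))
expr <- matrix(rnorm(200 * 24), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:24)))

# Same design structure as in the post: one coefficient per group, plus batch
design <- model.matrix(~0 + grouping + batch)
colnames(design) <- sub("^grouping", "", colnames(design))
fit <- lmFit(expr, design)

# One irradiated-vs-bystander contrast per time point
contrast_strings <- paste0("irradiated.", times, " - bystander.", times)
con <- makeContrasts(contrasts = contrast_strings, levels = design)
colnames(con) <- times
fit2 <- eBayes(contrasts.fit(fit, con))

# Per-contrast calls (-1, 0, 1) at FDR 0.05, then genes called at every time point
calls  <- decideTests(fit2, p.value = 0.05)
common <- rownames(calls)[rowSums(unclass(calls) != 0) == length(times)]

vennDiagram(calls)
```

Note that requiring significance at every time point is a strict criterion, so the intersection can be much smaller than any single list. This sketch also does not check that the direction of change agrees across time points.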