Question: Correct construction of multiple design matrix with limma in an agilent microarray dataset
3.1 years ago by
University of Salerno, Salerno, Italy
Konstantinos Yeles20 wrote:

Dear community,

I would like to ask a question about a GEO time-series dataset, GSE21059. I used the limma package for preprocessing.

Specifically, I am asking about the contrasts I should use. The relevant part of my code (after preprocessing, normalization, etc.):

grouping <- paste0(final$targets$Sample.and.Data.Relationship.Format, ".", final$targets$time)
head(grouping)
18 Levels: bystander.0.5h bystander.1.0h bystander.2.0h bystander.24.0h bystander.4.0h ... irradiated.6.0h

batch <- factor(final$targets$Batch) # batch = the 4 different biological replicates
head(batch)
[1] 1 2 3 4 1 2
Levels: 1 2 3 4
design <- model.matrix(~0 + grouping + batch)
colnames(design)
[1] "groupingbystander.0.5h"   "groupingbystander.1.0h"   "groupingbystander.2.0h"   "groupingbystander.24.0h"
[5] "groupingbystander.4.0h"   "groupingbystander.6.0h"   "groupingcontrol.0.5h"     "groupingcontrol.1.0h"
[9] "groupingcontrol.2.0h"     "groupingcontrol.24.0h"    "groupingcontrol.4.0h"     "groupingcontrol.6.0h"
[21] "batch4"  

It is also important to mention that there are separate control samples (groups) for each time point, rather than a single "universal" control.

My main goal is to identify any putative DE genes that discriminate bystander samples from irradiated ones overall, and not only at a specific time point. I therefore set up a naive contrast for contrasts.fit along the following lines:

con <- makeContrasts(total_comparison =((groupingirradiated.0.5h + groupingirradiated.1.0h + groupingirradiated.2.0h + groupingirradiated.4.0h + groupingirradiated.6.0h + groupingirradiated.24.0h)/6 - (groupingbystander.0.5h + groupingbystander.1.0h + groupingbystander.2.0h + groupingbystander.4.0h + groupingbystander.6.0h + groupingbystander.24.0h)/6), levels=design).......

With this approach, I ended up with 20 DE genes at an FDR cutoff of 0.05 which, despite the relatively small number, are implicated in biological processes relevant to the phenomenon we study. However, in a subsequent heatmap including only the bystander and irradiated samples, there was no clear separation.
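
For reference, a minimal sketch of how the averaged contrast above could be fitted end to end. It runs on simulated data: the `expr` matrix and `targets` table are hypothetical stand-ins for the real normalized data, and building the contrast string with `paste0()` is just one convenient way to avoid typing the twelve column names by hand.

```r
library(limma)

# Simulated layout: 3 conditions x 6 time points x 4 batches (hypothetical data).
set.seed(1)
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(batch = 1:4, time = times,
                       condition = c("bystander", "control", "irradiated"),
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$batch)
design <- model.matrix(~0 + grouping + batch)

# Random log-expression values standing in for the normalized arrays.
expr <- matrix(rnorm(200 * nrow(targets)), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

# Mean of the irradiated groups minus mean of the bystander groups.
irr <- paste0("groupingirradiated.", times, collapse = " + ")
bys <- paste0("groupingbystander.", times, collapse = " + ")
avg_contrast <- paste0("(", irr, ")/6 - (", bys, ")/6")
con <- makeContrasts(contrasts = avg_contrast, levels = design)
colnames(con) <- "total_comparison"

# Fit the linear model, apply the contrast, and moderate the statistics.
fit <- lmFit(expr, design)
fit2 <- eBayes(contrasts.fit(fit, con))
top <- topTable(fit2, coef = "total_comparison", number = Inf,
                adjust.method = "BH")
de <- top[top$adj.P.Val < 0.05, ]
```

Note that this contrast tests the difference averaged over time, so a gene that changes in opposite directions at different time points can cancel out.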

1) Is there an "improved" formulation of the contrast above for identifying DE genes, across all time points, that directly discriminate bystander from irradiated samples? Or is my idea incorrect because of the different time points, so that I should follow a different approach?

2) If my methodology above is viable, could the genes identified above still be valid (apart from their biological relevance), and should I perhaps try a different approach before constructing the heatmap? For instance, I could average each of these genes over the batches within each condition and then draw the heatmap. Could the different time points and/or batches be affecting the clustering of these genes in the heatmap?
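
The averaging idea in 2) could be sketched as follows, in base R on simulated data. Everything here (the simulated `expr` matrix, the `targets` table, and the names `group_means` and `z`) is a hypothetical stand-in for the real normalized data, not the actual GSE21059 values.

```r
# Simulated layout: 2 conditions x 6 time points x 4 batches (hypothetical data).
set.seed(1)
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(batch = 1:4, time = times,
                       condition = c("bystander", "irradiated"),
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))

# Random log-expression values for 20 genes of interest.
expr <- matrix(rnorm(20 * nrow(targets)), nrow = 20,
               dimnames = list(paste0("gene", 1:20), NULL))

# Average the four replicates (batches) within each condition/time group.
group_means <- sapply(levels(grouping), function(g) {
  rowMeans(expr[, grouping == g, drop = FALSE])
})

# Row-wise z-scores, so the heatmap compares patterns rather than levels.
z <- t(scale(t(group_means)))
```

`z` could then be passed to `heatmap(z, scale = "none")`. limma also provides `avearrays()` for averaging replicate arrays and `removeBatchEffect()` for removing a batch term before plotting; whether either improves the clustering here is an open question.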

Konstantinos

modified 3.1 years ago by James W. MacDonald51k • written 3.1 years ago by Konstantinos Yeles20
Answer: Correct construction of multiple design matrix with limma in an agilent microarr
3.1 years ago by
United States
James W. MacDonald51k wrote:

The analysis you are doing is ignoring the fact that you have six time points. I suppose you could have a rationale for doing that, but it does seem sort of odd that you (or your collaborators) would design an experiment that involves a time course and then ignore that fact in the analysis.

The essence of your question is 'should I be analyzing my data this way', rather than the conventional purpose of this support site, which is to answer questions about the software itself. While we sometimes stray into the realm of giving analysis advice around here, I think the better option is for you to find a local statistician to consult with.

We didn't design this experiment, and we did not ignore the time points. This dataset is one of a larger number of analyzed datasets concerning a common biological phenomenon, bystander effects. That's why I created this post: to get specialists' opinions on our basic aim for this dataset:

to identify any DE genes directly separating bystander vs irradiated samples (regardless of the time point)

Thus, do you agree with the approach above?

Or should I instead use makeContrasts to compare bystander vs irradiated samples at each time point, and then intersect the DE genes common to all time points, perhaps with a Venn diagram?
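
A minimal sketch of the per-time-point alternative, again on simulated data (the `expr` matrix and `targets` table are hypothetical stand-ins). It builds one irradiated-vs-bystander contrast per time point, takes the genes significant at every time point, and, as a third option not mentioned above, runs a moderated F-test across all six contrasts.

```r
library(limma)

# Simulated layout: 3 conditions x 6 time points x 4 batches (hypothetical data).
set.seed(1)
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(batch = 1:4, time = times,
                       condition = c("bystander", "control", "irradiated"),
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$batch)
design <- model.matrix(~0 + grouping + batch)

expr <- matrix(rnorm(200 * nrow(targets)), nrow = 200,
               dimnames = list(paste0("gene", 1:200), NULL))

# One irradiated-vs-bystander contrast per time point.
contrast_strings <- paste0("groupingirradiated.", times,
                           " - groupingbystander.", times)
con <- makeContrasts(contrasts = contrast_strings, levels = design)
colnames(con) <- paste0("t", times)

fit <- eBayes(contrasts.fit(lmFit(expr, design), con))

# BH-adjust each contrast separately, then keep genes significant in all six
# (the centre of the Venn diagram).
adj <- apply(fit$p.value, 2, p.adjust, method = "BH")
common <- rownames(fit$p.value)[rowSums(adj < 0.05) == ncol(adj)]

# Moderated F-test: is there a difference at any of the six time points?
f_table <- topTable(fit, coef = NULL, number = Inf)
```

Requiring significance at all six time points tends to be conservative, since each test must pass on its own, whereas the F-test flags genes that differ at any time point. limma's `decideTests()` and `vennDiagram()` can produce the per-contrast calls and the diagram directly.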