Question

Correct construction of a multi-factor design matrix with limma for an Agilent microarray dataset


Konstantinos Yeles


Italy

Dear community,

I would like to ask an important question about a GEO time-series dataset, GSE21059. I used the limma package for preprocessing.

Specifically, my question concerns which contrasts I should use. The relevant part of my code (after preprocessing, normalization, etc.) is:

grouping <- factor(paste0(final$targets$Sample.and.Data.Relationship.Format, ".", final$targets$time))

head(grouping)
[1] irradiated.0.5h irradiated.0.5h irradiated.0.5h irradiated.0.5h bystander.0.5h  bystander.0.5h 
18 Levels: bystander.0.5h bystander.1.0h bystander.2.0h bystander.24.0h bystander.4.0h ... irradiated.6.0h

batch <- factor(final$targets$Batch) # batch = the 4 biological replicates

head(batch)
[1] 1 2 3 4 1 2
Levels: 1 2 3 4
design <- model.matrix(~0 + grouping + batch)
colnames(design)
 [1] "groupingbystander.0.5h"   "groupingbystander.1.0h"   "groupingbystander.2.0h"   "groupingbystander.24.0h" 
 [5] "groupingbystander.4.0h"   "groupingbystander.6.0h"   "groupingcontrol.0.5h"     "groupingcontrol.1.0h"    
 [9] "groupingcontrol.2.0h"     "groupingcontrol.24.0h"    "groupingcontrol.4.0h"     "groupingcontrol.6.0h"    
[13] "groupingirradiated.0.5h"  "groupingirradiated.1.0h"  "groupingirradiated.2.0h"  "groupingirradiated.24.0h"
[17] "groupingirradiated.4.0h"  "groupingirradiated.6.0h"  "batch2"                   "batch3"                  
[21] "batch4"
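To see where the 21 columns come from, here is a minimal base-R sketch on a simulated, hypothetical sample sheet (3 conditions × 6 time points × 4 batches, one array per combination; the real GSE21059 layout may differ). With `~0 + grouping + batch` every group gets its own column (18), and batch enters as treatment contrasts, so batch 1 is absorbed into the group columns and only `batch2` to `batch4` appear.

```r
# Hypothetical, balanced sample sheet; NOT the real GSE21059 targets file
conditions <- c("bystander", "control", "irradiated")
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(Batch = 1:4, time = times, condition = conditions,
                       stringsAsFactors = FALSE)

# Same construction as in the question: one level per condition/time group
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$Batch)

# No intercept: one column per group; batch1 is the reference batch level
design <- model.matrix(~0 + grouping + batch)

nlevels(grouping)          # 18
ncol(design)               # 21 (18 group columns + 3 batch columns)
tail(colnames(design), 3)  # the three batch columns
```

Because each group column is a group mean (for batch 1), contrasts between groups are simple differences of these coefficients.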

Note also that there are separate control samples (groups) for each time point, rather than a single "universal" control.

My main goal is to identify putative DE genes that discriminate bystander samples from irradiated ones overall, and not only at a specific time point. So I tried a naive contrast for contrasts.fit:

con <- makeContrasts(
  total.comparison = (groupingirradiated.0.5h + groupingirradiated.1.0h + groupingirradiated.2.0h +
                      groupingirradiated.4.0h + groupingirradiated.6.0h + groupingirradiated.24.0h) / 6 -
                     (groupingbystander.0.5h + groupingbystander.1.0h + groupingbystander.2.0h +
                      groupingbystander.4.0h + groupingbystander.6.0h + groupingbystander.24.0h) / 6,
  levels = design)
# ...
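Typing twelve coefficient names by hand is error-prone. The base-R sketch below builds the same averaged contrast by matching column names, on a simulated design with the question's column naming (hypothetical balanced layout). Each irradiated group gets weight 1/6, each bystander group gets -1/6, and control and batch columns get 0, which should match the column `makeContrasts` produces from the expression above. Note that the contrast name must be a single syntactic name: `total comparison = ...` (with a space) does not even parse in R, whereas `total.comparison` does.

```r
# Simulated design with the same column naming as in the question (hypothetical)
conditions <- c("bystander", "control", "irradiated")
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(Batch = 1:4, time = times, condition = conditions,
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$Batch)
design <- model.matrix(~0 + grouping + batch)

# Start with a zero weight for every coefficient in the design
total.comparison <- setNames(numeric(ncol(design)), colnames(design))

irr <- grepl("^groupingirradiated\\.", colnames(design))
bys <- grepl("^groupingbystander\\.", colnames(design))

# Average of the irradiated groups minus average of the bystander groups
total.comparison[irr] <- 1 / sum(irr)
total.comparison[bys] <- -1 / sum(bys)

# One-column contrast matrix with rownames = coefficient names,
# the shape expected by limma::contrasts.fit(fit, con)
con <- cbind(total.comparison = total.comparison)
round(con[con[, 1] != 0, 1], 4)
```

Since the weights sum to zero, the contrast compares the two conditions averaged over time, so a gene with a large difference at only one time point can still be diluted by the other five.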

With this approach I obtained 20 DE genes at FDR < 0.05. Despite the relatively small number, they are implicated in biological processes relevant to the phenomenon we study. However, a subsequent heatmap (including only the bystander and irradiated samples) did not show a clear separation.

Thus, in your opinion:

1) Is there an "improved" formulation of the contrast above that would identify DE genes across all time points that directly discriminate bystander from irradiated samples? Or is my idea flawed because of the different time points, so that I should follow a different approach?

2) Assuming my methodology above is viable, could the identified genes still be "valid" (apart from their biological relevance), and should I try a different approach before constructing the heatmap? For instance, should I compute the average of each of these genes across the batches for each condition, and then draw the heatmap? Could the different time points and/or batches in the heatmap be affecting the clustering of these genes?
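On the heatmap point: the linear model adjusts for batch, but a heatmap of the normalized values does not, so batch (and time) differences can dominate the clustering. One common option is to remove the estimated batch effect for display only, which is what limma's `removeBatchEffect()` is intended for (its exact handling may differ from this sketch). The base-R sketch below uses simulated data with hypothetical names (`y`, `batch_shift`): it fits each gene on the group indicators plus sum-to-zero batch terms, subtracts only the fitted batch part, and then averages the arrays of each condition, as suggested in question 2. Such adjusted values are for plotting only, not for testing.

```r
set.seed(1)
# Hypothetical balanced layout and expression data (NOT real GSE21059 values)
conditions <- c("bystander", "control", "irradiated")
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(Batch = 1:4, time = times, condition = conditions,
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$Batch)

# 5 genes x 72 arrays of noise plus a strong, purely technical batch shift
n_genes <- 5
batch_shift <- c(0, 1, -1, 2)[targets$Batch]
y <- matrix(rnorm(n_genes * nrow(targets), sd = 0.1), nrow = n_genes)
y <- sweep(y, 2, batch_shift, `+`)
rownames(y) <- paste0("gene", seq_len(n_genes))

# Group means stay in the model; batch is coded with sum-to-zero contrasts
X <- model.matrix(~0 + grouping)
B <- model.matrix(~batch, contrasts.arg = list(batch = "contr.sum"))[, -1]

# Fit all genes at once: lm.fit accepts a (samples x genes) response matrix
fit <- lm.fit(cbind(X, B), t(y))
beta_batch <- fit$coefficients[colnames(B), , drop = FALSE]

# Subtract only the fitted batch component; group effects are kept
y_adj <- y - t(B %*% beta_batch)

# Average the adjusted arrays within each condition (this ignores time)
cond_means <- sapply(split(seq_len(ncol(y_adj)), targets$condition),
                     function(idx) rowMeans(y_adj[, idx, drop = FALSE]))
round(cond_means, 2)
```

Averaging over time, as in `cond_means`, hides any time-specific pattern, so it is worth also plotting the batch-adjusted arrays ordered by condition and time.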

Thank you in advance,

Konstantinos


score 1 · Accepted Answer · 2016-10-25


James W. MacDonald


United States

The analysis you are doing is ignoring the fact that you have six time points. I suppose you could have a rationale for doing that, but it does seem sort of odd that you (or your collaborators) would design an experiment that involves a time course and then ignore that fact in the analysis.

The essence of your question is 'should I be analyzing my data this way', rather than the conventional purpose of this support site, which is to answer questions about the software itself. While we sometimes stray into the realm of giving analysis advice around here, I think the better option is for you to find a local statistician to consult with.


Thank you very much for your answer!

We didn't design this experiment, and we did not ignore the time points.

This dataset is one of a larger set of datasets we are analyzing on a common biological phenomenon, the bystander effect. That is why I posted here: to get the specialists' opinion on our main aim for this dataset, which is

to identify DE genes that directly separate bystander from irradiated samples (regardless of the time point).

So, do you agree with the approach above?

Or

should I use makeContrasts to compare bystander vs irradiated samples at each time point separately, and then intersect the DE genes common to all time points (e.g. with a Venn diagram)?
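The per-time-point alternative can be written as a single contrast matrix with six columns (bystander minus irradiated at each time), fitted together in one `contrasts.fit` call. The base-R sketch below builds that matrix on a simulated design with the question's column names (hypothetical balanced layout). The gene lists at the end are made-up placeholders standing in for per-contrast DE results (in limma these would typically come from `topTable()` on each contrast). Keep in mind that requiring significance at every time point is a strict criterion: a gene missing from one list is not evidence that it has no effect at that time.

```r
# Simulated design with the question's column naming (hypothetical)
conditions <- c("bystander", "control", "irradiated")
times <- c("0.5h", "1.0h", "2.0h", "4.0h", "6.0h", "24.0h")
targets <- expand.grid(Batch = 1:4, time = times, condition = conditions,
                       stringsAsFactors = FALSE)
grouping <- factor(paste0(targets$condition, ".", targets$time))
batch <- factor(targets$Batch)
design <- model.matrix(~0 + grouping + batch)

# One column per time point: bystander.t - irradiated.t (controls, batch = 0)
con_time <- matrix(0, nrow = ncol(design), ncol = length(times),
                   dimnames = list(colnames(design), paste0("BvsI.", times)))
for (tp in times) {
  con_time[paste0("groupingbystander.", tp), paste0("BvsI.", tp)] <- 1
  con_time[paste0("groupingirradiated.", tp), paste0("BvsI.", tp)] <- -1
}

# Placeholder DE gene lists, one per contrast (NOT real results)
de_lists <- list(
  BvsI.0.5h  = c("g1", "g2", "g3"),
  BvsI.1.0h  = c("g1", "g3", "g4"),
  BvsI.2.0h  = c("g1", "g3"),
  BvsI.4.0h  = c("g1", "g3", "g5"),
  BvsI.6.0h  = c("g1", "g2", "g3"),
  BvsI.24.0h = c("g3", "g1", "g6")
)

# Genes called DE at every time point (the centre of the Venn diagram)
Reduce(intersect, de_lists)  # "g1" "g3"
```

Fitting all six contrasts from the same model keeps the batch adjustment and variance estimates shared across time points, instead of running six separate analyses.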

(Reply by Konstantinos Yeles)