I would be very grateful for advice regarding the analysis of my microarray data. I am pretty new to gene expression analysis, and limma
seems to be a very powerful package, but I am still a bit overwhelmed by it because it offers so many different functions that I can't make sense of yet.
In my experiment, I thawed a vial of cells and cultured them. The cells were then split into four plates and further cultured. Three of the four plates were treated with different drugs for several days; the fourth plate was not treated and served as a control.
At different time points, I collected a portion of the cells for gene expression analysis.
In analyzing the microarray data, I am now interested in looking at the differentially expressed genes (DEGs) at the different time points to determine whether the prolonged treatment has an enhancing effect, i.e., whether the number of significant DEGs increases. Since the cell culture's duration will also affect the genome, I would like to normalize the treated cells to the control sample of each time point so that I can only see the effects of the treatment.
Unfortunately, we could not make replicates of each sample due to cost, so I only measured one sample per treatment and time point.
How would one proceed here? So far, I have been inspired by chapter 17.4 (Time course effects of corn oil on rat thymus) of the limma
Handbook. Here, however, the data are not normalized to a control, so I cannot adopt this pipeline one-to-one. Furthermore, replicates were measured here, and I wonder if I have to use any special functions to interpret my single measurements correctly.
My idea was to analyze the three treatments separately, i.e., first load only the data from the first treatment and the control, normalize the data to the control and then perform linear regression modeling using a y ~ 0 + time
function. When the data are normalized to the control, no genes should be differentially expressed at time t0, which is why a forced intercept at (0,0) makes biological sense. From the model, the effect of treatment over time would then be seen. I would subsequently run this pipeline for the remaining two treatments as well.
I am grateful to the community for any hints, code snippets, or advice.
Targets.tsv
FileName Name Treatment Time Cy3
file1 ctrl_t0 none 0 control
file2 treat1_t0 treat1 0 sample
file3 treat2_t0 treat2 0 sample
file4 treat3_t0 treat3 0 sample
file5 ctrl_t1 none 1 control
file6 treat1_t1 treat1 1 sample
file7 treat2_t1 treat2 1 sample
file8 treat3_t1 treat3 1 sample
file9 ctrl_t2 none 2 control
file10 treat1_t2 treat1 2 sample
file11 treat2_t2 treat2 2 sample
file12 treat3_t2 treat3 2 sample
file13 ctrl_t3 none 3 control
file14 treat1_t3 treat1 3 sample
file15 treat2_t3 treat2 3 sample
file16 treat3_t3 treat3 3 sample
file17 ctrl_t4 none 4 control
file18 treat1_t4 treat1 4 sample
file19 treat2_t4 treat2 4 sample
file20 treat3_t4 treat3 4 sample
Selection of the files of one treatment*
targetinfo <- readTargets(targetsFile, row.names = 'Name')
targetinfo <- subset(targetinfo, Treatment == "treat1" | Treatment == "none")
Resulting targets
FileName Name Treatment Time Cy3
file1 ctrl_t0 none 0 control
file2 treat1_t0 treat1 0 sample
file5 ctrl_t1 none 1 control
file6 treat1_t1 treat1 1 sample
file9 ctrl_t2 none 2 control
file10 treat1_t2 treat1 2 sample
file13 ctrl_t3 none 3 control
file14 treat1_t3 treat1 3 sample
file17 ctrl_t4 none 4 control
file18 treat1_t4 treat1 4 sample
Linear regression*
time_factor <- factor(targetinfo$Time)
design <- model.matrix(~ 0 + time_factor)
fit <- lmFit(eset, design = design, weights = array.weights)
With this approach, the data is NOT normalized to the control as I do not know how to set the control samples as a reference. I saw a modelMatrix
function in limma
, which seems to do what I'm seeking, but I don't understand how I should combine it with my linear regression model ~ 0 + time_factor
.
A couple of questions:
Should
treat1_t0
,treat2_t0
andtreat3_t0
all be the same asctrl_t0
? I have assumed in my answer below that they are.Are the time-points equally spaced? When you asked this question on the Cross-Validated forum ( https://stats.stackexchange.com/questions/592854/ ) you said the time-points were t0, t2, t5, t7 and t10.
Can you show an MDS plot of the data so we can get an idea of the expression pattern over time? The first steps in a microarray analysis are always to normalize the data and then to explore the sample relationships with an MDS plot.
t0
samples were harvested just before adding the pharmaceuticals, thus all thet0
should be more or less the same. They are just four dirrerent cell culture plates cultured from one and the same cell vial.t0
.The most important plot is the MDS plot of all 20 samples at once. Simply
plotMDS(y)
.Here you go :)
The MDS plot shows suggests that your treatments are hardly different to the controls. So the results from your experiment are likely to be rather limited. The most noticeable aspect of the plot one sample that is an outlier.