I would be very grateful for advice regarding the analysis of my microarray data. I am pretty new to gene expression analysis, and
limma seems to be a very powerful package, but I am still a bit overwhelmed by it because it offers so many different functions that I can't make sense of yet.
In my experiment, I thawed a vial of cells and cultured them. The cells were then split into four plates and further cultured. Three of the four plates were treated with different drugs for several days; the fourth plate was not treated and served as a control.
At different time points, I collected a portion of the cells for gene expression analysis.
In analyzing the microarray data, I am now interested in looking at the differentially expressed genes (DEGs) at the different time points to determine whether the prolonged treatment has an enhancing effect, i.e., whether the number of significant DEGs increases. Since the cell culture's duration will also affect the genome, I would like to normalize the treated cells to the control sample of each time point so that I can only see the effects of the treatment.
Unfortunately, we could not make replicates of each sample due to cost, so I only measured one sample per treatment and time point.
How would one proceed here? So far, I have been inspired by chapter 17.4 (Time course effects of corn oil on rat thymus) of the
limma Handbook. Here, however, the data are not normalized to a control, so I cannot adopt this pipeline one-to-one. Furthermore, replicates were measured here, and I wonder if I have to use any special functions to interpret my single measurements correctly.
My idea was to analyze the three treatments separately, i.e., first load only the data from the first treatment and the control, normalize the data to the control and then perform linear regression modeling using a
y ~ 0 + time function. When the data are normalized to the control, no genes should be differentially expressed at time t0, which is why a forced intercept at (0,0) makes biological sense. From the model, the effect of treatment over time would then be seen. I would subsequently run this pipeline for the remaining two treatments as well.
I am grateful to the community for any hints, code snippets, or advice.
FileName Name Treatment Time Cy3 file1 ctrl_t0 none 0 control file2 treat1_t0 treat1 0 sample file3 treat2_t0 treat2 0 sample file4 treat3_t0 treat3 0 sample file5 ctrl_t1 none 1 control file6 treat1_t1 treat1 1 sample file7 treat2_t1 treat2 1 sample file8 treat3_t1 treat3 1 sample file9 ctrl_t2 none 2 control file10 treat1_t2 treat1 2 sample file11 treat2_t2 treat2 2 sample file12 treat3_t2 treat3 2 sample file13 ctrl_t3 none 3 control file14 treat1_t3 treat1 3 sample file15 treat2_t3 treat2 3 sample file16 treat3_t3 treat3 3 sample file17 ctrl_t4 none 4 control file18 treat1_t4 treat1 4 sample file19 treat2_t4 treat2 4 sample file20 treat3_t4 treat3 4 sample
Selection of the files of one treatment*
targetinfo <- readTargets(targetsFile, row.names = 'Name') targetinfo <- subset(targetinfo, Treatment == "treat1" | Treatment == "none")
FileName Name Treatment Time Cy3 file1 ctrl_t0 none 0 control file2 treat1_t0 treat1 0 sample file5 ctrl_t1 none 1 control file6 treat1_t1 treat1 1 sample file9 ctrl_t2 none 2 control file10 treat1_t2 treat1 2 sample file13 ctrl_t3 none 3 control file14 treat1_t3 treat1 3 sample file17 ctrl_t4 none 4 control file18 treat1_t4 treat1 4 sample
time_factor <- factor(targetinfo$Time) design <- model.matrix(~ 0 + time_factor) fit <- lmFit(eset, design = design, weights = array.weights)
With this approach, the data is NOT normalized to the control as I do not know how to set the control samples as a reference. I saw a
modelMatrix function in
limma, which seems to do what I'm seeking, but I don't understand how I should combine it with my linear regression model
~ 0 + time_factor.
A couple of questions:
treat3_t0all be the same as
ctrl_t0? I have assumed in my answer below that they are.
Are the time-points equally spaced? When you asked this question on the Cross-Validated forum ( https://stats.stackexchange.com/questions/592854/ ) you said the time-points were t0, t2, t5, t7 and t10.
Can you show an MDS plot of the data so we can get an idea of the expression pattern over time? The first steps in a microarray analysis are always to normalize the data and then to explore the sample relationships with an MDS plot.
t0samples were harvested just before adding the pharmaceuticals, thus all the
t0should be more or less the same. They are just four dirrerent cell culture plates cultured from one and the same cell vial.
The most important plot is the MDS plot of all 20 samples at once. Simply
Here you go :)
The MDS plot shows suggests that your treatments are hardly different to the controls. So the results from your experiment are likely to be rather limited. The most noticeable aspect of the plot one sample that is an outlier.