Design matrix for a two-color experimental design testing for time effect on 3 treated cell-lines, each with its own time=0 control
1
0
Entering edit mode
rubi ▴ 110
@rubi-6462
Last seen 4.5 years ago

Hi,

I did microarray gene-expression profiling on the following experimental design:

I took three cell lines and plated each one of them in four plates, hence a total of 12 plates.

I purified RNA from a single plate of each of the three cell lines at time=0, then X-ray irradiated the remaining plates, and then for each cell-line purified RNA at three different time points (24h, 48h, and 72h), i.e. a plate per time-point.

For each cell-line, the reference channel in the microarray is its corresponding non-irradiated sample.

The cell-lines in this context are treated as biological replicates (perhaps not ideal but this is merely a pilot experiment).

I'd like to estimate the effect of time on irradiated-cell gene expression where gene expression from each cell-line would be the irradiated sample at each time point relative to its corresponding the non-irradiated time=0 sample. I guess that if my response was log fold-change (log.fc) of a certain time point relative to time 0, the model would be: log.fc ~ time and if I wanted to adjust for cell-line it would be: log.fc ~ time + cell.line

Here's my targets data.frame (Cy3 lists the irradiated samples and Cy5 the non-irradiated samples):

targets <- data.frame(SlideNumber = rep(1:3,3),
FileName=c("/data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_1.txt",
"/data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_2.txt",
"/data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_3.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_1.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_2.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_3.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_1.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_2.txt",
"/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_1.txt"),
Cy3 = c("xray.24_1","xray.48_1","xray.72_1",
"xray.24_2","xray.48_2","xray.72_2",
"xray.24_3","xray.48_3","xray.72_3"),
Cy5 = c("ctrl.0_1","ctrl.0_1","ctrl.0_1",
"ctrl.0_2","ctrl.0_2","ctrl.0_2",
"ctrl.0_3","ctrl.0_3","ctrl.0_3"),
cell.line = c(rep(1,3),rep(2,3),rep(3,3)),
time=rep(c(24,48,72),3),
stringsAsFactors = F)



I read the scanned files and did the normalizations using this code:

RG <- limma::read.maimages(targets$FileName, source="agilent.median") RG.bg.corrected <- limma::backgroundCorrect(RG,method="normexp",offset=16) RG.bg.within.array.corrected <- limma::normalizeWithinArrays(RG.bg.corrected,method="loess") RG.bg.between.array.corrected <- limma::normalizeBetweenArrays(RG.bg.within.array.corrected,method="Aquantile")  And for setting up a design matrix for the log.fc ~ time model, I tried: library(dplyr) design <- model.matrix(y ~ time, data=targets %>% dplyr::mutate(y=rnorm(9))) But I'm getting this error when trying to run lmFit:  Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent And for the model adjusting for cell-line I set up this design matrix: targets$cell.line <- factor(targets$cell.line,levels=1:3) design <- model.matrix(y ~ time+cell.line,contrasts=c(NA,"contr.treatment"),data=targets %>% dplyr::mutate(y=rnorm(9))) But running lmFit still produces the same error. Any idea what the problem is or what should the design matrix be? Here are the first 5 rows of RG.bg.between.array.corrected > RG.bg.between.array.corrected[1:5,] An object of class "MAList"$targets
FileName
1 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_1.txt
2 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_2.txt
3 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_3.txt
4 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_1.txt
5 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_2.txt
6 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_3.txt
7 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_1.txt
8 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_2.txt
9 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_3.txt

$genes Block Column Row Name ID RefNumber ControlType GeneName TopHit Description 1 1 1 1 GE_BrightCorner GE_BrightCorner 62976 pos GE_BrightCorner NA NA 2 1 2 1 GE_BrightCorner GE_BrightCorner 62648 pos GE_BrightCorner NA NA 3 1 3 1 DarkCorner DarkCorner 62320 pos DarkCorner NA NA 4 1 4 1 DarkCorner DarkCorner 61992 pos DarkCorner NA NA 5 1 5 1 XP_004869187.1 CUST_5675_PI439843739 61664 false NA NA$source
[1] "agilent.median"

$printer$ngrid.r
[1] 1

$ngrid.c [1] 1$nspot.r
[1] 384

$nspot.c [1] 164$M

/data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_1 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_2 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_3

[1,]                                                 -0.003804927                                                   0.07581274                                                   0.32692028
[2,]                                                 -0.100439724                                                  -0.12135554                                                   0.10399374
[3,]                                                 -0.100439724                                                  -0.30350304                                                   0.14603983
[4,]                                                 -0.016358922                                                   0.18537314                                                   0.71029227
[5,]                                                  0.000278216                                                  -0.07288553                                                  -0.02972195
/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_1 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_2 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_3

[1,]                                                   0.09153339                                                  0.191030075                                                   0.30743532
[2,]                                                  -0.11118199                                                  0.008799683                                                  -0.04141555
[3,]                                                   0.03102885                                                 -0.159041326                                                  -0.26419409
[4,]                                                  -0.24002567                                                  0.103334305                                                   0.04191329
[5,]                                                  -0.21634197                                                 -0.344161687                                                  -0.46036156
/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_1 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_2 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_3

[1,]                                                  0.063933997                                                   0.02578673                                                   0.17873555
[2,]                                                  0.040366462                                                   0.08211603                                                  -0.03525927
[3,]                                                 -0.045199954                                                   0.05364567                                                  -0.18528808
[4,]                                                  0.567896630                                                   0.84323869                                                   1.03922721
[5,]                                                  0.004504603                                                  -0.12057734                                                  -0.08462349

\$A

/data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_1 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_2 /data/SG15074465_258552110006_S001_GE2_1200_Jun14_PEG200_1_3

[1,]                                                    11.694533                                                    12.125544                                                    11.851844
[2,]                                                     4.587902                                                     4.517597                                                     4.493085
[3,]                                                     4.587902                                                     4.623010                                                     4.514404
[4,]                                                     7.225362                                                     7.568576                                                     7.128799
[5,]                                                     8.326830                                                     8.414536                                                     8.234831

/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_1 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_2 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_1_3

[1,]                                                    13.510580                                                    12.466787                                                    11.687914
[2,]                                                     6.000278                                                     4.472188                                                     4.542115
[3,]                                                     4.720447                                                     4.480202                                                     4.499401
[4,]                                                     8.146038                                                     7.809378                                                     7.921780
[5,]                                                     7.795751                                                     8.051800                                                     7.590878

/data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_1 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_2 /data/SG15074465_258552110010_S001_GE2_1200_Jun14_PEG200_2_3

[1,]                                                    11.745827                                                    11.204838                                                    12.069992
[2,]                                                     4.449941                                                     4.588539                                                     4.472029
[3,]                                                     4.493532                                                     4.523331                                                     4.576514
[4,]                                                     7.656538                                                     7.598940                                                     7.566575
[5,]                                                     7.777802                                                     7.843019                                                     7.658971

limma microarray design matrix twochannel • 767 views
0
Entering edit mode

It is unnecessary to add random numbers to a targets file. To make a design matrix, you can simply use:

design <- model.matrix(~time, data=targets)
1
Entering edit mode
@gordon-smyth
Last seen 6 hours ago
WEHI, Melbourne, Australia

The error arises from annotation columns, not from the design matrix

This error doesn't have anything to do with the design matrix. This particular error arises when one tries to add rownames to a matrix and the vector of rownames doesn't agree with the row dimension of the matrix. If this error has indeed come from lmFit (and that seems strange to me), then the linear models have been fitted correctly but the error has occurred when trying to adding rownames to the fitted model object at the end of the computation.

The output you show doesn't match the code

There are some strange things going on with your code. The MAList object RG.bg.between.array.corrected that you show cannot have been created by the limma code that you give. In particular, the columns called ID, RefNumber and TopHit are all non-standard annotation column names and hence cannot have been read in by the default read.maimages() command that you give.

Another issue is that your annotation columns contain NA values, and I have never seen NA values from an Agilent file -- missing values are instead represented as empty strings. I conclude that you must have added columns to the data object yourself independently of the limma functions.

Another issue is that you haven't actually shown the code that gives the error.

What's going on?

I was guessing that you have edited the data object using code that you don't show and that there are problems with it, for example columns not all of the same length.

Eventually it became apparent that you have set the genes component of the MAList independently of the code shown in your question, and that your genes data.frame has more rows than there are probes on your microarray. In other words, your MAList is invalid because the number of rows of the different components don't agree. This leads to an error when lmFit tries to set the rownames of the fitted model object.

The lesson is that you should read the annotation columns using read.maimages as explained in the limma User's Guide (Section 4.4) and not try to set them yourself. It's quicker and easier all round.

Well, there a lot of things going on here and you're still some way from making a start on analysing this dataset. I'll try to point you in the right direction:

1. What analysis are you trying to do? Normally it wouldn't be enough to regress expression on time ignoring cell line. Are you perhaps trying to regress on time adjusting for cell line? Are you interested in the differences between cell lines? What is your biological question?
2. Where exactly is your data? Your code doesn't show any object containing any expression data. You claim to have run the lmFit() function, but on what data? How have you read the expression data into R? How have you background corrected and normalized?

The last point is the important. Without knowing what form of expression data you are trying to analyse, it isn't possible to advise you how to form the design matrix.

0
Entering edit mode

Thanks a lot for answer @Gordon Smyth.

I edited my question to make it clearer and address your questions, but let me also reply here.

My biological question is which genes show time-dependent changes in expression due to irradiation (i.e., which genes go up and down with time due to the irradiation).

I'd like to try two models: one simply treating cell-lines as biological replicates and hence simply regressing expression on time. And the other adjusting for cell-line.

In a further step, I have a similar experiment from another species and I'd like to estimate the species by time interaction effect on gene expression (again with two models, one treating for cell-lines as biological replicates and the other adjusting for them).

1
Entering edit mode

I'm not sure what the problem is, but I don't think it has to do with the design matrix.

Can you please show the first 5 lines of data, like this:

RG.bg.between.array.corrected[1:5, ]
0
Entering edit mode

Sure. Just added it at the bottom of my post.

Thanks a lot.

1
Entering edit mode

0
Entering edit mode

Thanks again @Gordon Smyth.

The reason for the error is the GAL annotation file.

It has a few non-probe rows in it, e.g.:

  Block Column Row            Name                     ID RefNumber ControlType        GeneName TopHit Description
1     1      1   1 GE_BrightCorner        GE_BrightCorner     62976         pos GE_BrightCorner     NA          NA
2     1      2   1 GE_BrightCorner        GE_BrightCorner     62648         pos GE_BrightCorner     NA          NA
3     1      3   1      DarkCorner             DarkCorner     62320         pos      DarkCorner     NA          NA
4     1      4   1      DarkCorner             DarkCorner     61992         pos      DarkCorner     NA          NA

That once removed (dplyr::filter(ControlType == "false")), eliminates the problem.

Thanks again and sorry for not catching this earlier.

0
Entering edit mode

GAL files are only for GenePix data and even then you don't need to read them. It is far better to read the annotation columns directly from the Agilent Feature Extraction files by setting the 'annotation' argument when using read.maimages(). Then the number of annotation rows would automatically agree with the number of rows of expression data and there would be no error.

Can I gently say that that this is why the Bioconductor Support Posting Guide asks people to show all the code leading to the error message that they are reporting. If you had shown the code, including reading the GAL file, then I would have been able to identify the problem two days ago.