Now i would like to select the differentially expressed genes with creating a model.matrix and then do the lmFit but do i have to do the step ExpressionSetillumina? if i do how do i do do it , with the class Elist? If id ont how do i continue??
You can answer this question yourself by simply reading the help page for topTable(). That's the first thing you should do when you get a result you don't understand. Here are your hints:
Arguments:
fit: list containing a linear model fit produced by lmFit,
lm.series, gls.series or mrlm. For topTable, it
should be an object of class MArrayLM as produced by
lmFit and eBayes.
coef: column number or column name specifying which coefficient or
contrast of the linear model is of interest. For topTable,
can also be a vector of column subscripts, in which case the
gene ranking is by F-statistic for that set of contrasts.
Question: Did you specify a coefficient?
Further, under the Details section:
topTableF ranks genes on the basis of moderated F-statistics for
one or more coefficients. If topTable is called and coef has
two or more elements, then the specified columns will be extracted
from fit and topTableF called on the result. topTable with
coef=NULL is the same as topTableF, unless the fitted model
fit has only one column.
The way you have specified your design matrix sets (I'm guessing here) cellineKBR3-R 0.1 µM Lapatinib as the baseline (intercept term), and all the other coefficients are estimating the log fold change between the given sample type and the baseline. In other words, the second coefficient is estimating the difference between 1 µM Lapatinib and 0.1 µM Lapatinib in KBR3-R cells. That is probably not what you want, especially given the two different cell lines. You should probably instead fit a model without an intercept.
And then use a contrasts matrix to make whatever comparisons you care about. This is all covered in the limma User's Guide, which if you plan on doing analyses yourself you should read quite thoroughly.
Thanks: so is this right then (to get what i want a list of differentialy expressed genes:
# select differentially expressed genes
design_matrix <- as.data.frame(unlist(lapply(colnames(data_norm), function(x) substr(x,1,(nchar(x)-2)))))
colnames(design_matrix) <- "cellline"
rownames(design_matrix) <- colnames(data_norm)
design <- model.matrix(~design_matrix$cellline)
fit <- lmFit(data_norm, design)
fit2 <- eBayes(fit, trend=TRUE)
# plot & write out results
results <- decideTests(fit2)
colnames(fit2)
[1] "(Intercept)" "design_matrix$celllineKBR3-R 1uM Lapatinib"
[3] "design_matrix$celllineKBR3-R Control" "design_matrix$celllineKBR3 0.1uM Lapatinib"
[5] "design_matrix$celllineKBR3 1uM Lapatinib" "design_matrix$celllineKBR3 Control"
#So there is six columns, six samples (x 3 biological repeats) the "(Intercept)"= KBR3-R 0,1 uM Lapatinib and because of six samples coef=6 (?)
top <- topTable(fit2, number=length(data_norm[,1]), sort.by="t", coef= 6)
Or is it coef=5 cause when it said error: "Removing intercept from test coefficients..." cause it takes out the intercepot column
The way you have specified your design matrix sets (I'm guessing here) cellineKBR3-R 0.1 µM Lapatinib as the baseline (intercept term), and all the other coefficients are estimating the log fold change between the given sample type and the baseline. In other words, the second coefficient is estimating the difference between 1 µM Lapatinib and 0.1 µM Lapatinib in KBR3-R cells. That is probably not what you want, especially given the two different cell lines. You should probably instead fit a model without an intercept.
And then use a contrasts matrix to make whatever comparisons you care about. This is all covered in the limma User's Guide, which if you plan on doing analyses yourself you should read quite thoroughly.