Question

How to get a list of differentialy expressed genes with data from Illumina array?

0

Entering edit mode

kanacska ▴ 10

@kanacska-7375

Last seen 8.6 years ago

Hungary

Hi,

I want to do a list for differencialy expressed genes with txt data from this experiment: GSE38376

I downloaded the txt file read it in R and normalized it:

> idata <- read.ilmn("GSE38376_non-normalized.txt",probeid = "PROBE_ID",expr="SKBR")

> data_nobg <- backgroundCorrect(idata,method="normexp")

> data_norm <- normalizeBetweenArrays(data_nobg,method="quantile")

data_ norm is in Elist class

Now i would like to select the differentially expressed genes with creating a model.matrix and then do the lmFit but do i have to do the step ExpressionSetillumina? if i do how do i do do it , with the class Elist? If id ont how do i continue??

Thanks,

Anna

limma illumina microarray lmfit differential gene expression • 2.4k views

ADD COMMENT • link updated 8.7 years ago by James W. MacDonald 65k • written 8.7 years ago by kanacska ▴ 10

score 2 · Answer 1 · 2015-08-05

2

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

You can just define a design matrix and fit a model using the limma package. The EList class is part of limma, so will be handled appropriately.

ADD COMMENT • link 8.7 years ago James W. MacDonald 65k

score 1 · Answer 2 · 2015-08-05

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 10 hours ago

United States

You can answer this question yourself by simply reading the help page for topTable(). That's the first thing you should do when you get a result you don't understand. Here are your hints:

Arguments:

     fit: list containing a linear model fit produced by lmFit,
          lm.series, gls.series or mrlm.  For topTable, it
          should be an object of class MArrayLM as produced by
         lmFit and eBayes.

    coef: column number or column name specifying which coefficient or
          contrast of the linear model is of interest. For topTable,
          can also be a vector of column subscripts, in which case the
          gene ranking is by F-statistic for that set of contrasts.

Question: Did you specify a coefficient?

Further, under the Details section:

     topTableF ranks genes on the basis of moderated F-statistics for
     one or more coefficients. If topTable is called and coef has
     two or more elements, then the specified columns will be extracted
     from fit and topTableF called on the result. topTable with
     coef=NULL is the same as topTableF, unless the fitted model
     fit has only one column.

ADD COMMENT • link 8.7 years ago James W. MacDonald 65k

0

Entering edit mode

Thanks: so is this right then (to get what i want a list of differentialy expressed genes:

# select differentially expressed genes
design_matrix <- as.data.frame(unlist(lapply(colnames(data_norm), function(x) substr(x,1,(nchar(x)-2)))))
colnames(design_matrix) <- "cellline"
rownames(design_matrix) <- colnames(data_norm)
design <- model.matrix(~design_matrix$cellline)
fit <- lmFit(data_norm, design)
fit2 <- eBayes(fit, trend=TRUE)

# plot & write out results
results <- decideTests(fit2)

colnames(fit2)
[1] "(Intercept)" "design_matrix$celllineKBR3-R 1uM Lapatinib"
[3] "design_matrix$celllineKBR3-R Control" "design_matrix$celllineKBR3 0.1uM Lapatinib"
[5] "design_matrix$celllineKBR3 1uM Lapatinib" "design_matrix$celllineKBR3 Control"

#So there is six columns, six samples (x 3 biological repeats) the "(Intercept)"= KBR3-R 0,1 uM Lapatinib and because of six samples coef=6 (?)

top <- topTable(fit2, number=length(data_norm[,1]), sort.by="t", coef= 6)

ADD REPLY • link 8.7 years ago kanacska ▴ 10

0

Entering edit mode

Or is it coef=5 cause when it said error: "Removing intercept from test coefficients..." cause it takes out the intercepot column

ADD REPLY • link 8.7 years ago kanacska ▴ 10

0

Entering edit mode

The way you have specified your design matrix sets (I'm guessing here) cellineKBR3-R 0.1 µM Lapatinib as the baseline (intercept term), and all the other coefficients are estimating the log fold change between the given sample type and the baseline. In other words, the second coefficient is estimating the difference between 1 µM Lapatinib and 0.1 µM Lapatinib in KBR3-R cells. That is probably not what you want, especially given the two different cell lines. You should probably instead fit a model without an intercept.

design <- model.matrix(~ 0 + celline, design_matrix)

And then use a contrasts matrix to make whatever comparisons you care about. This is all covered in the limma User's Guide, which if you plan on doing analyses yourself you should read quite thoroughly.

ADD REPLY • link 8.7 years ago James W. MacDonald 65k

James W. MacDonald · Answer 3 · 2015-08-05

Thanks i did it:

design_matrix <- as.data.frame(unlist(lapply(colnames(data_norm), function(x) substr(x,1,(nchar(x)-2)))))
colnames(design_matrix) <- "cellline"
rownames(design_matrix) <- colnames(data_norm)
design <- model.matrix(~design_matrix$cellline)
fit <- lmFit(data_norm, design)
fit2 <- eBayes(fit, trend=TRUE)

Now i want to do a Table of Top Genes from Linear Model Fit:

> top <- topTable(fit2, number=length(data_norm[,1]), sort.by="t")

But i get an error message
Removing intercept from test coefficients
Error in match.argsort.by, c("F", "none")) :
'arg' should be one of “F”, “none”

why doesnt it work?