Question: How to get a list of differentialy expressed genes with data from Illumina array?
0
gravatar for kanacska
3.7 years ago by
kanacska10
Hungary
kanacska10 wrote:

Hi,

I want to do a list for differencialy expressed genes with txt data from this experiment: GSE38376  

I downloaded the txt file read it in R and normalized it:

> idata <- read.ilmn("GSE38376_non-normalized.txt",probeid = "PROBE_ID",expr="SKBR")

> data_nobg <- backgroundCorrect(idata,method="normexp")

> data_norm <- normalizeBetweenArrays(data_nobg,method="quantile")

data_ norm is in Elist class

Now i would like to select the differentially expressed genes with creating a model.matrix and then do the lmFit but do i have to do the step ExpressionSetillumina? if i do how do i do do it , with the class Elist? If id ont how do i continue??

 

Thanks, 

Anna

 

ADD COMMENTlink modified 3.7 years ago by James W. MacDonald49k • written 3.7 years ago by kanacska10
Answer: How to get a list of differentialy expressed genes with data from Illumina array
2
gravatar for James W. MacDonald
3.7 years ago by
United States
James W. MacDonald49k wrote:

You can just define a design matrix and fit a model using the limma package. The EList class is part of limma, so will be handled appropriately.

ADD COMMENTlink written 3.7 years ago by James W. MacDonald49k
Answer: How to get a list of differentialy expressed genes with data from Illumina array
1
gravatar for James W. MacDonald
3.7 years ago by
United States
James W. MacDonald49k wrote:

You can answer this question yourself by simply reading the help page for topTable(). That's the first thing you should do when you get a result you don't understand. Here are your hints:

Arguments:

     fit: list containing a linear model fit produced by lmFit,
          lm.series, gls.series or mrlm.  For topTable, it
          should be an object of class MArrayLM as produced by
         lmFit and eBayes.

    coef: column number or column name specifying which coefficient or
          contrast of the linear model is of interest. For topTable,
          can also be a vector of column subscripts, in which case the
          gene ranking is by F-statistic for that set of contrasts.

Question: Did you specify a coefficient?

Further, under the Details section:

     topTableF ranks genes on the basis of moderated F-statistics for
     one or more coefficients. If topTable is called and coef has
     two or more elements, then the specified columns will be extracted
     from fit and topTableF called on the result. topTable with
     coef=NULL is the same as topTableF, unless the fitted model
     fit has only one column.

 

 

ADD COMMENTlink modified 3.7 years ago • written 3.7 years ago by James W. MacDonald49k

Thanks: so is this right then (to get what i want a list of differentialy expressed genes:

# select differentially expressed genes
design_matrix <- as.data.frame(unlist(lapply(colnames(data_norm), function(x) substr(x,1,(nchar(x)-2)))))
colnames(design_matrix) <- "cellline"
rownames(design_matrix) <- colnames(data_norm)
design <- model.matrix(~design_matrix$cellline)
fit <- lmFit(data_norm, design)
fit2 <- eBayes(fit, trend=TRUE)

# plot & write out results
results <- decideTests(fit2)

colnames(fit2)
[1] "(Intercept)"                                "design_matrix$celllineKBR3-R 1uM Lapatinib"
[3] "design_matrix$celllineKBR3-R Control"       "design_matrix$celllineKBR3 0.1uM Lapatinib"
[5] "design_matrix$celllineKBR3 1uM Lapatinib"   "design_matrix$celllineKBR3 Control"

#So there is six columns, six samples (x 3 biological repeats) the "(Intercept)"= KBR3-R 0,1 uM Lapatinib and because of six samples coef=6 (?)

top <- topTable(fit2, number=length(data_norm[,1]), sort.by="t", coef= 6)

 

ADD REPLYlink modified 3.7 years ago • written 3.7 years ago by kanacska10

Or is it coef=5 cause when it said error: "Removing intercept from test coefficients..." cause it takes out the intercepot column

ADD REPLYlink written 3.7 years ago by kanacska10

The way you have specified your design matrix sets (I'm guessing here) cellineKBR3-R 0.1 µM Lapatinib as the baseline (intercept term), and all the other coefficients are estimating the log fold change between the given sample type and the baseline. In other words, the second coefficient is estimating the difference between 1 µM Lapatinib and 0.1 µM Lapatinib in KBR3-R cells. That is probably not what you want, especially given the two different cell lines. You should probably instead fit a model without an intercept.

design <- model.matrix(~ 0 + celline, design_matrix)

And then use a contrasts matrix to make whatever comparisons you care about. This is all covered in the limma User's Guide, which if you plan on doing analyses yourself you should read quite thoroughly.

ADD REPLYlink written 3.7 years ago by James W. MacDonald49k
Answer: How to get a list of differentialy expressed genes with data from Illumina array
0
gravatar for kanacska
3.7 years ago by
kanacska10
Hungary
kanacska10 wrote:

Thanks i did it:

 

design_matrix <- as.data.frame(unlist(lapply(colnames(data_norm), function(x) substr(x,1,(nchar(x)-2)))))
colnames(design_matrix) <- "cellline"
rownames(design_matrix) <- colnames(data_norm)
design <- model.matrix(~design_matrix$cellline)
fit <- lmFit(data_norm, design)
fit2 <- eBayes(fit, trend=TRUE)

Now i want to do a Table of Top Genes from Linear Model Fit:

 

> top <- topTable(fit2, number=length(data_norm[,1]), sort.by="t")

But i get an error message
Removing intercept from test coefficients
Error in match.argsort.by, c("F", "none")) : 
  'arg' should be one of “F”, “none”

why doesnt it work?

ADD COMMENTlink modified 3.7 years ago by James W. MacDonald49k • written 3.7 years ago by kanacska10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 128 users visited in the last hour