Question

Differential expression of nomalised and log2 transformed data in limma

0

Entering edit mode

stinehedensted • 0

@stinehedensted-14017

Last seen 6.6 years ago

Denmark

Hello

I have normalized and log2 transformed RNA microarray data from GEO, and I want to do differential gene expression analysis with the limma package in R.

My dataset contains 25 samples of cancer og 25 samples of stroma. Each sample have 33297 gene expression values.

I've loaded my data as:

> Array_tumorandstroma <- read.table("25sT_og25T-samples.csv", header = T, sep = ";", stringsAsFactors = T, nrow = 100)

> Array_tumorandstroma2 <- Array_tumorandstroma[,-1]

> rownames(Array_tumorandstroma2) <- Array_tumorandstroma[,1]

Then I just started to design a model matrix. I am unsure, whether I can just do this right away?

> design <-model.matrix((~0+factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2))))

> colnames(design) <- c("sT", "T")

> design
sT T
1 1 0
2 1 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 1 0
11 1 0
12 1 0
13 1 0
14 1 0
15 1 0
16 1 0
17 1 0
18 1 0
19 1 0
20 1 0
21 1 0
22 1 0
23 1 0
24 1 0
25 1 0
26 0 1
27 0 1
28 0 1
29 0 1
30 0 1
31 0 1
32 0 1
33 0 1
34 0 1
35 0 1
36 0 1
37 0 1
38 0 1
39 0 1
40 0 1
41 0 1
42 0 1
43 0 1
44 0 1
45 0 1
46 0 1
47 0 1
48 0 1
49 0 1
50 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2))`
[1] "contr.treatment"

Is my matrix design correct? I am very insure about this, and nobody I my lab has experience in the limma package.

When I try to do the moderated t-test, I run into trouble:

> fit <- lm.fit(Array_tumorandstroma2, design)

Error in lm.fit(Array_tumorandstroma2, design) : incompatible dimensions

I guess, my matrix design does not fit the dimentions of my microarray dataset (Array_tumorandstroma2). And probably this is way the lm.fit function does not work. Can anyone please advise me to better matrix design?

I also tried an other matrix design, but that resulted in the same problem:

> design <- model.matrix((~0+factor(c(1,2))))

> colnames(design) <- c("sT", "T")

> design
sT T
1 1 0
2 0 1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$`factor(c(1, 2))`
[1] "contr.treatment"

> fit <- lm.fit(Array_tumorandstroma2, design)
Error in lm.fit(Array_tumorandstroma2, design) : incompatible dimensions

Thank you in advance for any help or suggestions!

/Stine

limma lmfit model.matrix differential gene expression • 1.7k views

ADD COMMENT • link 6.6 years ago stinehedensted • 0

1

Entering edit mode

It looks like you're using the wrong function. Limma provides 'lmFit', not the lm.fit function you're using, which is something entirely different. Also, it's probably worth turning Array_tumorandstroma2 into a numeric matrix, rather than the data.frame that read.table returns. And for safety, I'd find some way of deriving the factor describing T and sT programmatically from the column names of the table, rather than explicitly enumerating them, or at least enumerating them in a more readable (and debug-able) fashion, e.g. rep(c("cancer", "stroma"), times=c(25,25)) - if you're really sure that the samples come in blocks, with cancer first.

ADD REPLY • link 6.6 years ago Gavin Kelly ▴ 680

Steve Lianoglou · Answer 1 · 2017-09-25

Hi Gavin,

Thank you for your reply. You are totally right, I was using the wrong function. I have now turned the dataframe into a numeric matrix. I ordered all the samples in the csv file, so I know fore sure, that the 25 first samples are from cancer, and that the last 25 sampls are from stroma.

I ran into a new problem, about my design matrix. I get the error-message:

"row dimension of design doesn't match column dimension of data object"

The dimentions of my dataobject is anf design matrix is:

> dim(merge_mean)
[1] 100   2
> dim(design)
[1] 100   2

I don't understand what the problem is? Probably this is also due to poor understand of how the design-matrix works.

I will be very gratefull, for your advise.

My full coding is below,

//Stine

> #source("https://bioconductor.org/biocLite.R")

> #biocLite("limma")
> 
> #browseVignettes("limma")
> library("limma")

> #Import data
> setwd("U:/Speciale/Svitlana et al/Differential ekspression")

> Array_tumorandstroma <- read.table("25sT_og25T-samples.csv", header = T, sep = ";", stringsAsFactors = T, nrow = 100, dec = ",")

> Array_tumorandstroma2 <- Array_tumorandstroma[,-1]

> rownames(Array_tumorandstroma2) <- Array_tumorandstroma[,1]

> #Convert dataframe into matrix: 
> Array_tumorandstroma2 <- as.matrix(sapply(Array_tumorandstroma2,as.numeric))

> #Mean expressionvalue for cancer and stroma:
> mean_cancer <- rowMeans(Array_tumorandstroma2[,1:25])

> mean_stroma <- rowMeans(Array_tumorandstroma2[,26:50])

> merge_mean <- matrix(c(mean_cancer, mean_stroma), nrow = 100, ncol = 2)

> #Construct design matrix 
> design <- model.matrix((~0+factor(rep(1:2, each=50))))

> colnames(design) <- c("Cancer", "Stroma")

> #lm.fit bruges til at finde least? square model
> fit <- lmFit(merge_mean, design)
Error in lmFit(merge_mean, design) : 
  row dimension of design doesn't match column dimension of data object