time course analysis using LIMMA

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 22 days ago

Germany

Hello everybody, I'm have a set of CEL files of six time points and I would like to use LIMMA to analyse to differences in time between a treatment and the control arrays. I'm reading the data from an excel sheet and create an object for each time point and treatment level. > data_1.0 <- read.delim(file="stress_1.0.txt", row.names=1) > dim(data_1.0) 22810 51 I than combined the various time points and treatments into one data frame with > xxx <- cbind(log_control_1.0_1h, log_control_1.0_3h, log_control_1.0_6h, log_control_1.0_9h, log_control_1.0_24h, log_control_1.0_48h, log_xxx_1.0_1h, log_xxx_1.0_3h, log_xxx_1.0_6h, log_xxx_1.0_9h, log_xxx_1.0_24h, log_xxx_1.0_48h) > row.names(xxx) <- row.names(data_1.0) I have a different number of array for the various time points. 1. Can that cause a problem in the analysis? I'm having difficulties in creating the correct design and contrast matrices. I tried : >d <- factor(c("c1","c1","c1","c1","c3","c3","c3","c3","c3","c3","c6","c6"," c6","c6","c6","c6", "c9","c9","c9","c24","c24","c24","c24","c48","c48","t1","t1","t1","t1" , "t3","t3","t3","t3","t3","t3","t3","t6","t6","t6","t6","t6","t9","t9", "t9","t9", "t24","t24","t24","t24","t48","t48"), levels=c("c1","c3","c6","c9","c24","c48","t1","t3","t6","t9","t24","t4 8")) #Design matrix for stress > design <- model.matrix(~0+d) # line in the table with NaN values (NA) were taken out - 4 lines of house keeoping genes > fit_limma<- lmFit(xxx, design) #which genes responds either at 1h, 3h,6h,9h,24h or 48h to treatment? > contrastoftreatment <- makeContrasts("dt1-dc1","dt3-dc3","dt6-dc6","dt9-dc9","dt24-dc24","dt4 8-dc48" ,levels=design) But after a lot of thoughts I found out that I'm basically only searching for here for differentially regulated genes between treated and control experiments in a specific time point. so I created a terget files: Name fileName Target Time xxx_0_1h_1 xxx_0_1h_1.CEL wt.1h1 1 xxx_0_1h_2 xxx_0_1h_2.CEL wt.1h2 1 xxx_0_1h_3 xxx_0_1h_3.CEL wt.1h3 1 xxx_0_1h_4 xxx_0_1h_4.CEL wt.1h4 1 xxx_0_24h_1 xxx_0_24h_1.CEL wt.24h1 24 xxx_0_24h_2 xxx_0_24h_2.CEL wt.24h2 24 xxx_0_24h_3 xxx_0_24h_3.CEL wt.24h3 24 xxx_0_24h_4 xxx_0_24h_4.CEL wt.24h4 24 xxx_0_3h_1 xxx_0_3h_1.CEL wt.3h1 3 xxx_0_3h_2 xxx_0_3h_2.CEL wt.3h2 3 xxx_0_3h_3 xxx_0_3h_3.CEL wt.3h3 3 xxx_0_3h_5 xxx_0_3h_5.CEL wt.3h4 3 xxx_0_3h_6 xxx_0_3h_6.CEL wt.3h5 3 xxx_0_3h_7 xxx_0_3h_7.CEL wt.3h6 3 xxx_0_48h_1 xxx_0_48h_1.CEL wt.48h1 48 xxx_0_48h_2 xxx_0_48h_2.CEL wt.48h2 48 xxx_0_6h_1 xxx_0_6h_1.CEL wt.6h1 6 xxx_0_6h_2 xxx_0_6h_2.CEL wt.6h2 6 xxx_0_6h_3 xxx_0_6h_3.CEL wt.6h3 6 xxx_0_6h_4 xxx_0_6h_4.CEL wt.6h4 6 xxx ... (you get the idea) and used it to analyse the data according to the differences between the treated and the control experiments. > targets <-readTargets("targets.txt",row.names="Name",quote="") > lev <- c("wt.1h", "wt.24h", "wt.3h", "wt.48h", "wt.6h", "wt.9h", "mu.1h", "mu.24h", "mu.3h", "mu.48h", "mu.6h", "mu.9h") > f<-factor(targets$Target, levels=lev) > design_1 <- model.matrix(~0+f) > colnames(design_1) <- lev > fit_1 <- lmFit(xxx, design_1) # searching for the genes with time effects. > cont.wt <- makeContrasts("wt.3h-wt.1h","wt.6h-wt.3h","wt.9h-wt.6h", "wt.24h-wt.9h","wt.48h-wt.24h", "mu.3h-mu.1h", "mu.6h-mu.3h", "mu.9h- mu.6h", "mu.24h-mu.9h","mu.48h-mu.24h", levels=design_1) > fit_2 <-contrasts.fit(fit_1, cont.wt) > fit_3 <- eBayes(fit_2) > topTable(fit_3, adjust="BH") > sel.wt <- p.adjust(fit_3$F.p.value, method="fdr")<0.01 > cont.diff <- makeContrasts(diff3h=(mu.3h-mu.1h) - (wt.3h-wt.1h), diff6h=(mu.6h-mu.3h) - (wt.6h-wt.3h), diff9h=(mu.9h-mu.6h) - (wt.9h-wt.6h), diff24h=(mu.24h-mu.9h) - (wt.24h-wt.9h), diff48h=(mu.48h-mu.24h) - (wt.48h-wt.24h), levels=design_1) > fit_diff <-contrasts.fit(fit_1, cont.diff) > fit_diff1 <- eBayes(fit_diff) with this command I'm extracting the genes which are differentially regulated between the 3h and the 1h time points with thresholds p<0.01 and a 2 fold change. > topGenes <- topTable(fit_diff1, coef=1, number=3000, adjust="BH", p.value=0.01, lfc=2, sort.by="logFC") I'd have a look at the resulting genes, but somehow I didn't get the feeling that it now does what I wanted it to do, so my main question is - Is the experimental design I made here on the second part correct for finding significantly differentially regulated genes in a time course analysis between different time points? Is there a way of extrapolating this kind of contrast design so that I'm able to find only the genes which are differentially regulated in all time points with this experimetal design? Does such an analysis make any sense? THX for any kind of help. Assa Yeroslaviz -- [[alternative HTML version deleted]]

• 1.1k views

ADD COMMENT • link updated 15.6 years ago by Wei Shi ★ 3.6k • written 15.6 years ago by Assa Yeroslaviz ★ 1.5k

0

Entering edit mode

Wei Shi ★ 3.6k

@wei-shi-2183

Last seen 3 months ago

Australia/Melbourne/Olivia Newton-John …

Hi Assa: You need to normalize your data before carrying out differential expression analysis. If your data was from Affymetrix arrays, you might consider using gcrma to normalize your data. Moreover, you might use ReadAffy function in the gcRMA package to read your data instead of reading in them one by one using read.delim. Your experiment design in your second part of analysis looks fine. One way to find genes differentially regulated in all time points is to use decideTests function in the limma package. You can first find the set of DE genes for each time point using this function and then get the intersection of these sets. Below is sample code to do this: y <- decideTests(fit_1, p.value = 0.01, lfc = log2(2)) z <- apply(y!=0, 1, all) Cheers, Wei Assa Yeroslaviz wrote: > Hello everybody, > > I'm have a set of CEL files of six time points and I would like to use LIMMA > to analyse to differences in time between a treatment and the control > arrays. > > I'm reading the data from an excel sheet and create an object for each time > point and treatment level. > >> data_1.0 <- read.delim(file="stress_1.0.txt", row.names=1) >> dim(data_1.0) >> > 22810 51 > > I than combined the various time points and treatments into one data frame > with > >> xxx <- cbind(log_control_1.0_1h, log_control_1.0_3h, log_control_1.0_6h, >> > log_control_1.0_9h, > log_control_1.0_24h, log_control_1.0_48h, log_xxx_1.0_1h, log_xxx_1.0_3h, > log_xxx_1.0_6h, > log_xxx_1.0_9h, log_xxx_1.0_24h, log_xxx_1.0_48h) > >> row.names(xxx) <- row.names(data_1.0) >> > > I have a different number of array for the various time points. > 1. Can that cause a problem in the analysis? > > I'm having difficulties in creating the correct design and contrast > matrices. > I tried : > >> d <- >> > factor(c("c1","c1","c1","c1","c3","c3","c3","c3","c3","c3","c6","c6" ,"c6","c6","c6","c6", > "c9","c9","c9","c24","c24","c24","c24","c48","c48","t1","t1","t1","t 1", > "t3","t3","t3","t3","t3","t3","t3","t6","t6","t6","t6","t6","t9","t9 ","t9","t9", > "t24","t24","t24","t24","t48","t48"), > levels=c("c1","c3","c6","c9","c24","c48","t1","t3","t6","t9","t24"," t48")) > #Design matrix for stress > >> design <- model.matrix(~0+d) >> > # line in the table with NaN values (NA) were taken out - 4 lines of house > keeoping genes > >> fit_limma<- lmFit(xxx, design) >> > #which genes responds either at 1h, 3h,6h,9h,24h or 48h to treatment? > >> contrastoftreatment <- >> > makeContrasts("dt1-dc1","dt3-dc3","dt6-dc6","dt9-dc9","dt24-dc24","d t48-dc48" > ,levels=design) > > But after a lot of thoughts I found out that I'm basically only searching > for here for differentially regulated genes between treated and control > experiments in a specific time point. > > so I created a terget files: > Name fileName Target Time > xxx_0_1h_1 xxx_0_1h_1.CEL wt.1h1 1 > xxx_0_1h_2 xxx_0_1h_2.CEL wt.1h2 1 > xxx_0_1h_3 xxx_0_1h_3.CEL wt.1h3 1 > xxx_0_1h_4 xxx_0_1h_4.CEL wt.1h4 1 > xxx_0_24h_1 xxx_0_24h_1.CEL wt.24h1 24 > xxx_0_24h_2 xxx_0_24h_2.CEL wt.24h2 24 > xxx_0_24h_3 xxx_0_24h_3.CEL wt.24h3 24 > xxx_0_24h_4 xxx_0_24h_4.CEL wt.24h4 24 > xxx_0_3h_1 xxx_0_3h_1.CEL wt.3h1 3 > xxx_0_3h_2 xxx_0_3h_2.CEL wt.3h2 3 > xxx_0_3h_3 xxx_0_3h_3.CEL wt.3h3 3 > xxx_0_3h_5 xxx_0_3h_5.CEL wt.3h4 3 > xxx_0_3h_6 xxx_0_3h_6.CEL wt.3h5 3 > xxx_0_3h_7 xxx_0_3h_7.CEL wt.3h6 3 > xxx_0_48h_1 xxx_0_48h_1.CEL wt.48h1 48 > xxx_0_48h_2 xxx_0_48h_2.CEL wt.48h2 48 > xxx_0_6h_1 xxx_0_6h_1.CEL wt.6h1 6 > xxx_0_6h_2 xxx_0_6h_2.CEL wt.6h2 6 > xxx_0_6h_3 xxx_0_6h_3.CEL wt.6h3 6 > xxx_0_6h_4 xxx_0_6h_4.CEL wt.6h4 6 > xxx ... (you get the idea) > > and used it to analyse the data according to the differences between the > treated and the control experiments. > >> targets <-readTargets("targets.txt",row.names="Name",quote="") >> lev <- c("wt.1h", "wt.24h", "wt.3h", "wt.48h", "wt.6h", "wt.9h", >> > "mu.1h", "mu.24h", "mu.3h", "mu.48h", "mu.6h", "mu.9h") > >> f<-factor(targets$Target, levels=lev) >> design_1 <- model.matrix(~0+f) >> colnames(design_1) <- lev >> fit_1 <- lmFit(xxx, design_1) >> > # searching for the genes with time effects. > >> cont.wt <- makeContrasts("wt.3h-wt.1h","wt.6h-wt.3h","wt.9h-wt.6h", >> > "wt.24h-wt.9h","wt.48h-wt.24h", "mu.3h-mu.1h", "mu.6h-mu.3h", "mu .9h-mu.6h", > "mu.24h-mu.9h","mu.48h-mu.24h", levels=design_1) > > >> fit_2 <-contrasts.fit(fit_1, cont.wt) >> fit_3 <- eBayes(fit_2) >> topTable(fit_3, adjust="BH") >> sel.wt <- p.adjust(fit_3$F.p.value, method="fdr")<0.01 >> > > >> cont.diff <- makeContrasts(diff3h=(mu.3h-mu.1h) - (wt.3h-wt.1h), >> > diff6h=(mu.6h-mu.3h) - (wt.6h-wt.3h), > diff9h=(mu.9h-mu.6h) - (wt.9h-wt.6h), > diff24h=(mu.24h-mu.9h) - (wt.24h-wt.9h), > diff48h=(mu.48h-mu.24h) - (wt.48h-wt.24h), > levels=design_1) > >> fit_diff <-contrasts.fit(fit_1, cont.diff) >> fit_diff1 <- eBayes(fit_diff) >> > > with this command I'm extracting the genes which are differentially > regulated between the 3h and the 1h time points with thresholds p<0.01 and a > 2 fold change. > >> topGenes <- topTable(fit_diff1, coef=1, number=3000, adjust="BH", >> > p.value=0.01, lfc=2, sort.by="logFC") > > I'd have a look at the resulting genes, but somehow I didn't get the feeling > that it now does what I wanted it to do, so my main question is - Is the > experimental design I made here on the second part correct for finding > significantly differentially regulated genes in a time course analysis > between different time points? > > Is there a way of extrapolating this kind of contrast design so that I'm > able to find only the genes which are differentially regulated in all time > points with this experimetal design? > > Does such an analysis make any sense? > > THX for any kind of help. > > > Assa Yeroslaviz > > -- > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor@stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 15.6 years ago Wei Shi ★ 3.6k

Login before adding your answer.