Search
Question: differential gene expression
0
4.2 years ago by
Barcelona
Pau Marc Muñoz Torres20 wrote:

Hi everybody.

I'm trying to study differential gene expression in an affymetrix array. My data is composed by 1 array with 4 cells, each one with different conditions. I'm interested on compare all cells in pairs, i.e if I have cels a b c d i want to compare ab, ac, ad, bc, bd and cd.

looking at bioconductor documentation i thought that the best approach was to do the following script for each pair (LV1LV2 is a pair in this case).

library(affy)
library(affycoretools)
library(limma)
LV1LV2 <- c("20050622_Fibro_LV1_U133A_ov.CEL","20050622_Fibro_LV2_U133A_ov.CEL")
SLV1LV2=c("LV1","LV2")
DataLV1LV2 <-ReadAffy(filenames=LV1LV2,sampleNames =  SLV1LV2)
hist(DataLV1LV2)
legend("topright", gsub("\\.[Cc][Ee][Ll]", "",sampleNames(DataLV1LV2)) , col = 1:ncol(DataLV1LV2), lty= 1:ncol(DataLV1LV2))
savePlot(filename="histoLV1LV2.png",type="png")

rLV1LV2=rma(DataLV1LV2)

#Diferencial expression for LV1LV2

GroupLV1LV2 <- factor(pData(DataLV1LV2)[,1 ] , levels = levels(pData(DataLV1LV2)[,1]))
designLV1LV2<- model.matrix(~GroupLV1LV2)
fitLV1LV2 <-lmFit(rLV1LV2,designLV1LV2)
fLV1LV2 <-eBayes(fitLV1LV2 )
tabLV1LV2 <- topTable(fitLV1LV2, coef = 2, adjust = "fdr", n = 50)
write.table(tabLV1LV2 ,file="DEGLV1LV2.xls",row.names=F, sep="\t")
write.csv(tabLV1LV2, file = "tabLV1LV250.csv",row.names=TRUE)
selectedLV1LV2<- p.adjust(fitLV1LV2$p.value[, 1]) <0.05 esetSelLV1LV2 <- eset [selectedLV1LV2, ] write.csv(esetSelLV1LV2, file = "SIGNIFICANTLV1LV250.csv",row.names=TRUE) heatmap(exprs(esetSelLV1LV2)) savePlot(filename="heatmapDataLV1LV2.png",type="png") After fLV1LV2 <-eBayes(fitLV1LV2 ), I get the following error : Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, : No residual degrees of freedom in linear model fits  which seems to be caused by : • not passing the data properly • having only two samples. when i look at fitLV1LV2 summary i get summary(fitLV1LV2$coef)

(Intercept) GroupLV1LV2LV2

Min. : 3.727 Min. :-2.78770

1st Qu.: 5.161 1st Qu.:-0.20358

Median : 6.379 Median :-0.07446

Mean : 6.730 Mean :-0.03792

3rd Qu.: 7.901 3rd Qu.: 0.07528

Max. :14.714 Max. : 2.10817

> summary(fitLV1LV2$df.residual) Min. 1st Qu. Median Mean 3rd Qu. Max. 0 0 0 0 0 0 > summary(fitLV1LV2$sigma)

Mode NA's

logical 22283

I think that the approach I'm doing to performe the analysis is wrong. There is some way to, upladin all  4 cells in an object, do the comparations by pairs (pe data<-readAffy() and then do comparations) or a better approach to do it?

thanks

modified 4.1 years ago by Diego Diez730 • written 4.2 years ago by Pau Marc Muñoz Torres20
1

It's too much of a pain to parse your code to decipher this answer for myself, but do you have any replicates per condition? If your experiment has no replicates, I'm pretty sure that this (or a similar) error is what you'll get.

5
4.2 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:

Since you have no replication, you essentially have no way of doing a hypothesis test.  In statistical terms, this is referred to as "No residual degrees of freedom".  To answer your last question, probably the easiest approach is to rank your probesets by pairwise fold change (no p-values or other statistical testing).

0
4.1 years ago by
Diego Diez730
Japan
Diego Diez730 wrote:

I would add to Sean's answer that it is also important here to consider the expression level (e.g. the average log2 expression, A) in order to avoid selecting too many false positives. Non expressed genes show large stochastic variation and tend to show high fold changes. An MA plot can be very helpful in this situation to know where to put the cutoff.