Question: differential gene expression
gravatar for Pau Marc Muñoz Torres
3.9 years ago by
Pau Marc Muñoz Torres20 wrote:

Hi everybody.


 I'm trying to study differential gene expression in an affymetrix array. My data is composed by 1 array with 4 cells, each one with different conditions. I'm interested on compare all cells in pairs, i.e if I have cels a b c d i want to compare ab, ac, ad, bc, bd and cd.


 looking at bioconductor documentation i thought that the best approach was to do the following script for each pair (LV1LV2 is a pair in this case).

LV1LV2 <- c("20050622_Fibro_LV1_U133A_ov.CEL","20050622_Fibro_LV2_U133A_ov.CEL")
DataLV1LV2 <-ReadAffy(filenames=LV1LV2,sampleNames =  SLV1LV2)
pData(DataLV1LV2)<-read.table("pLV1LV2.txt", header=T,row.names=1, sep="\t")
legend("topright", gsub("\\.[Cc][Ee][Ll]", "",sampleNames(DataLV1LV2)) , col = 1:ncol(DataLV1LV2), lty= 1:ncol(DataLV1LV2))


#Diferencial expression for LV1LV2

GroupLV1LV2 <- factor(pData(DataLV1LV2)[,1 ] , levels = levels(pData(DataLV1LV2)[,1]))
designLV1LV2<- model.matrix(~GroupLV1LV2)
fitLV1LV2 <-lmFit(rLV1LV2,designLV1LV2)
fLV1LV2 <-eBayes(fitLV1LV2 )
tabLV1LV2 <- topTable(fitLV1LV2, coef = 2, adjust = "fdr", n = 50)
write.table(tabLV1LV2 ,file="DEGLV1LV2.xls",row.names=F, sep="\t")
write.csv(tabLV1LV2, file = "tabLV1LV250.csv",row.names=TRUE)
selectedLV1LV2<- p.adjust(fitLV1LV2$p.value[, 1]) <0.05
esetSelLV1LV2 <- eset [selectedLV1LV2, ]
write.csv(esetSelLV1LV2, file = "SIGNIFICANTLV1LV250.csv",row.names=TRUE)


After fLV1LV2 <-eBayes(fitLV1LV2 ), I get the following error :

Error in ebayes(fit = fit, proportion = proportion, stdev.coef.lim = stdev.coef.lim, :

No residual degrees of freedom in linear model fits 


which seems to be caused by :


  • not passing the data properly
  • having only two samples.


when i look at fitLV1LV2 summary i get




(Intercept) GroupLV1LV2LV2

Min. : 3.727 Min. :-2.78770

1st Qu.: 5.161 1st Qu.:-0.20358

Median : 6.379 Median :-0.07446

Mean : 6.730 Mean :-0.03792

3rd Qu.: 7.901 3rd Qu.: 0.07528

Max. :14.714 Max. : 2.10817

> summary(fitLV1LV2$df.residual)

Min. 1st Qu. Median Mean 3rd Qu. Max.

0 0 0 0 0 0

> summary(fitLV1LV2$sigma)

Mode NA's

logical 22283


I think that the approach I'm doing to performe the analysis is wrong. There is some way to, upladin all  4 cells in an object, do the comparations by pairs (pe data<-readAffy() and then do comparations) or a better approach to do it?



ADD COMMENTlink modified 3.9 years ago by Diego Diez730 • written 3.9 years ago by Pau Marc Muñoz Torres20

It's too much of a pain to parse your code to decipher this answer for myself, but do you have any replicates per condition? If your experiment has no replicates, I'm pretty sure that this (or a similar) error is what you'll get.

ADD REPLYlink written 3.9 years ago by Steve Lianoglou12k
gravatar for Sean Davis
3.9 years ago by
Sean Davis21k
United States
Sean Davis21k wrote:

Since you have no replication, you essentially have no way of doing a hypothesis test.  In statistical terms, this is referred to as "No residual degrees of freedom".  To answer your last question, probably the easiest approach is to rank your probesets by pairwise fold change (no p-values or other statistical testing).

ADD COMMENTlink written 3.9 years ago by Sean Davis21k
gravatar for Diego Diez
3.9 years ago by
Diego Diez730
Diego Diez730 wrote:

I would add to Sean's answer that it is also important here to consider the expression level (e.g. the average log2 expression, A) in order to avoid selecting too many false positives. Non expressed genes show large stochastic variation and tend to show high fold changes. An MA plot can be very helpful in this situation to know where to put the cutoff.

ADD COMMENTlink written 3.9 years ago by Diego Diez730
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 284 users visited in the last hour