Question

estimateSizeFactors first or later?

0

Entering edit mode

Mehmet Ilyas Cosacak • 0

@mehmet-ilyas-cosacak-9020

Last seen 7.1 years ago

Germany/Dresden/ CRTD - DZNE

Hi,

I have 3 replicates of 4 different samples sequenced by RNA-Seq (Untreated_GFP(-), Untreated_GFP(+), Treated_GFP(-) and Treated_GFP(+)) and I would like compare in 4 different ways;

(1) Untreated_GFP(+) vs Untreated_GFP(-)

(2) Treated_GFP(+) vs Treated_GFP(-)

(3) Treated_GFP(+) vs Untreated_GFP(-)

(4) Untreated_GFP(-) vs Treated_GFP(-).

I have different results when I first use esimateSizeFactors then run DESeq, compared if I run first DESeq then estimateSizeFactors.

a part of the R script is as below:

#-----------------------------------------------------------------------#

input_file <- "fearureCountsTable.txt"; 
numberOfReplicates <- 3
all_conditions <- rep(c("Untreated_GFP_Neg","Treated_GFP_Pos","Untreated_GFP_Neg","Treated_GFP_Pos"),numberOfReplicates)

DataFromInputFile <- read.table(input_file, sep="\t", header = T);

numberOfColumns <- ncol(DataFromInputFile);
colnames(DataFromInputFile)[1] <- "Geneid";
countdata <- DataFromInputFile[,c(1,7:numberOfColumns)];
colnames(countdata) <- gsub("L\\d+_","",colnames(countdata));
colnames(countdata) <- gsub(".bam$","",colnames(countdata));
newMatrix_countdata <- convertDataFrameToMatrix(countdata)
coldata <- data.frame(row.names = colnames(newMatrix_countdata),condition = as.factor(all_conditions))
ddsFull <- DESeqDataSetFromMatrix(countData = newMatrix_countdata, colData = coldata, design = ~ condition)

ddsFull  <- estimateSizeFactors(ddsFull)  # ????

dds <- ddsFull[,colData(ddsFull)$condition %in% c("Treated_GFP_Neg","Treated_GFP_Pos")]
dds$condition <- droplevels(dds$condition)
dds$condition <- relevel(dds$condition,"Treated_GFP_Neg")
dds <- DESeq(dds)

#-----------------------------------------------------------------------#

Which way is beter? If there are any similar questions with answers, I will appreciate it if you could direct me.

best,

ilyas.

estimatesizefactors deseq2 • 2.3k views

ADD COMMENT • link updated 9.5 years ago by Michael Love 43k • written 9.5 years ago by Mehmet Ilyas Cosacak • 0

score 0 · Answer 1 · 2015-10-20

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

Size factor estimation depends on the samples in the count matrix.

So if you estimate size factors, then remove samples and re-estimate the size factors they will not be the same. It's hard to say which is the 'correct' one.

Our standard workflow is just to have users run DESeq(), which does all the estimation automatically internally.

ADD COMMENT • link 9.5 years ago Michael Love 43k

0

Entering edit mode

Also, in case you are not aware, another option (maybe more typical) is to run the analysis altogether, and use 'contrast' to compare groups (see ?results for more information).

dds <- DESeq(dds)
results(dds, contrast=c("condition","B","A"))
results(dds, contrast=c("condition","D","C"))

etc.

This will not give the same results as running pairs of groups through DESeq(). Again, it's hard to say which way is 'correct', it's the difference between estimating a set of parameters for the whole experiment or for pairs at a time.

ADD REPLY • link 9.5 years ago Michael Love 43k

0

Entering edit mode

I think it would be better first estimating the size factors and then do further analyses, but of course I do not have a clear/reasonable explanation for that.

Thank you very much Michael.

ADD REPLY • link 9.5 years ago Mehmet Ilyas Cosacak • 0