estimateSizeFactors first or later?
1
0
Entering edit mode
@mehmet-ilyas-cosacak-9020
Last seen 6.7 years ago
Germany/Dresden/ CRTD - DZNE

Hi,

I have 3 replicates of 4 different samples sequenced by RNA-Seq (Untreated_GFP(-), Untreated_GFP(+), Treated_GFP(-) and Treated_GFP(+)) and I would like compare in 4 different ways;

(1) Untreated_GFP(+) vs Untreated_GFP(-)

(2) Treated_GFP(+) vs Treated_GFP(-)

(3) Treated_GFP(+) vs Untreated_GFP(-)

(4) Untreated_GFP(-) vs Treated_GFP(-).

I have different results when I first use esimateSizeFactors then run DESeq, compared if I run first DESeq then estimateSizeFactors. 

a part of the R script is as below:

#-----------------------------------------------------------------------#

input_file <- "fearureCountsTable.txt"; 
numberOfReplicates <- 3
all_conditions <- rep(c("Untreated_GFP_Neg","Treated_GFP_Pos","Untreated_GFP_Neg","Treated_GFP_Pos"),numberOfReplicates)

DataFromInputFile <- read.table(input_file, sep="\t", header = T);

numberOfColumns <- ncol(DataFromInputFile);
colnames(DataFromInputFile)[1] <- "Geneid";
countdata <- DataFromInputFile[,c(1,7:numberOfColumns)];
colnames(countdata) <- gsub("L\\d+_","",colnames(countdata));
colnames(countdata) <- gsub(".bam$","",colnames(countdata));
newMatrix_countdata <- convertDataFrameToMatrix(countdata)
coldata <- data.frame(row.names = colnames(newMatrix_countdata),condition = as.factor(all_conditions))
ddsFull <- DESeqDataSetFromMatrix(countData = newMatrix_countdata, colData = coldata, design = ~ condition)

ddsFull  <- estimateSizeFactors(ddsFull)  # ????

dds <- ddsFull[,colData(ddsFull)$condition %in% c("Treated_GFP_Neg","Treated_GFP_Pos")]
dds$condition <- droplevels(dds$condition)
dds$condition <- relevel(dds$condition,"Treated_GFP_Neg")
dds <- DESeq(dds)

#-----------------------------------------------------------------------#

Which way is beter? If there are any similar questions with answers, I will appreciate it if you could direct me.

best,

ilyas.

estimatesizefactors deseq2 • 2.2k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

Size factor estimation depends on the samples in the count matrix.

So if you estimate size factors, then remove samples and re-estimate the size factors they will not be the same. It's hard to say which is the 'correct' one.

Our standard workflow is just to have users run DESeq(), which does all the estimation automatically internally.

ADD COMMENT
0
Entering edit mode

Also, in case you are not aware, another option (maybe more typical) is to run the analysis altogether, and use 'contrast' to compare groups (see ?results for more information).

dds <- DESeq(dds)
results(dds, contrast=c("condition","B","A"))
results(dds, contrast=c("condition","D","C"))

etc.

This will not give the same results as running pairs of groups through DESeq(). Again, it's hard to say which way is 'correct', it's the difference between estimating a set of parameters for the whole experiment or for pairs at a time.

ADD REPLY
0
Entering edit mode

I think it would be better first estimating the size factors and then do further analyses, but of course I do not have a clear/reasonable explanation for that.

Thank you very much Michael.

ADD REPLY

Login before adding your answer.

Traffic: 818 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6