Hi,
I have 3 replicates of 4 different samples sequenced by RNA-Seq (Untreated_GFP(-), Untreated_GFP(+), Treated_GFP(-) and Treated_GFP(+)) and I would like compare in 4 different ways;
(1) Untreated_GFP(+) vs Untreated_GFP(-)
(2) Treated_GFP(+) vs Treated_GFP(-)
(3) Treated_GFP(+) vs Untreated_GFP(-)
(4) Untreated_GFP(-) vs Treated_GFP(-).
I have different results when I first use esimateSizeFactors then run DESeq, compared if I run first DESeq then estimateSizeFactors.
a part of the R script is as below:
#-----------------------------------------------------------------------#
input_file <- "fearureCountsTable.txt"; numberOfReplicates <- 3 all_conditions <- rep(c("Untreated_GFP_Neg","Treated_GFP_Pos","Untreated_GFP_Neg","Treated_GFP_Pos"),numberOfReplicates) DataFromInputFile <- read.table(input_file, sep="\t", header = T); numberOfColumns <- ncol(DataFromInputFile); colnames(DataFromInputFile)[1] <- "Geneid"; countdata <- DataFromInputFile[,c(1,7:numberOfColumns)]; colnames(countdata) <- gsub("L\\d+_","",colnames(countdata)); colnames(countdata) <- gsub(".bam$","",colnames(countdata)); newMatrix_countdata <- convertDataFrameToMatrix(countdata) coldata <- data.frame(row.names = colnames(newMatrix_countdata),condition = as.factor(all_conditions)) ddsFull <- DESeqDataSetFromMatrix(countData = newMatrix_countdata, colData = coldata, design = ~ condition) ddsFull <- estimateSizeFactors(ddsFull) # ???? dds <- ddsFull[,colData(ddsFull)$condition %in% c("Treated_GFP_Neg","Treated_GFP_Pos")] dds$condition <- droplevels(dds$condition) dds$condition <- relevel(dds$condition,"Treated_GFP_Neg") dds <- DESeq(dds)
#-----------------------------------------------------------------------#
Which way is beter? If there are any similar questions with answers, I will appreciate it if you could direct me.
best,
ilyas.
Also, in case you are not aware, another option (maybe more typical) is to run the analysis altogether, and use 'contrast' to compare groups (see
?results
for more information).etc.
This will not give the same results as running pairs of groups through DESeq(). Again, it's hard to say which way is 'correct', it's the difference between estimating a set of parameters for the whole experiment or for pairs at a time.
I think it would be better first estimating the size factors and then do further analyses, but of course I do not have a clear/reasonable explanation for that.
Thank you very much Michael.