Hi everyone, I've been teaching myself R, mostly for RNA-seq analysis, and I feel like I'm making good progress, but I still can't completely wrap my head around the design formula. From what I've read, the order of the factors after the '~' doesn't matter. Is that correct?
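One way to check this is to compare the design matrices that the two orderings produce, using base R's model.matrix() on a small made-up sample table. The toy colData below is hypothetical, not the real data. As far as I understand, swapping the order only changes the order of the columns, not the model itself. In DESeq2 the order still matters for one default: results() with no extra arguments reports the last variable in the design.

```r
# Hypothetical toy sample table: 5 phenotypes x 2 progression statuses x 2 replicates
colData <- expand.grid(rep = 1:2,
                       NP_P = factor(c("NP", "P")),
                       Pheno = factor(c("A", "B", "C", "D", "E")))

# The same two factors, entered in opposite orders
X1 <- model.matrix(~ Pheno + NP_P, data = colData)
X2 <- model.matrix(~ NP_P + Pheno, data = colData)

print(colnames(X1))  # intercept, PhenoB..PhenoE, then NP_PP
print(colnames(X2))  # intercept, NP_PP, then PhenoB..PhenoE

# Same set of coefficients, only listed in a different order
same_columns <- setequal(colnames(X1), colnames(X2))

# Same column space: stacking both matrices adds no new directions
same_space <- qr(cbind(X1, X2))$rank == qr(X1)$rank
```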

I have a few hundred libraries from five different phenotypes (let's call them A, B, C, D and E), from patients who are either progressors (P) or non-progressors (NP). From the various PCAs I've run, phenotype seems to be the main source of separation.

I regularly want to find differences between progressors (P) and non-progressors (NP) (colData$NP_P) within each phenotype (colData$Pheno), but also differences between the five phenotypes irrespective of the patients' progression status.

At the moment I just do:

dds <- DESeqDataSetFromMatrix(countData = mat, colData = colData, design = ~Pheno)

When I want to look at NP vs P within a given phenotype, I filter colData down to that phenotype and run:

dds <- DESeqDataSetFromMatrix(countData=mat,colData=colData,design=~NP_P)
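If the per-phenotype route is kept, the count matrix needs to be subset to exactly the same samples as colData. Otherwise DESeqDataSetFromMatrix() should stop with a dimension mismatch, or, if the columns happen to be in a different order, counts could be paired with the wrong samples. A minimal sketch, assuming the columns of mat are named by sample and match the row names of colData. The helper subset_pheno() and the toy data are hypothetical; the DESeq2 call is left as a comment so the block runs in base R.

```r
# Hypothetical helper: keep only the samples of one phenotype in BOTH objects
subset_pheno <- function(mat, colData, pheno) {
  keep <- colData$Pheno == pheno
  sub_col <- droplevels(colData[keep, , drop = FALSE])  # drop unused Pheno levels
  sub_mat <- mat[, keep, drop = FALSE]
  # Samples must still line up one-to-one after subsetting
  stopifnot(identical(colnames(sub_mat), rownames(sub_col)))
  list(mat = sub_mat, colData = sub_col)
}

# Toy data (hypothetical): 3 genes x 20 samples
colData <- expand.grid(rep = 1:2,
                       NP_P = factor(c("NP", "P")),
                       Pheno = factor(c("A", "B", "C", "D", "E")))
rownames(colData) <- paste0("s", seq_len(nrow(colData)))
mat <- matrix(rpois(3 * nrow(colData), lambda = 50), nrow = 3,
              dimnames = list(paste0("gene", 1:3), rownames(colData)))

subB <- subset_pheno(mat, colData, "B")

# With DESeq2 installed, the per-phenotype fit would then be, e.g.:
# dds <- DESeqDataSetFromMatrix(countData = subB$mat, colData = subB$colData,
#                               design = ~ NP_P)
```

One trade-off worth keeping in mind: fitting each phenotype separately means the dispersion estimates come only from that phenotype's samples, rather than from all of them.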

Is this the wrong way to go about it? Should I be using ~Pheno + NP_P, or ~Pheno + NP_P + Pheno:NP_P? I'm confused!
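To see what each option actually estimates, it can help to print the design matrix columns. This is a sketch on the same kind of hypothetical toy table. The additive model ~Pheno + NP_P fits a single P vs NP effect shared by all phenotypes. The interaction model adds one term per non-reference phenotype. With R's default coding, NP_PP is then the P vs NP effect in the reference phenotype (A, the first level), and each PhenoX:NP_PP term is how much that effect differs in phenotype X. DESeq2's resultsNames(dds) should list the same terms under its own naming, for example Pheno_B_vs_A, NP_P_P_vs_NP and PhenoB.NP_PP.

```r
# Hypothetical toy sample table, as before
colData <- expand.grid(rep = 1:2,
                       NP_P = factor(c("NP", "P")),
                       Pheno = factor(c("A", "B", "C", "D", "E")))

# Additive: one P vs NP effect shared across phenotypes
X_add <- model.matrix(~ Pheno + NP_P, data = colData)

# Interaction: the P vs NP effect is allowed to differ by phenotype
X_int <- model.matrix(~ Pheno + NP_P + Pheno:NP_P, data = colData)

# 1 intercept + 4 Pheno terms + 1 NP_P term + 4 interaction terms
print(colnames(X_int))
```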

Lastly, if I use ~Pheno + NP_P + Pheno:NP_P, how do I set up the contrast for the Pheno:NP_P part? I tried res <- data.frame(results(dds, contrast = c("PhenoA", "NP", "P"))), but it doesn't work, and I couldn't work it out from resultsNames(dds) either.
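The contrast = c(...) form of DESeq2's results() expects the name of a colData column first, followed by the numerator and denominator levels of that factor, e.g. c("NP_P", "P", "NP"). So "PhenoA" is not recognised, and since A is the reference level there is no PhenoA coefficient in the first place. Below is a base-R sketch of the arithmetic behind the two usual options. lm() on a simulated response stands in for one gene's fit, and all data are hypothetical. Option 1 sums the main effect and the interaction term. Option 2 recodes the two factors into a single group factor and compares two of its levels. In DESeq2, these should correspond roughly to results(dds, contrast = list(c("NP_P_P_vs_NP", "PhenoB.NP_PP"))) and, with design ~ group, results(dds, contrast = c("group", "B_P", "B_NP")).

```r
set.seed(1)
# Hypothetical toy sample table: 5 phenotypes x 2 statuses x 3 replicates
colData <- expand.grid(rep = 1:3,
                       NP_P = factor(c("NP", "P")),
                       Pheno = factor(c("A", "B", "C", "D", "E")))
# Simulated response standing in for one gene (made-up values)
colData$y <- rnorm(nrow(colData), mean = 5)

# Option 1: interaction model; P vs NP within B = main effect + interaction
fit_int <- lm(y ~ Pheno + NP_P + Pheno:NP_P, data = colData)
b <- coef(fit_int)
lfc_B_int <- unname(b["NP_PP"] + b["PhenoB:NP_PP"])

# Option 2: one combined factor, then compare two of its levels directly
colData$group <- factor(paste(colData$Pheno, colData$NP_P, sep = "_"))
fit_grp <- lm(y ~ 0 + group, data = colData)
g <- coef(fit_grp)
lfc_B_grp <- unname(g["groupB_P"] - g["groupB_NP"])

# Both should equal the raw difference in cell means (both models are saturated)
raw_B <- with(colData, mean(y[Pheno == "B" & NP_P == "P"]) -
                       mean(y[Pheno == "B" & NP_P == "NP"]))
```

The group recoding tends to be easier to read, since any pair of phenotype/status combinations can be compared by name without adding coefficients together.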

Any help is greatly appreciated!

Adam

Thanks!