Hi,

On Tue, Feb 11, 2014 at 7:11 AM, Ming Yi <yi02 at hotmail.com> wrote:
> Hi, Mike:
>
> Thanks a lot for the prompt response and input, which is very helpful
>
> Since some of the genes seem a bit interesting to us, and of course we love to keep.
>
> However, when I try:
>
> > resType <- results(dds, "Type_Tumor_vs_Normal",cooksCutoff=FALSE);
> Error in results(dds, "Type_Tumor_vs_Normal", cooksCutoff = FALSE) :
>   unused argument (cooksCutoff = FALSE)

What version of DESeq2 are you using? Is the cooksCutoff argument not listed in the documentation when you run ?results?

Yes, Steve is on it. I assumed you were using the current release version of Bioconductor (2.13) with DESeq2 v1.2. In v1.0, the cooksCutoff argument was in DESeq().

> Also from your experience, if Cook's filtering is taken out, validation
> rate much worse? In reality, some genes might have large variation than
> others such as cancer-related genes. What do you think?

Cook's filtering is just a heuristic, so it's hard to give general advice. The point is to help identify cases where individual samples have too much influence on the log fold changes. I would recommend plotting the counts of genes with large Cook's distance:

# get the genes with highest max(Cook's distance for each sample)
cooks <- mcols(dds)$maxCooks
idx <- order(-cooks)
# plot the normalized counts for the top gene by max Cook's distance
plot( counts(dds,normalized=TRUE)[ idx[1], ],
      main=paste("Max Cook's:", cooks[idx[1]]) )

You can decide for yourself where to set the filter by setting cooksCutoff = x.

Note that large variance alone will not lead to filtering; the filtering comes in when the variance for a majority of samples is small, but a minority of samples have extreme counts which have a large influence on the log fold changes.

For your experiment, if the subject variable explains a lot of the variance, I would make sure to include it in the design, to help isolate the true condition effect.
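
To make the point about variance versus influence concrete, here is a toy sketch in base R only. It is not how DESeq2 computes Cook's distance internally (DESeq2 works within its own GLM fit); it fits an ordinary linear model to log2 counts. The counts, the two gene vectors and the helper cooks_for() are made up for illustration.

```r
# Toy illustration with base R; NOT DESeq2's internal computation.
# All counts below are invented for the example.
group <- factor(rep(c("Normal", "Tumor"), each = 6))

# Gene A: large variance, but spread across all samples
gene_a <- c(20, 180, 60, 240, 110, 15, 300, 45, 150, 30, 210, 90)

# Gene B: tight counts everywhere except one extreme Tumor sample
gene_b <- c(50, 52, 48, 51, 49, 50, 50, 49, 51, 52, 48, 2000)

# Hypothetical helper: per-sample Cook's distance from a linear model
# fit to log2(count + 1) with group as the only covariate
cooks_for <- function(y) {
  fit <- lm(log2(y + 1) ~ group)
  cooks.distance(fit)
}

cooks_a <- cooks_for(gene_a)
cooks_b <- cooks_for(gene_b)

# Compare the largest per-sample Cook's distance for each gene
print(round(c(geneA = max(cooks_a), geneB = max(cooks_b)), 2))

# Which sample is most influential for gene B
print(which.max(cooks_b))
```

In this sketch gene A has the noisier counts overall, yet gene B gets the larger maximum Cook's distance, because a single sample drives its estimated group difference. That is the kind of gene the filter is meant to flag.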