I have a question about what DESeq2 treats as control genes (housekeeping genes). According to the manual, the size factors used for the normalized counts can be estimated with estimateSizeFactorsForMatrix(counts, locfunc = stats::median, geoMeans, controlGenes). While geoMeans is explained, are controlGenes the first 200 genes? Could anyone please explain the criteria for selecting the first 200 genes as controls? Is there a way in DESeq2 to determine which genes are housekeeping genes and plot their expression?
Thanks for the reply, just a few follow-up questions. Will adding controlGenes also change the output of dds <- DESeq(dds) by itself, or would it have to be defined there?
If you run estimateSizeFactors before DESeq, it will use those pre-estimated size factors, and it will print a message saying so.

So, I had assigned my genes with controlGenes <- c("BJA_RS02215", "BJA_RS03430", "BJA_RS04155", "BJA_RS05410", "BJA_RS07010"), but this creates a character-class object, which, when used for the run, throws a very expected error.
I have been trying to convert it to a numeric or logical vector using as.is and have failed to accomplish it. Is there another way of converting gene names into a numeric/logical object?

Whenever trying out something new, you should check the help for the function. It will tell you that controlGenes is a numeric or logical index vector. A logical vector in R is a vector of TRUE or FALSE values that can be used to index another vector or matrix-like object. Here you can just test which row names of your dataset appear in your character vector of gene names.
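A minimal sketch of that conversion, in case it helps. The count matrix is a toy one: the GENE_* IDs, the sample names and the condition column are made up for illustration, and only the five BJA_RS* IDs come from the post above. The DESeq2 part only runs if the package is installed.

```r
# Gene IDs from the post; everything else below is toy data.
controlGenes <- c("BJA_RS02215", "BJA_RS03430", "BJA_RS04155",
                  "BJA_RS05410", "BJA_RS07010")

set.seed(1)
gene_ids <- c(controlGenes, paste0("GENE_", 1:5))  # hypothetical extra IDs
counts <- matrix(rpois(40, lambda = 50), nrow = 10,
                 dimnames = list(gene_ids, paste0("sample", 1:4)))

# Logical index: TRUE for every row whose name is one of the control genes.
is_control <- rownames(counts) %in% controlGenes

# Equivalent numeric index, if you prefer row positions.
control_idx <- which(is_control)

# Sanity check: every control gene was actually found in the row names.
stopifnot(sum(is_control) == length(controlGenes))

# With DESeq2 available, pass the index to estimateSizeFactors first;
# a later DESeq(dds) call then reuses these size factors.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  coldata <- data.frame(condition = factor(c("a", "a", "b", "b")),
                        row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData = coldata,
                                        design = ~ condition)
  dds <- DESeq2::estimateSizeFactors(dds, controlGenes = is_control)
  print(DESeq2::sizeFactors(dds))
}
```

On a real dataset the same idea would be rownames(dds) %in% controlGenes, assuming the row names of dds are the gene IDs. It is worth checking the sum() of the logical vector, since a typo in a gene ID silently drops that gene from the controls.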
Thank you so much! That was very very helpful!