Hi all,
We're looking at 700+ genes, about 20 of which are considered housekeeping while the rest are endogenous. Based on the other posts about using NanoString data, here is how I've set up my model:
dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampledata, design = ~ var1 + var2 + var3)
dds <- estimateSizeFactors(dds, controlGenes = controlgenes$Class.Name)
dds <- estimateDispersions(dds, fitType = "local")
dds <- nbinomLRT(dds, reduced = ~ var1 + var2)
and for QA, my MA plot is graphed like this:
resp <- lfcShrink(dds, coef = "conditionBvsA", type = "apeglm")
idx <- c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR", "STU", "VWX")
xlim <- c(1, 5e6)
ylim <- c(-3, 3)
plotMA(resp, ylim = ylim, xlim = xlim)
with(resp[idx, ], points(baseMean, log2FoldChange, cex = 1, lwd = 2, col = "orange"))
and the housekeeping genes, rather than lying on the x-axis, look like this: https://ibb.co/wCcydFS
Am I on the right track here? Why do the housekeeping genes fall like this? Any help is much appreciated! Thanks!
Thank you, I leaned heavily on your replies from a few years ago on this topic.
Did you mean that you remove the housekeeping genes from normalization if you feel they are demonstrating little/no association with the condition?
Here is a link to the MA plot you requested; I can't get the image to embed properly: maplot
Thank you for all your help!
Here is the same plot as linked in the above comment, with the so-called housekeeping genes highlighted:
ma plot with housekeeping genes
Can you redo this one? I think it's putting the orange circles on the shrunken LFC instead of the MLE LFC.
Sorry, I had a lot of negatives in my statement above. To be clear, I mean: if a "housekeeping" gene is not in line with the rest, but instead seems to fluctuate with the condition, then I would exclude it from use as a "housekeeping" gene.
here's an idea:
There is some common up and down pattern across "housekeeping" 1 & 2, but the third one in addition is showing substantial variation with respect to the condition. We've seen this before, because "housekeeping" is sometimes more of a wishful statement than what's actually going on in the sample.
Thank you for clarifying,
Here is the plot with the MLE LFC labelled: ma plot
Edited to add: ours also appear to be all over the place with regard to log fold changes, but I'll check the raw counts more closely and exclude from analysis the ones with greater variability between the two comparison groups. Is that what you would suggest?
Yes, I'd recommend essentially iterating on normalization. If, after a first pass, the scaled counts of a "housekeeping" gene vary a lot across condition, that is a reason to consider excluding it. It's a tough position to be in, because the genes that should show minimal variability are more variable than the rest of the genes, so you're a bit stuck.
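To make the iteration concrete, here is a minimal base-R sketch on made-up counts. Everything in it is hypothetical: the gene names HK1 to HK4, the per-sample depths, and the cutoff of 1 log2 unit. The `size_factors` helper is a simplified median-of-ratios estimator that assumes no zero counts; it is meant to illustrate the idea, not to reproduce DESeq2's own code.

```r
# Simplified median-of-ratios size factors (assumes all counts > 0).
size_factors <- function(mat) {
  log_geo_means <- rowMeans(log(mat))
  apply(mat, 2, function(col) exp(median(log(col) - log_geo_means)))
}

# log2 fold change (B vs A) of the scaled counts, per gene.
condition_lfc <- function(mat, sf, condition) {
  scaled <- sweep(mat, 2, sf, "/")
  log2(rowMeans(scaled[, condition == "B", drop = FALSE])) -
    log2(rowMeans(scaled[, condition == "A", drop = FALSE]))
}

# Hypothetical toy data: 6 samples, 4 candidate "housekeeping" genes.
condition <- factor(rep(c("A", "B"), each = 3))
depth <- c(1, 1.5, 0.8, 1.2, 0.9, 1.3)   # assumed per-sample scaling
hk <- rbind(
  HK1 = 1000 * depth,
  HK2 = 500 * depth,
  HK3 = 200 * depth,
  HK4 = 800 * depth * ifelse(condition == "B", 3, 1)  # drifts with condition
)

# First pass: normalize on all candidates, then look at the scaled counts.
sf1 <- size_factors(hk)
lfc1 <- condition_lfc(hk, sf1, condition)

# Flag candidates whose scaled counts move with condition
# (the cutoff of 1 log2 unit is an arbitrary choice for this sketch).
keep <- abs(lfc1 - median(lfc1)) < 1

# Second pass: re-estimate size factors on the retained candidates only.
sf2 <- size_factors(hk[keep, , drop = FALSE])
print(round(lfc1, 3))
print(names(keep)[!keep])
```

With real data I would also plot the scaled counts of each candidate against condition rather than rely on a single cutoff, since the median-based estimate tolerates one drifting gene among several but not a drifting majority.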
Thank you. It is a tough position: these are essentially supposed to be controls modelling the true variation, and I don't want to remove them only to artificially increase the number of genes found to be DE, but that is how it feels.
I'd be more concerned with getting the sign of the LFC right, regardless of whether there are more or fewer DE genes. The other approach would be to just use all the genes for size factor estimation. This operation is parsimonious in that it will try to balance the observed LFC distribution around 0.
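A small base-R sketch of what "balance the LFC distribution around 0" means, again on made-up numbers. The gene names, depths and fold changes are hypothetical, and `size_factors` is a simplified median-of-ratios estimator that assumes no zero counts, not DESeq2's implementation. It compares size factors estimated from all genes with size factors taken from a single "housekeeping" gene that actually changes with condition.

```r
# Simplified median-of-ratios size factors (assumes all counts > 0).
size_factors <- function(mat) {
  log_geo_means <- rowMeans(log(mat))
  apply(mat, 2, function(col) exp(median(log(col) - log_geo_means)))
}

# log2 fold change (B vs A) of the scaled counts, per gene.
condition_lfc <- function(mat, sf, condition) {
  scaled <- sweep(mat, 2, sf, "/")
  log2(rowMeans(scaled[, condition == "B", drop = FALSE])) -
    log2(rowMeans(scaled[, condition == "A", drop = FALSE]))
}

# Hypothetical toy data: 7 genes, 6 samples; gene2 and gene5 truly change in B.
condition <- factor(rep(c("A", "B"), each = 3))
depth <- c(1, 1.5, 0.8, 1.2, 0.9, 1.3)
base <- c(50, 120, 300, 700, 1500, 4000, 9000)
fold <- c(1, 4, 1, 1, 0.5, 1, 1)
in_b <- as.numeric(condition == "B")
counts <- outer(base, depth) * outer(fold, in_b, function(f, b) f^b)
rownames(counts) <- paste0("gene", seq_along(base))

# Size factors from all genes: the unchanged majority sets the reference.
sf_all <- size_factors(counts)
lfc_all <- condition_lfc(counts, sf_all, condition)

# Size factors from one drifting "housekeeping" gene: every LFC is shifted.
sf_bad <- size_factors(counts["gene2", , drop = FALSE])
lfc_bad <- condition_lfc(counts, sf_bad, condition)

print(round(rbind(all_genes = lfc_all, drifting_control = lfc_bad), 3))
```

In this toy setup the all-genes normalization leaves the unchanged genes at an LFC of about 0, while normalizing on the drifting control pushes them negative, which is the sign problem I'd worry about most.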