DESeq2 with NanoString data
acs1990

Hi all,

We're looking at 700+ genes, about 20 of which are considered housekeeping while the rest are endogenous. Based on other posts about using NanoString data, here is how I've set up my model:

dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampledata, design = ~var1+var2+var3)

dds <- estimateSizeFactors(dds, controlGenes = controlgenes$Class.Name)
dds <- estimateDispersions(dds, fitType = "local")
dds <- nbinomLRT(dds, reduced = ~var1+var2)
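A note on what `controlGenes` does: DESeq2's default size factors are median-of-ratios estimates, and `controlGenes` restricts the rows over which that median is computed. The base-R sketch below mimics that idea on hypothetical toy counts; it is not DESeq2's own code. The annotation vector `class_name` (standing in for NanoString's `Class.Name` column) and the helper `median_ratio_sf()` are assumptions for illustration. DESeq2's documentation describes `controlGenes` as a numeric or logical index of rows, which is the form `is_hk` takes here.

```r
# Median-of-ratios size factors, optionally restricted to control genes.
# Sketch only: mirrors the idea behind DESeq2's estimateSizeFactors(),
# not its exact implementation.
median_ratio_sf <- function(counts, control = rep(TRUE, nrow(counts))) {
  cnts <- counts[control, , drop = FALSE]
  log_geo_means <- rowMeans(log(cnts))   # -Inf for rows containing a zero
  apply(cnts, 2, function(col) {
    ok <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[ok]) - log_geo_means[ok]))
  })
}

# Hypothetical toy data: 2 housekeeping genes, 3 endogenous genes, 4 samples.
counts <- matrix(c(100, 200, 100, 200,   # hk_a
                    50, 100,  50, 100,   # hk_b
                    10,  10,  80,  80,   # gene1
                    30,  60,  30,  60,   # gene2
                     5,   5,   5,   5),  # gene3
                 nrow = 5, byrow = TRUE,
                 dimnames = list(c("hk_a", "hk_b", "gene1", "gene2", "gene3"),
                                 paste0("s", 1:4)))
class_name <- c("Housekeeping", "Housekeeping",
                "Endogenous", "Endogenous", "Endogenous")
is_hk <- class_name == "Housekeeping"    # logical index, one entry per row

sf <- median_ratio_sf(counts, control = is_hk)
print(round(sf, 3))
```

Whether `controlgenes$Class.Name` in the call above is such an index, rather than a vector of class labels, depends on how that object was built, so it may be worth checking with `length()` or `head()` before trusting the size factors.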

For QA, my MA plot is drawn like this:

resp <- lfcShrink(dds, coef="conditionBvsA", type="apeglm")
idx <- c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR", "STU", "VWX")
xlim <- c(1, 5e6)
ylim <- c(-3, 3)

plotMA(resp, ylim=ylim, xlim=xlim)
with(resp[idx,], points(baseMean, log2FoldChange, cex=1, lwd=2, col="orange"))

The housekeeping genes, rather than lying on the x-axis, look like this: https://ibb.co/wCcydFS

Am I on the right track here? Why do the housekeeping genes fall like this? Any help is much appreciated. Thanks!

deseq2 nanostring
@mikelove

We use DESeq2 on Nanostring in our lab, so I'm familiar with this setup.

One note is that the "housekeeping" genes are all over the place relative to the rest of the genes. What we've done before is to evaluate the housekeeping genes one by one to see if they are truly not associated with condition, and if we feel that they may not be demonstrating small to no change across samples, we do not use them for normalization. Sometimes a few of our so-called housekeeping genes are not in line with the others, and instead seem to be associated with condition. In your plot, there are maybe 4 above the line, 3 on it, and maybe 12 below. Could you replot this MA plot with the MLEs, just to see how that looks:

res <- results(dds)
plotMA(res)

Thank you, I leaned heavily on your replies from a few years ago on this topic.

Did you mean that you remove the housekeeping genes from normalization if you feel they are demonstrating little/no association with the condition?

Here is a link to the MA plot you requested; I can't get the image to embed properly: maplot

Thank you for all your help!


Here is the same plot as linked in the comment above, with the so-called housekeeping genes highlighted:

ma plot with housekeeping genes


Can you redo this one? I think it's putting the orange circles on the shrunken LFCs instead of the MLE LFCs.


Sorry I had a lot of negatives in my statement above, to be clear I mean: if a "housekeeping" gene is not in line with the rest, but instead seems to fluctuate with the condition, then I would exclude it from use as a "housekeeping" gene.

Here's an illustration:

cnd: A A A B B B
hk1: 1 2 1 2 1 2
hk2: 2 4 2 4 2 4
hk3: 1 2 1 8 4 8

There is some common up and down pattern across "housekeeping" 1 & 2, but the third one in addition is showing substantial variation with respect to the condition. We've seen this before, because "housekeeping" is sometimes more of a wishful statement than what's actually going on in the sample.
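The toy table can be checked numerically. One simple way, sketched here in base R and not something prescribed in this thread, is to look at pairwise log2 ratios between the candidate housekeeping genes: genes that share the same up/down pattern keep a constant ratio across samples, while a gene that tracks condition makes the ratio shift between groups. The variable names are illustrative.

```r
# Toy "housekeeping" counts from the example above (rows = genes, cols = samples).
cnd <- factor(c("A", "A", "A", "B", "B", "B"))
hk <- rbind(hk1 = c(1, 2, 1, 2, 1, 2),
            hk2 = c(2, 4, 2, 4, 2, 4),
            hk3 = c(1, 2, 1, 8, 4, 8))
log_hk <- log2(hk)

# For every pair of candidate genes, compute the per-sample log2 ratio and
# its shift between conditions (mean in B minus mean in A).
# Genes sharing the same up/down pattern keep a constant ratio (shift 0).
pairs <- combn(rownames(hk), 2)
shift <- apply(pairs, 2, function(p) {
  r <- log_hk[p[1], ] - log_hk[p[2], ]
  mean(r[cnd == "B"]) - mean(r[cnd == "A"])
})
names(shift) <- apply(pairs, 2, paste, collapse = "/")
print(shift)
```

Here the hk1/hk2 ratio does not shift at all, while both pairs involving hk3 shift by -2 log2 units (hk3 is about 4-fold higher relative to the others in B), which singles out hk3.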


Thank you for clarifying.

Here is the plot with the MLE LFCs labelled: ma plot

Edit: ours also appear to be all over the place with regard to log fold changes, but I'll look at the raw counts more closely and exclude from the analysis the ones with greater variability between the two comparison groups. Is that what you would suggest?


Yes, I'd recommend essentially iterating on normalization. If, after a first pass, the scaled counts of a "housekeeping" gene vary a lot across conditions, that's a reason to consider excluding it. It's a tough position to be in, because the genes that should show minimal variability are more variable than the rest of the genes, so you're a bit stuck.
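The iterative normalization described here could look roughly like this base-R sketch. Assumptions not taken from the thread: size factors are median-of-ratios over the current housekeeping set, "varying across condition" is measured as the difference in mean log2 scaled count between groups B and A, and the helper names (`sf_from()`, `refine_hk()`), the cutoff `max_shift = 1`, and the extra gene `hk4` are all hypothetical. It drops one gene per pass and re-normalizes, so that a single bad gene does not distort the judgment of the others.

```r
# Median-of-ratios size factors computed over the rows named in `genes`.
sf_from <- function(counts, genes) {
  cnts <- counts[genes, , drop = FALSE]
  lgm <- rowMeans(log(cnts))
  apply(cnts, 2, function(col) {
    ok <- is.finite(lgm) & col > 0
    exp(median(log(col[ok]) - lgm[ok]))
  })
}

# Hypothetical helper: repeatedly normalize on the current "housekeeping"
# set, then drop the single gene whose scaled counts shift most between
# conditions, until no gene shifts by more than `max_shift` log2 units.
refine_hk <- function(counts, hk, cnd, max_shift = 1, max_iter = 10) {
  for (i in seq_len(max_iter)) {
    sf <- sf_from(counts, hk)
    scaled <- log2(sweep(counts[hk, , drop = FALSE], 2, sf, "/"))
    shift <- rowMeans(scaled[, cnd == "B", drop = FALSE]) -
      rowMeans(scaled[, cnd == "A", drop = FALSE])
    worst <- which.max(abs(shift))
    if (abs(shift[worst]) <= max_shift || length(hk) <= 2) break
    hk <- hk[-worst]                      # exclude it, then re-normalize
  }
  list(hk = hk, sf = sf_from(counts, hk))
}

# Toy counts: hk1-hk3 from the earlier example plus a hypothetical hk4.
cnd <- factor(c("A", "A", "A", "B", "B", "B"))
counts <- rbind(hk1 = c(1, 2, 1, 2, 1, 2),
                hk2 = c(2, 4, 2, 4, 2, 4),
                hk3 = c(1, 2, 1, 8, 4, 8),
                hk4 = c(3, 6, 3, 6, 3, 6))
res <- refine_hk(counts, rownames(counts), cnd)
print(res$hk)
print(round(res$sf, 3))
```

On these counts the first pass flags hk3 (a shift of 2 log2 units) and the second pass finds no remaining gene above the cutoff. An automatic rule like this is best treated as a screening aid; looking at the flagged genes by hand is still the safer route.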


Thank you. It is a tough position: I don't want to remove what are essentially supposed to be controls that model the true variation, with the result that I'm artificially increasing the number of genes found to be DE, but it feels that way.


I'd be more concerned with getting the sign of the LFC right, regardless of whether there are more or fewer DE genes. The other approach would be to just use all the genes for size factor estimation. This is parsimonious in that it will try to balance the observed LFC distribution around 0.
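To see why using all genes balances the LFC distribution around 0, here is a deterministic base-R toy (hypothetical numbers, no noise) loosely modeled on the plot in this thread: 20 "housekeeping" genes, 12 of which drop 2-fold in condition B, plus 30 endogenous genes that rise 4-fold. Size factors are median-of-ratios computed either over the housekeeping genes only or over all genes (the analogue of calling `estimateSizeFactors(dds)` without `controlGenes`). The helper names `sf_from()` and `lfc_after()` are assumptions for illustration.

```r
# Deterministic, hypothetical toy data: 700 genes x 6 samples, no noise.
cnd   <- factor(rep(c("A", "B"), each = 3))
depth <- c(1.0, 1.5, 0.8, 1.2, 0.9, 2.0)      # per-sample sequencing depth
mu    <- seq(50, 5000, length.out = 700)       # per-gene baseline expression
is_hk <- seq_len(700) <= 20                    # first 20 rows: "housekeeping"

# True fold change in B vs A: 12 "housekeeping" genes drop 2-fold,
# 30 endogenous genes rise 4-fold, all other genes are unchanged.
fc_B <- rep(1, 700)
fc_B[1:12]  <- 0.5
fc_B[21:50] <- 4
counts <- outer(mu, depth)
counts[, cnd == "B"] <- counts[, cnd == "B"] * fc_B   # scales row i by fc_B[i]

# Median-of-ratios size factors computed over the selected rows only.
sf_from <- function(counts, genes) {
  cnts <- counts[genes, , drop = FALSE]
  lgm <- rowMeans(log(cnts))
  apply(cnts, 2, function(col) {
    ok <- is.finite(lgm) & col > 0
    exp(median(log(col[ok]) - lgm[ok]))
  })
}

# Per-gene log2 fold change (B minus A) of the size-factor-scaled counts.
lfc_after <- function(counts, sf, cnd) {
  lg <- log2(sweep(counts, 2, sf, "/"))
  rowMeans(lg[, cnd == "B"]) - rowMeans(lg[, cnd == "A"])
}

med_hk  <- median(lfc_after(counts, sf_from(counts, is_hk), cnd))
med_all <- median(lfc_after(counts, sf_from(counts, rep(TRUE, 700)), cnd))
cat("median LFC, housekeeping-only size factors:", round(med_hk, 3), "\n")
cat("median LFC, all-gene size factors:", round(med_all, 3), "\n")
```

With the housekeeping-only size factors every truly unchanged gene appears 2-fold up (median LFC of 1), whereas the all-gene size factors put the median LFC at 0. The all-gene approach does rely on most genes not changing between conditions.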
