Question

DESeq2 with nanostring data

acs1990 ▴ 10

@acs1990-21798

Last seen 5.6 years ago

Hi all,

We're looking at 700+ genes, about 20 of which are considered housekeeping while the rest are endogenous. Based on the other posts here about using NanoString data, this is how I've set up my model:

dds <- DESeqDataSetFromMatrix(countData = counts, colData = sampledata, design = ~var1+var2+var3)

dds <- estimateSizeFactors(dds, controlGenes = controlgenes$Class.Name)
dds <- estimateDispersions(dds, fitType = "local")
dds <- nbinomLRT(dds, reduced = ~var1+var2)

For QA, my MA plot is drawn like this:

resp <- lfcShrink(dds, coef="conditionBvsA", type="apeglm")
idx <- c("ABC", "DEF", "GHI", "JKL", "MNO", "PQR", "STU", "VWX")
xlim <- c(1, 5e6)
ylim <- c(-3, 3)

plotMA(resp, ylim=ylim, xlim=xlim)
with(resp[idx,], points(baseMean, log2FoldChange, cex=1, lwd=2, col="orange"))

The housekeeping genes, rather than lying on the x-axis, look like this: https://ibb.co/wCcydFS

Am I on the right track here? Why do the housekeeping genes fall like this? Any help is much appreciated! Thanks!
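A note on the estimateSizeFactors call in the setup above: as far as I know, the DESeq2 documentation describes controlGenes as a numeric or logical index vector, so a character column of gene names or class labels would need converting first. Here is a minimal sketch of one way to do that. The annotation table, its Gene.Name column and the "Housekeeping" label are assumptions for illustration, not taken from the poster's data.

```r
# Hypothetical annotation table; column names and values are assumptions.
annot <- data.frame(
  Gene.Name  = c("ABC", "DEF", "GENE1", "GENE2", "GHI"),
  Class.Name = c("Housekeeping", "Housekeeping", "Endogenous",
                 "Endogenous", "Housekeeping"),
  stringsAsFactors = FALSE
)

# Toy count matrix whose rows are in a different order from the annotation.
counts <- matrix(
  c(10, 12, 200, 220, 55, 60, 30, 33, 90, 99),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("GENE1", "ABC", "GHI", "GENE2", "DEF"), c("s1", "s2"))
)

# Names of the housekeeping genes according to the annotation.
hk_names <- annot$Gene.Name[annot$Class.Name == "Housekeeping"]

# Fail early if a housekeeping gene is missing from the count matrix.
stopifnot(all(hk_names %in% rownames(counts)))

# Logical index aligned to the rows of the count matrix.
is_hk <- rownames(counts) %in% hk_names

# With DESeq2 loaded and dds built from `counts`, the index would be passed as:
# dds <- estimateSizeFactors(dds, controlGenes = is_hk)
```

Building the index from rownames(counts) rather than from the annotation order avoids silently picking the wrong rows when the two tables are sorted differently.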

deseq2 nanostring • 2.1k views

ADD COMMENT • link 5.6 years ago acs1990 ▴ 10

score 0 · Answer 1 · 2019-12-18

Michael Love 43k

@mikelove

Last seen 20 days ago

United States

We use DESeq2 on Nanostring in our lab, so I'm familiar with this setup.

One note is that the "housekeeping" genes are all over the place relative to the rest of the genes. What we've done before is to evaluate the housekeeping genes one by one to see if they are truly not associated with condition, and if we feel that they may not be demonstrating small to no change across samples, we do not use them for normalization. Sometimes a few of our so-called housekeeping genes are not on track with the others, and seem instead to be associated with condition. In your plot, there are maybe 4 above, 3 on the line, and maybe 12 below. Could you replot this MA with the MLEs, just to see how that looks:

res <- results(dds)
plotMA(res)

ADD COMMENT • link 5.6 years ago Michael Love 43k

Thank you, I leaned heavily on your replies from a few years ago on this topic.

Did you mean that you remove the housekeeping genes from normalization if you feel they are demonstrating little/no association with the condition?

Here is a link to the MA plot you requested (I can't get the image to embed properly): maplot

Thank you for all your help!

ADD REPLY • link 5.6 years ago acs1990 ▴ 10

Here is the same plot as linked in the above comment, with the so-called housekeeping genes highlighted:

ma plot with housekeeping genes

ADD REPLY • link 5.6 years ago acs1990 ▴ 10

Can you redo this one? I think it's putting the orange circles on the shrunken LFC instead of the MLE LFC.

ADD REPLY • link 5.6 years ago Michael Love 43k

Sorry, I had a lot of negatives in my statement above. To be clear, I mean: if a "housekeeping" gene is not in line with the rest, but instead seems to fluctuate with the condition, then I would exclude it from use as a "housekeeping" gene.

Here's an idea:

cnd: A A A B B B
hk1: 1 2 1 2 1 2
hk2: 2 4 2 4 2 4
hk3: 1 2 1 8 4 8

There is some common up and down pattern across "housekeeping" 1 & 2, but the third one in addition is showing substantial variation with respect to the condition. We've seen this before, because "housekeeping" is sometimes more of a wishful statement than what's actually going on in the sample.
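To make the toy table concrete, here is a rough base-R sketch (not DESeq2's own code) that computes median-of-ratios size factors from hk1 and hk2 only, scales all three genes, and compares the two conditions. The helper size_factors is a hand-rolled stand-in written for this illustration.

```r
# Toy counts copied from the table above (rows = genes, columns = samples).
cnd <- factor(c("A", "A", "A", "B", "B", "B"))
hk <- rbind(
  hk1 = c(1, 2, 1, 2, 1, 2),
  hk2 = c(2, 4, 2, 4, 2, 4),
  hk3 = c(1, 2, 1, 8, 4, 8)
)

# Median-of-ratios size factors: for each sample, the median over genes of
# log(count) minus that gene's mean log(count), back on the count scale.
size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))
  exp(apply(log(m) - log_geo_means, 2, median))
}

# Use only hk1 and hk2, which share the same sample-to-sample pattern.
sf <- size_factors(hk[c("hk1", "hk2"), ])
scaled <- sweep(hk, 2, sf, "/")

# Log2 fold change of B over A for each gene, computed on the scaled counts.
lfc <- apply(scaled, 1, function(x) {
  mean(log2(x[cnd == "B"])) - mean(log2(x[cnd == "A"]))
})
round(lfc, 2)
```

After scaling, hk1 and hk2 are flat across all six samples, while hk3 still shows a log2 fold change of 2 (fourfold) between B and A, which is the condition-associated behaviour described above.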

ADD REPLY • link 5.6 years ago Michael Love 43k

Thank you for clarifying.

Here is the plot with the MLE LFC labelled: ma plot

Edited to add: ours also appear to be all over the place with regard to log fold changes, but I'll check out the raw counts more closely and exclude from the analysis the ones with greater variability between the two comparison groups. Is that what you would suggest?

ADD REPLY • link 5.6 years ago acs1990 ▴ 10

Yes, I'd recommend essentially iterating on normalization. If, after a first pass, the scaled counts of a "housekeeping" gene vary a lot across condition, that's a reason to consider excluding it. It's a tough position to be in, because what should be indicating minimal variability is more variable than the rest of the genes, so you're a bit stuck.
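One possible way to sketch that iteration in base R, under stated assumptions: the function name prune_housekeeping, the two-condition restriction and the max_abs_lfc cutoff are all hypothetical choices for illustration, and the size factors are a hand-rolled median of ratios rather than a call into DESeq2.

```r
# Hand-rolled median-of-ratios size factors from the rows of m.
size_factors <- function(m) {
  log_geo_means <- rowMeans(log(m))
  exp(apply(log(m) - log_geo_means, 2, median))
}

# Repeatedly drop the candidate housekeeping gene whose scaled counts differ
# most between two conditions, until all remaining genes are within
# `max_abs_lfc` (a hypothetical cutoff on the log2 scale).
prune_housekeeping <- function(hk_counts, condition, max_abs_lfc = 0.5) {
  condition <- factor(condition)
  stopifnot(nlevels(condition) == 2, all(hk_counts > 0))
  in_b <- condition == levels(condition)[2]
  keep <- rownames(hk_counts)
  repeat {
    current <- hk_counts[keep, , drop = FALSE]
    sf <- size_factors(current)
    log_scaled <- log2(sweep(current, 2, sf, "/"))
    lfc <- rowMeans(log_scaled[, in_b, drop = FALSE]) -
      rowMeans(log_scaled[, !in_b, drop = FALSE])
    worst <- which.max(abs(lfc))
    # Stop when the worst gene is within the cutoff, or when too few genes
    # remain for a median to be meaningful.
    if (abs(lfc[worst]) <= max_abs_lfc || length(keep) <= 2) break
    keep <- setdiff(keep, names(lfc)[worst])
  }
  list(kept = keep, size_factors = sf)
}

# Example on a small toy matrix in which hk3 tracks the condition.
cnd <- c("A", "A", "A", "B", "B", "B")
hk <- rbind(
  hk1 = c(1, 2, 1, 2, 1, 2),
  hk2 = c(2, 4, 2, 4, 2, 4),
  hk3 = c(1, 2, 1, 8, 4, 8)
)
res <- prune_housekeeping(hk, cnd)
res$kept
```

On this toy input the loop drops hk3 and keeps hk1 and hk2. Because the size factors are a median across the candidate genes, a check like this can only flag a minority of misbehaving genes; if most of the candidates move with condition, it cannot tell which group is the stable one.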

ADD REPLY • link 5.6 years ago Michael Love 43k

Thank you. It is a tough position: I don't want to remove what are essentially supposed to be controls modelling the true variation, with the outcome that I'm artificially increasing the number of genes found to be DE, but it feels that way.

ADD REPLY • link 5.6 years ago acs1990 ▴ 10

I'd be more concerned with getting the sign of the LFC right, regardless of whether there are more or fewer DE genes. The other approach would be to just use all the genes for size factor estimation. This operation is parsimonious in that it will try to balance the observed LFC distribution around 0.
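A small base-R simulation can illustrate that balancing property. This is a sketch on made-up Poisson counts, not DESeq2's implementation: with median-of-ratios size factors computed from all genes, each sample's median log ratio to the per-gene geometric mean is zero after scaling, so the bulk of the genes ends up centred. In DESeq2 itself, calling estimateSizeFactors(dds) without controlGenes uses all genes.

```r
set.seed(1)
n_genes <- 200

# Simulated positive counts: gene-specific baseline times sample-specific depth.
baseline <- rexp(n_genes, rate = 1 / 500) + 20
depth <- c(0.5, 1, 2, 0.8, 1.5, 1.2)
mu <- outer(baseline, depth)
counts <- matrix(rpois(length(mu), mu) + 1, nrow = n_genes)

# Log ratio of every count to its gene's geometric mean across samples.
log_geo_means <- rowMeans(log(counts))
log_ratios <- log(counts) - log_geo_means

# Median-of-ratios size factors using all genes.
sf <- exp(apply(log_ratios, 2, median))

# After dividing by the size factors, the median log ratio in every sample
# is zero by construction.
centered <- sweep(log_ratios, 2, log(sf), "-")
apply(centered, 2, median)
```

The flip side is that this assumes most genes are not changing; on a targeted panel where many genes are expected to move in the same direction, that assumption deserves a second look.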

ADD REPLY • link 5.6 years ago Michael Love 43k