Question: Using DESeq2 with Nanostring data (for VST only)
johnmcma wrote (2.2 years ago):

Hi all,

Right now I'm planning to perform some machine learning analyses on Nanostring data, and I would like to experiment with transforming the counts using DESeq2's variance stabilizing transformation (VST), as VST has been working fine with our RNA-seq data.

However, unlike limma-voom, there is little guidance on using DESeq(2) with Nanostring data. The only related post I found is Simon Anders' answer in the thread "Nanostring ncounterdata - DESeq", and I haven't noticed any other discussions.

So, has anyone here used DESeq(2) with Nanostring count data? If so, how did you process the data?

Answer: Using DESeq2 with Nanostring data (for VST only)
Michael Love wrote (2.2 years ago):

hi,

My main concern with Nanostring would be which genes you use as control genes for normalization. You should pass these to the controlGenes argument of estimateSizeFactors:

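dds <- DESeqDataSetFrom...  # e.g. DESeqDataSetFromMatrix() on the raw counts
# controlGenes: numeric or logical index of the rows used for size factor estimation
dds <- estimateSizeFactors(dds, controlGenes=...)
...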
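To see what restricting size factor estimation to control genes changes, here is a base-R sketch of the median-of-ratios estimator computed only over a chosen set of rows. It illustrates the idea behind `controlGenes`; it is not DESeq2's own code, and the function name and toy counts are hypothetical.

```r
# Median-of-ratios size factors computed only over `control_rows`
# (base-R illustration of what controlGenes changes; not DESeq2's code).
size_factors_from_controls <- function(counts, control_rows) {
  ctrl <- counts[control_rows, , drop = FALSE]
  log_geo_means <- rowMeans(log(ctrl))    # -Inf for rows containing a zero
  sf <- apply(ctrl, 2, function(col) {
    ok <- is.finite(log_geo_means) & col > 0
    exp(median(log(col[ok]) - log_geo_means[ok]))
  })
  sf / exp(mean(log(sf)))                 # rescale to geometric mean 1
}

# Toy panel: rows 1-2 stand in for housekeeping genes (s2 is 2x deeper),
# rows 3-4 are targeted genes that change strongly between samples.
counts <- cbind(s1 = c(100, 200, 10, 10), s2 = c(200, 400, 500, 600))

sf_ctrl <- size_factors_from_controls(counts, control_rows = 1:2)
sf_all  <- size_factors_from_controls(counts, control_rows = 1:4)
sf_ctrl["s2"] / sf_ctrl["s1"]   # about 2: set by the control rows only
sf_all["s2"] / sf_all["s1"]     # about 10: the changing genes drag the estimate
```

With only four genes the median is easily pulled by the changing genes, which is exactly the small-panel situation where a well-chosen control set matters.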

What if I just don't include the spike-ins? And from what you said above, I take it you mean nCounts that have not been normalized, right?

(reply by johnmcma, 2.2 years ago)

Yes, the raw counts need to be normalized, and if you have a panel of genes that are specifically chosen because they may change across samples, you need to have a set of housekeeping genes or positive spike-ins for reasonable normalization.
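One common way to use housekeeping genes on a small targeted panel is to scale each sample by the geometric mean of its housekeeping counts. A base-R sketch under that assumption follows; note this is a different estimator from DESeq2's median-of-ratios, and the `code_class` labels (modelled loosely on Nanostring's "Endogenous"/"Housekeeping" classes) and the counts are hypothetical.

```r
# Scale factors from the per-sample geometric mean of housekeeping counts.
# Base-R sketch; assumes housekeeping rows contain no zero counts.
housekeeping_scale_factors <- function(counts, is_housekeeping) {
  hk <- counts[is_housekeeping, , drop = FALSE]
  geo <- exp(colMeans(log(hk)))   # geometric mean of housekeeping counts per sample
  geo / exp(mean(log(geo)))       # rescale so the factors have geometric mean 1
}

counts <- cbind(s1 = c(100, 50, 7), s2 = c(200, 100, 300))
code_class <- c("Housekeeping", "Housekeeping", "Endogenous")  # hypothetical annotation

sf <- housekeeping_scale_factors(counts, code_class == "Housekeeping")
normalized <- sweep(counts, 2, sf, "/")   # divide each column by its factor
normalized[1:2, ]                         # housekeeping rows now agree across samples
```

Factors computed this way could be handed to DESeq2 with `sizeFactors(dds) <- sf` instead of calling estimateSizeFactors, provided they are in the same sample order as the columns of dds.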

(reply by Michael Love, 2.2 years ago)

By "normalized" I only mean Nanostring's own CodeSet normalization. But should I include only the positive controls, or the negative controls as well? From what I know, Nanostring's own guidelines no longer recommend performing background subtraction.

(reply by johnmcma, 2.2 years ago)

I took a look at Nanostring's normalization guidelines, and they are essentially recommending DESeq normalization.

Given that you have 768 genes measured, I would recommend supplying DESeq2 with the raw counts and using the default normalization, i.e. just running DESeq() as normal.
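A minimal sketch of that workflow for the VST goal in the question, using DESeq2's simulated example data as a stand-in for a 768-gene panel (replace it with DESeqDataSetFromMatrix() on the raw counts). One caveat worth knowing: vst() fits its trend on a subsample of nsub = 1000 genes by default and stops when the object has fewer rows, so for a panel this size varianceStabilizingTransformation() is called directly.

```r
library(DESeq2)

# Simulated stand-in for a 768-gene panel with 12 samples.
dds <- makeExampleDESeqDataSet(n = 768, m = 12)

# Default DESeq2 normalization (median-of-ratios over all genes) and model fit.
dds <- DESeq(dds)

# Full VST rather than vst(), which needs at least nsub (default 1000) rows.
# blind = TRUE ignores the design, as is usual for unsupervised downstream use.
vsd <- varianceStabilizingTransformation(dds, blind = TRUE)

# Genes x samples matrix on a log2-like scale; transpose for samples-as-rows input.
vst_mat <- assay(vsd)
```

Whether blind = TRUE or FALSE is more appropriate depends on whether the machine learning step should be kept independent of the experimental design.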

My concern with Nanostring counts is that sometimes people are only looking at a small subset of genes (say 100-200) known to be DE across samples, and in that case I'd really prefer that known housekeeping genes be included on the panel. With such a small panel, housekeeping genes are better than positive controls, which in my opinion are better than nothing. But with 768 genes (James is correct here), you can probably estimate the per-sample size factors using the default DESeq2 steps.

After running the standard DESeq2 analysis, I would recommend making an MA plot and highlighting the positive controls, just to see where they fall:

res <- results(dds)
plotMA(res)
idx <- c("...","...","...") # here you should fill in the rownames of the pos controls
# circle the positive controls on top of the MA plot
with(res[idx,], points(baseMean, log2FoldChange, cex=2, lwd=3, col="orange"))
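Alongside the plot, the same check can be summarized numerically: if normalization is reasonable, the positive controls should sit close to a log2 fold change of zero. A base-R sketch, assuming a table with the `log2FoldChange` column and rownames that DESeq2 results carry; the toy rows and gene names below are hypothetical.

```r
# Where do the control rows fall on the MA plot? (base-R sketch)
control_lfc_summary <- function(res, control_ids) {
  lfc <- res[control_ids, "log2FoldChange"]
  c(median_lfc  = median(lfc, na.rm = TRUE),
    max_abs_lfc = max(abs(lfc), na.rm = TRUE))
}

# Hypothetical toy table; real input would be as.data.frame(results(dds)).
toy_res <- data.frame(
  baseMean       = c(5000, 1200, 300, 80, 950),
  log2FoldChange = c(0.05, -0.10, 0.02, 1.8, -2.3),
  row.names      = c("POS_A", "POS_B", "POS_C", "GENE1", "GENE2")
)
control_lfc_summary(toy_res, c("POS_A", "POS_B", "POS_C"))
```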

 

(reply by Michael Love, 2.2 years ago; edited)

I updated this comment; earlier it didn't include code for selecting the positive controls.

(reply by Michael Love, 2.2 years ago)

Another thing to consider is the number of probes in your codeset. With the smaller, more targeted codesets there is always the possibility that most, if not all, of the genes are affected by the experimental conditions. As you move to larger codesets this becomes less likely, and you may be able to rely on the conventional RNA-Seq normalization assumption that most genes are not differentially expressed.

The devil, as always, is in the details, and whatever assumptions you make have to be backed up by various types of exploratory data analysis. In my experience, once you get past maybe 350-400 genes in the codeset, you can start thinking that maybe the usual RNA-Seq normalizations are applicable.
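The point about small, targeted codesets can be made concrete with a noise-free toy example. For two samples, median-of-ratios size factors satisfy sf2 / sf1 = exp(median(log(s2 / s1))) over genes with no zero counts, so the estimate is only correct while fewer than half the genes change. The panel sizes and fold changes below are hypothetical.

```r
# Two-sample form of the median-of-ratios size factor ratio.
depth_ratio <- function(s1, s2) {
  ok <- s1 > 0 & s2 > 0
  exp(median(log(s2[ok] / s1[ok])))
}

# Hypothetical noise-free panel: sample 2 has 2x the depth of sample 1,
# and the first n_de genes are additionally up 4-fold in sample 2.
simulate_pair <- function(n_genes, n_de, depth = 2, fold = 4) {
  s1 <- rep(100, n_genes)
  s2 <- s1 * depth
  s2[seq_len(n_de)] <- s2[seq_len(n_de)] * fold
  list(s1 = s1, s2 = s2)
}

few  <- simulate_pair(400, 40)    # 10% of genes DE
most <- simulate_pair(400, 320)   # 80% of genes DE
depth_ratio(few$s1, few$s2)       # about 2: the true depth ratio
depth_ratio(most$s1, most$s2)     # about 8: depth confounded with biology
```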

(reply by James W. MacDonald, 2.2 years ago)

The number of probesets is not an issue; this project involves more than one complete set (of 768 "Endogenous" probes).

(reply by johnmcma, 2.2 years ago)
Powered by Biostar version 16.09