Hello, I am trying to identify differentially expressed genes from NanoString data. There are a number of tools for RNA-seq data, but the analysis of NanoString data is not well documented. Since NanoString data are also gene counts, I think it should be possible to analyze them with an RNA-seq tool.
Is it possible to test for differentially expressed genes with the DESeq2 package using NanoString data?
Many posts say that if you perform the data normalization on your own, you can use DESeq2 for differential expression analysis. Is that correct?
If you know of good tools for dealing with NanoString data, please let me know.
Thanks.
Hi Michael,
I inherited a NanoString dataset and would like to use NanoNormIter and DESeq2 for its analysis. Before I begin, I wanted to clarify a few points, which I summarize below:
1) You start by reading the raw files and pre-processing them into raw count matrices. You next define housekeeping genes and test them for differential expression between the groups of interest. Would you remove any housekeeping genes that turn out to be differentially expressed, and perhaps replace them with the genes with the lowest variance in the dataset?
2) You detect the LOD for all samples. This is for flagging up problematic samples, right? Same for the HK_Gene_Miss genes, as I see no other use of them later in the code.
3) I noticed that you run RUV_total twice to generate two objects, 'vsd' and 'set', respectively. Am I right that you use the former for visualization and for deciding on the k parameter, and the latter for the design of the actual dds object?
4) One thing that was not clear was that you used the W_1 matrix for downstream analysis. Does this imply that one should re-run betweenLaneNormalization and RUVg from the RUVSeq package outside the RUV_total function? I could not find where else W_1 was generated in the script.
5) Would you recommend additional filtering like one does in RNA-seq (e.g. filtering out genes which have <10 counts in all samples) before normalization?
Thanks in advance for your help!
Let me ping the author of NanoNormIter for his thoughts.
For 1), there are numerous ways to define control/housekeeping genes, which have been discussed in the RUV-Seq paper. For 5), yes, if you have the original counts, filtering out the very low count features may be a good approach.
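To make the low-count filter from point 5) concrete, here is a minimal sketch in plain Python. It is not code from NanoNormIter or DESeq2. The function name, the gene labels, and the toy counts are all hypothetical, and the threshold of 10 is simply the value mentioned in the question.

```python
def filter_low_counts(counts_by_gene, min_count=10):
    """Drop genes whose count is below min_count in every sample.

    counts_by_gene maps a gene name to its list of per-sample raw counts.
    A gene is kept if at least one sample reaches min_count.
    """
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if max(counts) >= min_count
    }


# Hypothetical toy data: three genes measured in three samples.
toy_counts = {
    "GENE_A": [0, 3, 9],       # below 10 everywhere, so it is dropped
    "GENE_B": [2, 15, 4],      # reaches 10 in one sample, so it is kept
    "GENE_C": [120, 98, 143],  # well above the threshold, so it is kept
}
kept = filter_low_counts(toy_counts)
```

The filter is applied to the raw counts, before any normalization, which matches the order asked about in the question.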
Thank you, Michael. Have you heard back from the authors of NanoNormIter regarding their code on GitHub?
Hello,
I have one more question about the NanoString analysis pipeline that needs some clarification. As I understand it, one is expected to perform UQ normalization and RUVg() using housekeeping genes to estimate size factors. Then a variance stabilizing transformation is applied and the batch effect is removed, to generate a normalized count table without unwanted variation for visualization. But this is independent of any downstream steps (e.g. DE analysis), where the standard DESeq2 pipeline can be used with the unwanted variation vector in the design matrix, right? So the point of the RUV.total() function is only to remove the unwanted variation, not to feed into later analysis.
Thanks in advance!
Yes, that sounds correct to me.
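As an illustration of the upper-quartile (UQ) step mentioned in the question, the following is a minimal sketch of the general idea in plain Python: take each sample's 75th percentile, rescale those values so they average 1, and divide each sample's counts by its factor. This is not the RUVSeq or NanoNormIter implementation, which may differ in details such as how zeros are handled. All function names, sample labels, and numbers are made up.

```python
def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def upper_quartile_factors(counts_by_sample):
    """Return one scaling factor per sample; the factors average to 1."""
    uq = {s: quantile(c, 0.75) for s, c in counts_by_sample.items()}
    if any(v == 0 for v in uq.values()):
        raise ValueError("a sample has an upper quartile of zero")
    mean_uq = sum(uq.values()) / len(uq)
    return {s: v / mean_uq for s, v in uq.items()}


def scale_counts(counts_by_sample, factors):
    """Divide every count in a sample by that sample's factor."""
    return {s: [x / factors[s] for x in c] for s, c in counts_by_sample.items()}


# Hypothetical toy data: sample_B is sample_A at twice the depth.
toy = {"sample_A": [0, 10, 20, 30, 40], "sample_B": [0, 20, 40, 60, 80]}
factors = upper_quartile_factors(toy)
scaled = scale_counts(toy, factors)
```

In this toy input the two samples differ only by a factor of two, so after scaling they have the same values. Real data will not line up this neatly, which is where the RUVg() step with housekeeping genes comes in.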