Can NanoString data be analyzed using DESeq2?
lim6432 ▴ 30
@lim6432-21478
Last seen 21 months ago

Hello, I am trying to identify differentially expressed genes using NanoString data. There are a number of tools for RNA-seq data, but the analysis of NanoString data is not well documented. Since NanoString data is also gene count data, I think it should be possible to analyze it with an RNA-seq tool.

Is it possible to identify differentially expressed genes with the DESeq2 package using NanoString data?

Many posts say that if you normalize the data on your own, you can use DESeq for differential expression analysis. Is that correct?

If you know of good tools for dealing with NanoString data, please let me know.

Thanks.

NanoString Differential Expressed Genes Analysis DESeq2 • 1.4k views
ADD COMMENT
3
@mikelove
Last seen 7 hours ago
United States

We use DESeq2 on NanoString datasets in our lab. Our approach is to estimate RUV factors using the endogenous housekeeping genes and use these in the design formula. You could also specify the endogenous housekeeping genes as controlGenes when estimating size factors in DESeq2.

Update: see our manuscript on bioRxiv:

https://doi.org/10.1101/2020.04.08.032490
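
For readers who want to see what the controlGenes option looks like in practice, here is a minimal sketch on simulated counts. The count matrix, the sample table, and the choice of the first 8 rows as housekeeping probes are all made up for illustration; fitType="mean" follows the suggestion made later in this thread and is not a requirement of controlGenes.

```r
library(DESeq2)

set.seed(1)
n_genes <- 60
n_samples <- 12

# Simulated raw counts (illustrative only): rows are probes, columns are samples
counts <- matrix(rnbinom(n_genes * n_samples, mu = 200, size = 10),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0("s", seq_len(n_samples))))
coldata <- data.frame(condition = factor(rep(c("ctrl", "trt"), each = 6)),
                      row.names = colnames(counts))

# Hypothetical choice: pretend the first 8 probes are endogenous housekeeping genes
hk_idx <- 1:8

dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ condition)

# Size factors are estimated from the housekeeping rows only
dds <- estimateSizeFactors(dds, controlGenes = hk_idx)

# Remaining steps of the standard workflow, run one at a time so that
# the size factors set above are kept
dds <- estimateDispersions(dds, fitType = "mean")
dds <- nbinomWaldTest(dds)
res <- results(dds)
```

With real data, hk_idx would be the row indices of the panel's endogenous housekeeping probes, and an MA plot with those probes labeled is a reasonable check that they sit near zero log fold change.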

ADD COMMENT
0

Thank you for your reply.

I have a question about what you said.

Does specifying controlGenes in DESeq2 carry out the data normalization? ("You could also specify the endogenous housekeeping as controlGenes in DESeq2.")

I have heard that NanoString's own normalization procedure is fixed, and I want to normalize the data using other tools (normalization tools for NanoString nCounter data, e.g. NanoStringNorm). But I cannot find a function in DESeq2 that accepts data that has already been normalized.

Can I use the processed data (normalized counts) instead of the raw data (raw counts)?

ADD REPLY
1

Their normalized values are not really the optimal input. In our internal testing we found it was much better to use the original counts with RUV or control genes, plus various EDA and diagnostic checks, like MA plots with labeled control genes.

ADD REPLY
0

DESeq assumes most genes are not DE. In NanoString, as far as I understand, only genes that are expected to change are measured (besides the housekeeping genes).

Is this DESeq assumption necessary only for the normalization step? If it is necessary for other steps in the process, it's hard to understand how NanoString data can be analyzed with it.

ADD REPLY
1

This is why we use the endogenous housekeeping genes, passed to RUV or to DESeq2 as controlGenes, for normalization, followed by MA plots for quality control. The endogenous housekeeping genes should fall along the x-axis, that is, near a log fold change of zero.

DESeq2 assumes that the median ratio captures the sequencing depth (not exactly the same as saying that most genes are not DE). Still, you only need to modify the normalization step, either by telling DESeq2 which genes are the control genes or by calculating the normalization with RUV. Only the normalization needs modification, not the other steps.
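
To make the median-ratio point concrete, here is a small base R sketch. It is a simplified version of median-of-ratios size factors, not DESeq2's internal code, and the toy panel is invented: 4 stable housekeeping probes plus 6 target probes that all go up 4-fold in samples s3 and s4.

```r
# Simplified median-of-ratios size factors computed from a chosen set of rows
size_factors_from <- function(counts, rows) {
  sub <- counts[rows, , drop = FALSE]
  log_geo_means <- rowMeans(log(sub))            # per-probe log geometric mean
  log_ratios <- sweep(log(sub), 1, log_geo_means)
  exp(apply(log_ratios, 2, median))              # per-sample median ratio
}

# True technical scaling of the four samples
depth <- c(s1 = 1, s2 = 2, s3 = 1, s4 = 2)

# Housekeeping probes follow the technical scaling only
hk <- outer(c(100, 200, 400, 800), depth)

# Target probes follow the technical scaling times a 4-fold change in s3 and s4
targets <- outer(c(50, 60, 70, 80, 90, 100), depth * c(1, 1, 4, 4))

counts <- rbind(hk, targets)

sf_all <- size_factors_from(counts, 1:10)  # all probes: absorbs the 4-fold change
sf_hk  <- size_factors_from(counts, 1:4)   # housekeeping only: tracks depth

sf_all / sf_all[1]
sf_hk / sf_hk[1]
```

Relative to s1, the housekeeping-only factors come out as 1, 2, 1, 2 (the technical scaling), while the all-probe factors come out as 1, 2, 4, 8, so dividing by them would erase the real 4-fold change. That is the situation a panel of mostly changing genes can create, and why the normalization step is the one that needs to be told about the control genes.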

ADD REPLY
0

Dear Michael, very insightful, thank you.

What should be done with the NanoString Positive and Negative spike-in control genes in the DESeq2 analysis approach you described above?

Should they be filtered out from the original counts before starting?

Also, why are the endogenous Housekeeping genes best for RUV or controlGenes in estimateSizeFactors and not one of the other spike-in groups?

ADD REPLY
1

We’ve now written up our NanoString normalization recommendations in a manuscript:

https://doi.org/10.1101/2020.04.08.032490

Let me know if you have any questions.

ADD REPLY
0

Very nice paper, I just have a couple of questions regarding the details:

You only do upper-quartile (UQ) normalization on the original counts for the RUVg analysis, is this correct? And the UQ-normalized counts are not used for the steps below?

When you create the dds on the original counts for varianceStabilizingTransformation, does the design formula include the RUV factors? VST also normalizes the data with RLE via estimateSizeFactors and uses the design matrix you provide, so it is somewhat confusing given that you then also run removeBatchEffect to normalize with the RUV factors.

When you do subsequent limma removeBatchEffect on the VST data, are you passing the RUV factors into the covariates option and leaving batch=NULL and batch2=NULL?

ADD REPLY
1

1) Yes, only for RUVg

2) No, the design does not have RUV factors for VST, it's just using design=~1 (nor does VST do any removing of variance or shifting of values associated with variables in the design, by the way). Then VST corrects for size factors, then removeBatchEffect corrects for RUV factors. I don't see the mixup.

3) Only using covariates, take a look at the exact code here:

https://github.com/bhattacharya-a-bt/CBCS_normalization/blob/master/nanostring_RUV_functions.R#L28-L40
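
As a rough illustration of the three steps above (not a copy of the linked function), the order of operations could be sketched like this. The counts are simulated, the first 10 rows stand in for housekeeping probes, and k = 1 is an arbitrary choice; all of these are assumptions for the example.

```r
library(RUVSeq)   # attaches EDASeq, which provides betweenLaneNormalization()
library(DESeq2)
library(limma)    # removeBatchEffect()

set.seed(2)
n_genes <- 80
n_samples <- 20

# Simulated original counts (illustrative only)
raw <- matrix(rnbinom(n_genes * n_samples, mu = 300, size = 10),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
hk_genes <- rownames(raw)[1:10]   # hypothetical housekeeping probes

# 1) Upper-quartile scaling is used only as the input to RUVg
uq <- betweenLaneNormalization(raw, which = "upper")
W <- RUVg(uq, cIdx = hk_genes, k = 1)$W   # samples x k matrix of unwanted factors

# 2) VST on the original counts with design = ~ 1 (corrects for size factors)
coldata <- data.frame(sample = colnames(raw), row.names = colnames(raw))
dds <- DESeqDataSetFromMatrix(raw, colData = coldata, design = ~ 1)
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds, fitType = "mean")
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# 3) Remove the RUV factors from the VST values via the covariates argument only
corrected <- removeBatchEffect(assay(vsd), covariates = W)
```

The corrected matrix is meant for visualization and downstream exploratory work; for DE testing the thread recommends keeping the original counts and putting the RUV factors in the design instead.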

ADD REPLY
0

Thank you, and sorry, you are right: it does make sense when VST is only correcting for size factors.

On the related topic of performing general differential expression testing with DESeq2 and edgeR with RUVg factors, why in Section 2.3 of the RUVSeq vignette:

https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf#page5

do they show doing UQ calcNormFactors of raw counts before testing with edgeR, though in the following Section 2.5 they do not do that with DESeq2?

ADD REPLY
1

They are just using one normalization method within edgeR and a different one in DESeq2.

ADD REPLY
0

Also, I see that in lines 31-35 it’s deviating from the standard dds <- estimateDispersions(dds)

Could you elaborate on what exactly is being done here and why?

ADD REPLY
1

I think the parametric trend (which was derived specifically for RNA-seq in Anders and Huber) was not doing well for our NanoString datasets, so we just use the same prior across probes (fitType="mean").

ADD REPLY
0

Thanks very much for spending the time to answer these questions, will definitely try this on a couple NanoString datasets and compare to NanoStringNorm.

ADD REPLY
0

I apologize, one more question, would you also recommend applying lines 31-35 prior to differential expression testing with DESeq2 (when using the RUVg factors in the design formula)?

ADD REPLY
1

Those lines are using a method-of-moments estimator for dispersion. We did this to reduce computing time because we had many samples. With fewer samples I'd recommend just using the standard DESeq2 steps.
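
For anyone unfamiliar with the term, a generic method-of-moments dispersion estimate for the negative binomial can be sketched in base R as below. It solves var = mu + alpha * mu^2 for alpha on each probe; this is the textbook estimator, shown on simulated data, and the linked code may differ in detail.

```r
# Method-of-moments dispersion per probe, from normalized counts
# (rows are probes, columns are samples)
mom_dispersion <- function(norm_counts) {
  m <- rowMeans(norm_counts)
  v <- apply(norm_counts, 1, var)
  pmax(v - m, 0) / m^2   # negative estimates are truncated at zero
}

# Simulated check: 50 probes, 2000 samples, true dispersion 0.2
set.seed(3)
true_alpha <- 0.2
sim <- matrix(rnbinom(50 * 2000, mu = 500, size = 1 / true_alpha), nrow = 50)

alpha_hat <- mom_dispersion(sim)
summary(alpha_hat)
```

The estimate is just a row mean and a row variance, which is why it is cheap for large cohorts; with few samples the variance is noisy, which fits the advice to use the standard DESeq2 steps on smaller datasets.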

ADD REPLY
0

For clarification, for smaller datasets that have hundreds of samples or fewer:

Do you recommend applying all of lines 31-35 for normalization with VST and removeBatchEffect with RUVg factors?

And for DE testing, do you recommend standard DESeq2 with the default DESeq and the RUVg factors in the design formula?

ADD REPLY
1

You could delete the MoM dispersion lines for smaller datasets.

Yes.

ADD REPLY
1

Just to be explicit, for smaller datasets you could replace lines 31-35 (MoM estimate of gene-wise dispersion) with standard dispersion estimation:

dds <- estimateDispersions(dds, fitType="mean")
ADD REPLY
0

Our comments crossed paths. Thank you, yes, that makes sense.

ADD REPLY
0

So fitType="mean" is still used for both normalization and DE testing?

ADD REPLY
0

Hi Michael,

Interesting post. One question...

Is this method also valid for NanoString nCounter miRNA data? Or should the data be normalised first with the NanoString software, with the differential analysis then performed in DESeq2?

Kind regards,

ADD REPLY
0

We've used DESeq2 with NanoString without any issue (but with careful selection of genes for column scaling). For large cohorts we use RUV upstream and would then supply factors in the design matrix for DE analysis. Here is an example of a published framework for these tools applied to NanoString:

https://pubmed.ncbi.nlm.nih.gov/32789507/

ADD REPLY
0

Thank you Michael, as always.

I will try to follow the paper. In this case it is a very small cohort (6 exp vs 5 controls). My first approach was to use NACHO to read the .RCC files and perform a "GEO" normalization:

data.norm <- normalise(
  nacho_object = data,
  housekeeping_genes = my_housekeeping,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO", # choose between "GLM" or "GEO"
  remove_outliers = TRUE)

then I extracted these normalised counts:

expr_counts <- data.norm[["nacho"]] %>% 
  filter(grepl("Endogenous", CodeClass)) %>% 
  select(NcounterFile, Name, Count_Norm) %>% 
  pivot_wider(names_from = "Name", values_from = "Count_Norm") %>% 
  column_to_rownames("NcounterFile") %>% 
  t()

and finally applied DESeq2:

dds <- DESeqDataSetFromMatrix(countData = expr_counts,
                              colData = selected_pheno,
                              design= ~ Phenotype)

dds <- DESeq(dds)
res <- results(dds)

but this, as mentioned before, introduces two normalizations: the one from NACHO, and then the one in DESeq2 (is this that bad?).

Let's see if I can manage to make it work following the pipeline in your paper.

Thank you again!

ADD REPLY
0

Our approach is to use original counts with factors of unwanted variation in the design. In general DESeq2 should not be used with pre-normalized counts, so I wouldn't recommend the above.
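
A minimal sketch of that approach for a small two-group design is shown below. The counts are simulated, the first 8 rows stand in for housekeeping probes, and k = 1 and fitType="mean" are choices made for the example (the latter following the suggestion earlier in this thread); none of this is taken from the poster's data.

```r
library(RUVSeq)   # attaches EDASeq, which provides betweenLaneNormalization()
library(DESeq2)

set.seed(4)
n_genes <- 60

# Simulated original (un-normalized) counts for 6 exp and 5 control samples
raw <- matrix(rnbinom(n_genes * 11, mu = 250, size = 10),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", 1:11)))
pheno <- data.frame(
  Phenotype = factor(c(rep("exp", 6), rep("control", 5)),
                     levels = c("control", "exp")),
  row.names = colnames(raw)
)
hk_genes <- rownames(raw)[1:8]   # hypothetical housekeeping probes

# Estimate one factor of unwanted variation from the housekeeping probes
uq <- betweenLaneNormalization(raw, which = "upper")
pheno$W_1 <- RUVg(uq, cIdx = hk_genes, k = 1)$W[, 1]

# DE testing on the original counts, with the RUV factor in the design
dds <- DESeqDataSetFromMatrix(countData = raw, colData = pheno,
                              design = ~ W_1 + Phenotype)
dds <- DESeq(dds, fitType = "mean")
res <- results(dds)   # exp vs control, adjusted for W_1
```

The count matrix passed to DESeqDataSetFromMatrix is the raw one; the upper-quartile scaled matrix is only used to estimate W_1.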

ADD REPLY
