Nanostring analysis with limma
mali salmon ▴ 370
@mali-salmon-4532
Israel

Hello list

In the recent limma paper (Nucleic Acids Research 2015) it was mentioned that limma is applicable to the analysis of Nanostring data.

I have nanostring miRNA counts and I'm wondering about the best way to analyze them. The data can be treated as regular digital counts (like RNA-seq) and analysed using the voom-limma approach. Alternatively, I can apply the normalization steps recommended by the company (using the NanoStringNorm R package), and then continue with differential expression analysis with limma or voom-limma.

Which approach is recommended?

Thanks

Mali

nanostring limma • 12k views
@james-w-macdonald-5106
United States

I am not sure there is a 'best' way to analyze NanoString data, particularly the normalization step.

Here are some observations from a set of recent analyses that we performed. With a sufficiently large number of genes (we had a data set with ~600 genes, with technical replicates for each sample), we found that normalizing to total counts using voom() gave better results than either normalizing to the geometric mean of the supplied housekeeping genes or guessing at housekeeping genes from a set of low-variance genes. Adding a quantile or cyclic loess normalization improved results moderately, but at the cost of violating Occam's razor, so we chose to just use voom() without further normalization.

Note here that we assessed the results by comparing the technical replicates, and went with the normalization that did the best job of making the technical replicates look more similar to each other.

For this first set of samples we are making the assumption that most of the genes are being expressed, and most are being expressed at relatively similar levels, so normalizing to library size may be a reasonable thing to do. We analyzed a smaller panel of genes (~200) that was more focused, and for which we couldn't assume that the genes were in general being expressed at the same levels (in other words, a particular treatment may well have reduced the expression of most of these genes, in which case normalizing to library size would have unduly affected the biological differences). In this situation we had to use the NanoString-supplied housekeeping genes.

In your situation, NanoString supplies both mRNA housekeeping genes, as well as some probes against non-mammalian small RNAs. I assume you spike in the non-mammalian small RNAs, like the ERCC probes. However, the spiked-in RNA only gives you information about the variability introduced after the spike-in occurred (plus, if you are spiking in using small volumes and a Rainin pipettor, then you are likely introducing more variability in that step than you think). So you have mRNA housekeeping genes that may correlate with the amount of miRNA that you originally started with, and some spike-ins that help assess variability introduced later in the process.

In my experience, most miRNAs seem not to be expressed, and those that are expressed appear to be at low levels. This is based on results from Affy's miRNA arrays, which are sort of questionable, but that seems to be the expectation for miRNAs, so I don't think it is far from the truth.

So all of the normalization methods have pretty serious issues for the miRNA panel, and I think you have to choose which one you think is the least worst. But do note that if you decide to use limma-voom to analyze the data, and you want to normalize using something other than total counts, you have to take steps to keep voom() from doing so.

In other words, if you just feed your counts into voom(), then by default it will compute the library sizes and then compute cpm, which is normalizing to library size. If you don't want that to happen, you need to supply your own values for the 'lib.size' argument to voom().
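To make the default behaviour concrete, here is a minimal base-R sketch of the log-cpm step that voom() applies before it fits the mean-variance trend, log2((count + 0.5)/(lib.size + 1) * 1e6). The helper name `voom_logcpm` and the toy counts are illustrative assumptions, not part of limma; the sketch only reproduces the arithmetic and does not compute precision weights.

```r
# Hypothetical helper mirroring voom()'s log-cpm step:
#   log2((count + 0.5) / (lib.size + 1) * 1e6)
# By default the library size is the column sum of each sample.
voom_logcpm <- function(counts, lib.size = colSums(counts)) {
  t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
}

# Toy counts: sample B has exactly twice the counts of sample A
counts <- cbind(A = c(10, 100, 1000), B = c(20, 200, 2000))

# Default library sizes: the two-fold difference is almost entirely removed
default_E <- voom_logcpm(counts)

# Constant library size: the two-fold difference survives (about 1 on the log2 scale)
const_E <- voom_logcpm(counts, lib.size = rep(1e6, ncol(counts)))

round(default_E[, "B"] - default_E[, "A"], 2)
round(const_E[, "B"] - const_E[, "A"], 2)
```

If a global shift like the one in sample B is technical, the default is what you want; if it could be biological, the constant library size keeps it in the data.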

 

 


Hi James,

I am currently trying to use voom() without normalising to library size, as you suggest in the above post. 

As you mention above, I can specify a value for lib.size prior to running voom e.g.

lib.size <- ...

and then specify lib.size in the voom function e.g.

voom(DGElist, design, lib.size=lib.size)

However, I am struggling with what value to set lib.size to.

Thanks,

Martha 


When you read data into a DGEList, it computes the column sums of the input data matrix and uses that as the library size (because, by definition that IS the library size). And when you use voom, the counts are converted to counts/million counts (where you are dividing by the library size, in millions). If you don't want to adjust for library size you have to specify some sensible, constant value for all of the samples.

Given that this is your analysis, you will have to decide what a 'sensible' constant value might be.
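As a sketch only: the example below passes a constant library size to voom() on simulated counts. The choice of constant (the mean observed library size) is an assumption made for illustration, not a recommendation from this thread, and the count matrix, sample names, and two-group layout are all made up.

```r
library(edgeR)   # DGEList()
library(limma)   # voom()

set.seed(1)
# Simulated stand-in for a NanoString count matrix: 200 probes x 6 samples
counts <- matrix(rnbinom(200 * 6, mu = 200, size = 5), nrow = 200,
                 dimnames = list(paste0("probe", 1:200), paste0("s", 1:6)))
group  <- factor(rep(c("control", "treated"), each = 3))
design <- model.matrix(~ group)

d <- DGEList(counts = counts)

# One possible "sensible constant": the mean observed library size.
# This is an illustrative choice only.
const_lib <- rep(mean(d$samples$lib.size), ncol(d))

# Supplying lib.size stops voom() from using the per-sample column sums
v <- voom(d, design, lib.size = const_lib)

# voom() records the library sizes it actually used
v$targets$lib.size
```

Because the same value is used for every sample, the choice of constant shifts all log-cpm values by the same amount and does not change between-sample differences.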
 


Thanks for your help James, and for replying so quickly! I am analysing a nanostring panel of 150 and as per your suggestion above, don't want to minimise biological differences by normalising to library size (or total count for nanostring). I thought that I could use limma-voom to calculate DEGs by making limma-voom use nanostring normalised count data rather than cpm.

To do this I have calculated a normalisation factor based on the nanostring positive controls and housekeeping genes = Nanostring NF. 

I have then read my raw count data into a DGEList. 

I have then changed all values for DGElist$samples$lib.size to 10^6. 

I have then changed all values for DGElist$samples$norm.factors to 1/Nanostring NF (specific for each sample)

Am I right in thinking that voom(DGElist, design) will calculate cpm as:

(count/(10^6 * (1/nanostring NF))) * 10^6 = count*(nanostring NF). 
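This arithmetic can be checked in base R. The sketch assumes that, for a DGEList, voom() uses lib.size * norm.factors as the effective library size and then computes log2((count + 0.5)/(effective lib.size + 1) * 1e6). The counts and the factors in `nf` are made-up values standing in for real data and for the NanoString NF.

```r
# Made-up raw counts (3 probes x 2 samples) and per-sample NanoString-style factors
counts <- cbind(s1 = c(50, 500, 5000), s2 = c(40, 400, 4000))
nf     <- c(s1 = 0.8, s2 = 1.25)   # hypothetical normalization factors

lib.size     <- rep(1e6, 2)   # constant library size, as in the post
norm.factors <- 1 / nf        # reciprocal of the NanoString factor, as in the post

# Effective library size used for a DGEList
eff_lib <- lib.size * norm.factors

# Plain cpm with these settings equals count * NF
cpm_manual <- t(t(counts) / eff_lib * 1e6)
nf_scaled  <- t(t(counts) * nf)
all.equal(cpm_manual, nf_scaled)

# The log-cpm step adds small offsets (0.5 and 1), so the result is close to,
# but not identical to, log2(count * NF)
logcpm <- t(log2(t(counts + 0.5) / (eff_lib + 1) * 1e6))
max(abs(logcpm - log2(nf_scaled)))
```

Note that the column in the DGEList sample table is `norm.factors`; assigning to a differently spelled column would add a new column and leave the factors that voom() reads unchanged.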

I was also wondering if the voom transformation still makes statistical sense if the nanostring normalised counts rather than cpm normalised counts are calculated as a base for the mean-variance relationship to be estimated? 

Thanks again! 

Martha 


When using limma-voom to analyze Nanostring counts do you only include the "Endogenous" genes in the input count matrix to voom, or all genes including Positive/Negative/Housekeeping?

@gordon-smyth
WEHI, Melbourne, Australia

I agree with James' comments.

Our experience with NanoString is still very limited, but you should not use NanoStringNorm before limma and voom because voom wants to see the real counts. Putting the counts into a DGEList is a good first step:

library(edgeR)
d <- DGEList(counts=yourcounts)

Here are some normalization options, in increasing order of strength of normalization. In general, the noisier the data, the stronger the normalization that might be appropriate.

1. Normalizing only to library size:

v <- voom(d, design)

2. TMM normalization:

d <- calcNormFactors(d)
v <- voom(d, design)

3. Cyclic loess normalization:

v <- voom(d, design, normalize.method="cyclicloess")

4. Quantile normalization:

v <- voom(d, design, normalize.method="quantile")
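The snippets above assume that a `design` matrix already exists. As a rough end-to-end sketch, the example below builds one for a hypothetical two-group experiment, runs the TMM option, and continues with the usual limma steps. The simulated counts, group labels, and sample sizes are assumptions for illustration only.

```r
library(edgeR)
library(limma)

set.seed(42)
# Simulated counts standing in for `yourcounts`: 300 probes x 8 samples
yourcounts <- matrix(rnbinom(300 * 8, mu = 150, size = 4), nrow = 300,
                     dimnames = list(paste0("probe", 1:300), paste0("s", 1:8)))

# Hypothetical two-group experiment with 4 biological replicates per group
group  <- factor(rep(c("control", "disease"), each = 4))
design <- model.matrix(~ group)

d <- DGEList(counts = yourcounts)
d <- calcNormFactors(d)   # option 2 (TMM); omit this line for option 1
v <- voom(d, design)

# Linear model fit, empirical Bayes moderation, and a ranked table of probes
fit <- eBayes(lmFit(v, design))
top <- topTable(fit, coef = "groupdisease", number = 10)
top
```

Swapping in one of the other options only changes the normalization lines; the lmFit(), eBayes(), and topTable() steps stay the same.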

 

 


Sorry, I have digital raw read counts from the HTG EdgeSeq Oncology Biomarker Panel (OBP). However much I search, I cannot find anybody who has used DESeq2, limma, voom or edgeR for differential expression with this assay, which assesses the expression levels of 2559 genes. Do you think I can use the code suggested above for normalization followed by differential expression?

Thanks for any help


Dear Dr. Smyth - I asked this question in a comment to Dr MacDonald above, but got no response. Do you recommend filtering for Endogenous gene probe rows (excluding Positive, Negative, Housekeeping gene probes) before running calcNormFactors or voom?


People don't usually answer new follow-up questions added to a thread 5 years after the original question. It is generally better to ask a new question of your own if you are bringing in new issues not mentioned by the original poster.

I am not very familiar with the NanoString platform, but for most platforms it is better to use only endogenous probes.
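A minimal base-R sketch of that filtering step is shown here. The annotation column `CodeClass`, its labels, and the probe names are assumptions about how the data were exported and should be adapted to the actual file.

```r
# Hypothetical probe annotation; `CodeClass` and its labels are assumed
probes <- data.frame(
  CodeClass = c("Endogenous", "Endogenous", "Housekeeping", "Positive", "Negative"),
  Name      = c("endo_1", "endo_2", "hk_1", "pos_1", "neg_1")
)

# Made-up counts: rows follow the annotation order, columns are samples
counts <- matrix(c(120, 15, 900, 3000, 4,
                   130, 22, 950, 3100, 6),
                 ncol = 2, dimnames = list(probes$Name, c("s1", "s2")))

# Keep only the endogenous probes before building the DGEList
keep <- probes$CodeClass == "Endogenous"
endogenous_counts <- counts[keep, , drop = FALSE]
endogenous_counts
```

The control rows can be kept in a separate object if they are still needed, for example to compute housekeeping-based factors.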


Very sorry about that, different forums have different rules (some don't want you to start a new thread for one that already exists). I imagined it makes sense to filter for the Endogenous only as well, just thought to ask in case there was something I didn't know about.

ker61 ▴ 20
@ker61-7708
United States

Hello,

I have run voom/limma on miRNA Nanostring data.

The data does not include technical replicates. When creating a design matrix from a matrix where the first column includes individual sample identifiers, I received an error that the package could not be run without replicates. I am wondering if this refers to biological and/or technical replicates. If the former, I am wondering if it would be permissible to use general class identifiers (e.g. disease, control) to define replicate conditions. 

Thank you,

Kelly


Yes, you've got the general idea. You don't need technical replicates (well -- you might, if you're trying to analyze variability due to sample prep, etc), but biological replicates are what you want.

If you're not using "general class identifiers" (as you say), what did you use?

You're trying to analyze differential expression between 2+ groups, and the point is you need biological replicate samples per group to be able to do that.
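The difference between the two designs can be seen by counting residual degrees of freedom in base R. The sample sheet below is hypothetical (six samples, three per condition); a design with one coefficient per sample leaves nothing to estimate variability from, which is the situation a "no replicates" complaint usually points to.

```r
# Hypothetical sample sheet: six samples, three per condition
samples <- data.frame(
  id        = factor(paste0("sample", 1:6)),
  condition = factor(rep(c("control", "disease"), each = 3))
)

# One coefficient per sample: no residual degrees of freedom
design_by_id <- model.matrix(~ 0 + id, data = samples)
df_by_id <- nrow(design_by_id) - qr(design_by_id)$rank
df_by_id          # 0

# One coefficient per condition: samples within a condition act as replicates
design_by_condition <- model.matrix(~ 0 + condition, data = samples)
df_by_condition <- nrow(design_by_condition) - qr(design_by_condition)$rank
df_by_condition   # 4
```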
