Nanostring nCounter data - DESeq
@vanessa-vermeirssen-4771
Dear all,

I need to statistically analyse Nanostring nCounter data to see whether there is differential expression between experiment and control. I have 3 biological replicates of each, and the experimental set-up would slightly favor a "paired" statistical approach.

Nanostring nCounter data are mRNA counts, like RNA-Seq, but I wonder whether they have the same properties as RNA-Seq data, since I only have the counts for 110 specifically selected genes. The deeper sampling of one sample compared to another, for example, is less applicable.

The manufacturer suggested some preprocessing of the data: scaling against positive spike-ins and subtracting background (plus generation of absent/present calls). In addition, we performed a normalization with 4 housekeeping genes (selected out of the 8 included in the 110 genes).

I ran the DESeq package analysis on these preprocessed data. Is this package also appropriate in this case (e.g. the library normalization step)? Is the preprocessing correct for this?

In addition, I also did a t-test (paired and standard (unpaired), equal variance, which I tested, on the log2 data), because this has been described in the literature before. Another paper describes an FDR permutation approach, but they don't seem to have any biological replicates, only 32 control experiments and 10 control genes (Amit et al., 2009). I also tried this on our data.

We have some nice "trends" in our data, which we more or less expected, but the greatest significance is obtained with DESeq. Could you advise me whether DESeq is the most correct approach in our case? What about the other statistical approaches I have tried?

A minor question relates to the preprocessing. How should I deal with the absent/present calls obtained after preprocessing in the course of the statistical analysis? Should I include them as NAs from the beginning, or re-evaluate the results at the end?

Thank you so much in advance.
Best regards,
Vanessa Vermeirssen

--
Vanessa Vermeirssen, PhD
Tel: +32 (0)9 331 38 10
Fax: +32 (0)9 3313809
Bioinformatics and Systems Biology
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
vamei at psb.vib-ugent.be
http://bioinformatics.psb.ugent.be/
RNASeq Normalization Preprocessing DESeq
Simon Anders
@simon-anders-3855
Zentrum für Molekularbiologie, Universi…
Hi Vanessa

On 2011-07-25 13:33, Vanessa Vermeirssen wrote:
> [...]
> Nanostring nCounter data are mRNA counts, like RNA-Seq, but I wonder if
> they have the same properties like RNASeq data i.e. I do only have the
> counts for 110 specifically selected genes. The deeper sampling of one
> sample compared to another e.g. is less applicable.
> The manufacturer suggested some preprocessing of the data: scaling
> against positive spike-ins, substracting background (and absent/present
> call generation).
> In addition, we performed a normalization with 4 household genes
> (selected out of the 8 included in the 110 genes).
>
> I did the DESeq package analysis using these preprocessed data, is this
> package also appropriate in this case (e.g. the library normalization
> step?)? Is the preprocessing correct for this?

In principle, the model used by DESeq should work for any kind of count data. The crucial part is that you give it count data, i.e., integer counts of detected tags, without any normalization or the like.

If you decide against using DESeq's normalization scheme and prefer to use Nanostring's, you need to hand the scaling factors calculated by their algorithm to DESeq by writing them into the 'sizeFactors' slot of the CountDataSet, i.e., use

    sizeFactors(cds) <- c( 1.2, 1.0, .95, ...)

with the scaling factors that you somehow got from the Nanostring software, instead of

    cds <- estimateSizeFactors(cds)

Look at pairwise MA plots to check whether the normalization worked.

There is, however, no way to incorporate background correction into DESeq, because this is not needed for RNA-Seq analysis. With Nanostring, you have cross-hybridization and hence background. If the background level is the same in all samples, it should not influence your differential expression calculation, and you can ignore it. Otherwise, it might be an issue.
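To make the size-factor and MA-plot suggestions above concrete, here is a minimal base-R sketch. The count matrix, the probe names (`HK1`, `HK2`, `HK3`, `GeneA`) and the helpers `geo_mean` and `ma_values` are made up for illustration; deriving the factors from housekeeping probes is one possible choice, not something the reply prescribes.

```r
# Hypothetical raw counts: rows are probes, columns are samples.
counts <- matrix(
  c(100, 200, 400,
    120, 240, 480,
     50, 100, 200,
     30,  90,  60),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("HK1", "HK2", "HK3", "GeneA"), c("s1", "s2", "s3"))
)
housekeeping <- c("HK1", "HK2", "HK3")  # assumed reference probes

# Geometric mean of a vector of positive numbers.
geo_mean <- function(x) exp(mean(log(x)))

# Per-sample geometric mean of the reference probes.
hk_level <- apply(counts[housekeeping, , drop = FALSE], 2, geo_mean)

# Size factors act as divisors: a sample with more reference signal
# gets a larger factor. Rescale so that their geometric mean is 1.
size_factors <- hk_level / geo_mean(hk_level)

# Normalized values are only for diagnostics; the raw integer counts
# are what would be handed to the differential expression test.
normalized <- sweep(counts, 2, size_factors, "/")

# M (log ratio) and A (mean log intensity) for one pair of samples.
ma_values <- function(a, b) {
  keep <- a > 0 & b > 0
  data.frame(
    A = (log2(a[keep]) + log2(b[keep])) / 2,
    M = log2(b[keep]) - log2(a[keep])
  )
}
ma <- ma_values(normalized[, "s1"], normalized[, "s2"])
# plot(ma$A, ma$M); abline(h = 0)  # points should scatter around M = 0
```

The resulting vector could then be assigned with `sizeFactors(cds) <- size_factors`, as in the reply above. Note that size factors divide the counts, so if a vendor tool reports multiplicative scaling factors, their reciprocals would presumably be the values to supply.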
> In addition, I also did a t-test (paired and normal, equal variance,
> which I tested, on the log2 data), because this has been described in
> literature before.

This is no big surprise. With only a few replicates, a standard t-test does not have much power. This was, after all, the motivation behind the development of limma.

> Another paper describes an FDR permutation approach, but they don't seem
> to have any biological replicates, but 32 control experiments and 10
> control genes (Amit et al., 2009).
> I also tried to do this on our data.

I'm a bit puzzled how a permutation test without replicates might work, but I don't know the paper.

> [...]
> A minor question relates to the preprocessing. How should I deal with
> absent/present calls obtained after the preprocessing in the course of
> statistical analysis?
> Should I include them as NAs from the beginning, or re-evaluate the
> results at the end?

Hard to say. As I am not familiar with the technology, I don't know what they base these calls on. As you cannot incorporate background correction anyway, it might be best to ignore the absence/presence calls as well and use all data.

Please let us know whether it works.

Simon
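On the t-test remark in the reply above, a small R sketch may help show why three replicates give so little power: with three pairs the paired test has only 2 degrees of freedom, so the critical value is large. The log2 values below are invented for illustration.

```r
n_pairs <- 3

# Two-sided 5% critical values of the t statistic.
crit_paired   <- qt(0.975, df = n_pairs - 1)      # paired test, 2 df
crit_unpaired <- qt(0.975, df = 2 * n_pairs - 2)  # equal-variance test, 4 df

# Hypothetical log2 expression values for one gene in three pairs.
ctrl <- c(8.0, 8.4, 7.9)
trt  <- c(9.1, 9.3, 9.2)

# Paired t-test on the log2 scale; the estimate is the mean log2 fold change.
res <- t.test(trt, ctrl, paired = TRUE)
res$statistic  # t value, to be compared against crit_paired
res$p.value
```

Each gene's variance is estimated from its own three differences only, which is why methods that share variance information across genes (limma, DESeq) tend to report more significant results on the same data.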
I know this post is ~5 years old, but I found no other reference regarding background subtraction of Nanostring data with DESeq2, so I have a simple question: is it right to proceed as follows?

Regarding the normalization:

Nanostring: positive-control normalization -> negative-control (background) normalization -> whole-codeset normalization. (What is the difference between positive normalization and whole-codeset normalization, by the way? Is one meant to eliminate hybridization noise and the other to eliminate sample-to-sample variability?)

DESeq2: whole-codeset normalization without background normalization.

But DESeq2 needs counts, so is it more accurate if I do the positive and negative normalization manually according to the manufacturer's guidelines, then convert all the numbers to integers (rounding down with Excel's INT()), and then give them to DESeq2? (After all, after background normalization we have more accurate counts.)
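For reference, the positive-control step and the rounding question can be sketched in base R. This is only an illustration under assumptions: the spike-in counts and probe names are invented, and the scaling rule (mean of the per-sample geometric means divided by each sample's geometric mean) is the positive-control normalization as it is commonly described for nCounter data, not something confirmed in this thread.

```r
# Hypothetical positive spike-in counts: rows are controls, columns are samples.
pos <- matrix(
  c(1000, 2000, 500,
     250,  500, 125,
      60,  120,  30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("POS_A", "POS_B", "POS_C"), c("s1", "s2", "s3"))
)

geo_mean <- function(x) exp(mean(log(x)))

# Per-sample geometric mean of the positive controls.
lane_level <- apply(pos, 2, geo_mean)

# Vendor-style scaling factor: a multiplier applied to the counts.
scaling <- mean(lane_level) / lane_level

# DESeq-style size factor: a divisor, hence the reciprocal.
size_factors <- 1 / scaling

# Rounding scaled values: floor() mirrors Excel's INT(), round() does not.
scaled  <- c(10.2, 10.7, 0.9)
floored <- floor(scaled)
rounded <- round(scaled)
```

Supplying `size_factors` alongside the untouched raw integer counts, as suggested in the reply above, avoids rounding scaled values altogether.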

Last question: given that my study includes only 21 samples, is DESeq2 relevant at all? It uses the Wald test, which is parametric, and with a low number of samples a non-parametric test is usually recommended, isn't it?

Thank you very much! I hope you receive this message :)
