Question

Nanostring ncounterdata - DESeq


Vanessa Vermeirssen ▴ 20


Dear all,

I need to statistically analyse NanoString nCounter data to see whether there is differential expression between experiment and control. I have 3 biological replicates of each, and the experimental set-up would slightly favor a "paired" statistical approach.

NanoString nCounter data are mRNA counts, like RNA-Seq, but I wonder whether they have the same properties as RNA-Seq data, since I only have the counts for 110 specifically selected genes. The deeper sampling of one sample compared to another, for example, is less applicable.

The manufacturer suggested some preprocessing of the data: scaling against positive spike-ins and subtracting background (and absent/present call generation). In addition, we performed a normalization with 4 housekeeping genes (selected out of the 8 included in the 110 genes).

I did the DESeq package analysis using these preprocessed data. Is this package also appropriate in this case (e.g. the library normalization step)? Is the preprocessing correct for this?

In addition, I also did a t-test (paired and normal, equal variance, which I tested, on the log2 data), because this has been described in the literature before. Another paper describes an FDR permutation approach, although they do not seem to have any biological replicates, only 32 control experiments and 10 control genes (Amit et al., 2009). I also tried to do this on our data.

We have some nice "trends" in our data, which we more or less expected, but the highest significance is obtained with DESeq. Could you advise me whether DESeq is the most correct approach in our case? What about the other statistical approaches I have tried?

A minor question relates to the preprocessing. How should I deal with the absent/present calls obtained after the preprocessing in the course of the statistical analysis? Should I include them as NAs from the beginning, or re-evaluate the results at the end?

Thank you so much in advance.
Best regards,
Vanessa Vermeirssen

--
Vanessa Vermeirssen, PhD
Tel: +32 (0)9 331 38 10
Fax: +32 (0)9 3313809
Bioinformatics and Systems Biology
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
vamei at psb.vib-ugent.be
http://bioinformatics.psb.ugent.be/

RNASeq Normalization Preprocessing DESeq • 3.4k views

updated 14.4 years ago by Simon Anders ★ 3.8k • written 14.4 years ago by Vanessa Vermeirssen ▴ 20

score 2 · Answer 1 · 2011-07-25

Hi Vanessa

On 2011-07-25 13:33, Vanessa Vermeirssen wrote:
> [...]
> Nanostring nCounter data are mRNA counts, like RNA-Seq, but I wonder if
> they have the same properties like RNASeq data i.e. I do only have the
> counts for 110 specifically selected genes. The deeper sampling of one
> sample compared to another e.g. is less applicable.
> The manufacturer suggested some preprocessing of the data: scaling
> against positive spike-ins, substracting background (and absent/present
> call generation).
> In addition, we performed a normalization with 4 household genes
> (selected out of the 8 included in the 110 genes).
>
> I did the DESeq package analysis using these preprocessed data, is this
> package also appropriate in this case (e.g. the library normalization
> step?)? Is the preprocessing correct for this?

In principle, the model used by DESeq should work for any kind of count data. The crucial part is that you give it count data, i.e., integer counts of detected tags, without any normalization or the like.

If you decide against using DESeq's normalization scheme and prefer to use Nanostring's, you need to hand the scaling factors calculated by their algorithm to DESeq by writing them into the 'sizeFactors' slot of the CountDataSet, i.e., use

    sizeFactors(cds) <- c( 1.2, 1.0, .95, ...)

with the scaling factors that you somehow got from the Nanostring software, instead of

    cds <- estimateSizeFactors(cds)

Look at pairwise MA plots to check whether the normalization worked.

There is, however, no way to incorporate background correction into DESeq, because, for RNA-Seq analysis, this is not needed. With Nanostring, you have cross-hybridization and hence background. If the background level is the same in all samples, it should not influence your differential expression calculation, and you can ignore it. Otherwise, it might be an issue.
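The size-factor advice above can be illustrated with a small sketch in base R, so that it runs without DESeq installed. Everything in it is made up for illustration: the count matrix, the gene and sample names, and the vendor factors are hypothetical. The helper `median_of_ratios` is a simplified stand-in for a median-of-ratios normalization, not DESeq's own code. The sketch also assumes the vendor's normalization factors are multiplicative (normalized = raw * factor), whereas DESeq divides counts by its size factors, so the vendor factors are inverted first; check which convention your software actually uses.

```r
# Toy raw integer counts (hypothetical): rows = genes, columns = samples.
counts <- matrix(
  c(100, 200, 400,  50,    # ctrl1
    120, 240, 480,  60,    # ctrl2
     80, 160, 320, 400),   # trt1 (geneD is strongly up)
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("ctrl1", "ctrl2", "trt1"))
)

# Simplified median-of-ratios size factors: for each sample, the median
# over genes of (count / geometric mean of that gene across samples).
median_of_ratios <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) exp(median(log(col) - log_geo_means)))
}
sf_deseq <- median_of_ratios(counts)

# Hypothetical multiplicative factors exported from the vendor software.
# ASSUMPTION: normalized = raw * factor, so the size factor is the inverse.
vendor_factors <- c(ctrl1 = 1.00, ctrl2 = 0.83, trt1 = 1.25)
sf_vendor <- 1 / vendor_factors
# With a CountDataSet `cds`, these would be handed over as in the answer:
#   sizeFactors(cds) <- sf_vendor

# M and A values for one pair of samples, to check the normalization:
# most genes should scatter around M = 0 if it worked.
normalized <- sweep(counts, 2, sf_deseq, "/")
M <- log2(normalized[, "trt1"]) - log2(normalized[, "ctrl1"])
A <- (log2(normalized[, "trt1"]) + log2(normalized[, "ctrl1"])) / 2
# plot(A, M); abline(h = 0)   # uncomment to draw the MA plot
```

Whichever set of size factors is used, the counts themselves stay as raw integers; only the factors change.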
> In addition, I also did a t-test (paired and normal, equal variance,
> which I tested, on the log2 data), because this has been described in
> literature before.

This is no big surprise. With only a few replicates, a standard t test does not have much power. This was, after all, the motivation behind the development of limma.

> Another paper describes an FDR permutation approach, but they don't seem
> to have any biological replicates, but 32 control experiments and 10
> control genes (Amit et al., 2009).
> I also tried to do this on our data.

I'm a bit puzzled how a permutation test without replicates might work, but I don't know the paper.

> [...]
> A minor question relates to the preprocessing. How should I deal with
> absent/present calls obtained after the preprocessing in the course of
> statistical analysis?
> Should I include them as NAs from the beginning, or re-evaluate the
> results at the end?

Hard to say. As I am not familiar with the technology, I don't know what they base these calls on. As you cannot incorporate background correction anyway, it might be best to ignore the absence/presence calls as well and use all data.

Please let us know whether it works.

Simon