casey.rimland
University of Cambridge, National Insti…
I was wondering if there has been any more consensus recently on using DESeq2 to perform analyses of NanoString data? I have a NanoString dataset with 506 endogenous genes for four sample groups and was looking for the best way to analyze the data.
What if I just wanted to use DESeq2 to normalize the NanoString count data with VST, for instance to make PCA plots and heatmaps?
Thanks!
Thanks for the quick reply! I have not used RUV before, so I will have to take a look at it. Do you use RUV before DESeq()? Or is it your normalization method for things like PCAs/heatmaps, while you still give the raw NanoString counts to DESeq() as you would with RNA-Seq data? I've only been working with RNA-Seq data recently, but a labmate has NanoString data that we are now trying to analyze.
Do you have any opinions on the NanoStringDiff package?
Here's an example of how to use RUV with DESeq2 (usually I would link directly to the Bioconductor landing page, but it seems to be down right now, so I link to my GitHub page):
https://github.com/mikelove/rnaseqGene/blob/master/vignettes/rnaseqGene.Rmd#using-ruv-with-deseq2
The approach is to calculate factors of unwanted variation, either from control genes (this is what I did, using the housekeeping genes on the panel) or from all genes, although the latter is probably not a good idea with NanoString.
Then you supply the factors to DESeq2 at the front of the design formula, with the biological condition at the end. Importantly, DESeq2 runs on the original counts.
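A minimal sketch of these two steps, assuming RUVg from RUVSeq with the housekeeping genes as negative controls and k = 1. The count matrix, the housekeeping set hk and the four-group condition are simulated stand-ins (hypothetical), not real NanoString data.

```r
suppressPackageStartupMessages({
  library(RUVSeq)
  library(DESeq2)
})

# Simulated stand-in data (hypothetical): 100 genes x 12 samples, 4 groups
set.seed(1)
condition <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(100 * 12, mu = 200, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
hk <- rownames(counts)[1:10]   # hypothetical housekeeping genes

# Estimate one factor of unwanted variation from the housekeeping genes only;
# for matrix input RUVg returns a list with W and normalizedCounts
ruv <- RUVg(counts, cIdx = which(rownames(counts) %in% hk), k = 1)
W <- ruv$W
colnames(W) <- paste0("W_", seq_len(ncol(W)))

# Factors at the front of the design, condition of interest at the end;
# countData is the ORIGINAL counts, not RUV's normalized counts
coldata <- data.frame(W, condition = condition, row.names = colnames(counts))
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata,
                              design = ~ W_1 + condition)
dds <- DESeq(dds, quiet = TRUE)
res <- results(dds, contrast = c("condition", "B", "A"))
```

With more factors (k > 1), the design would become ~ W_1 + W_2 + ... + condition.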
Note: I do not recommend any kind of subtracting of the negative control counts, although this is suggested in the Nanostring documentation. Just provide DESeq2 with the original counts for the samples that pass QC.
So far I have obtained good results with this approach: the results make sense, and everything checks out in quality-control plots, such as MA plots with the housekeeping genes highlighted.
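One way to draw that kind of MA plot is sketched below: plotMA() on the DESeq2 results, then points() over the housekeeping genes, which should sit around a log2 fold change of 0. The data come from DESeq2's makeExampleDESeqDataSet(), and hk is simply the first 10 simulated genes (hypothetical).

```r
suppressPackageStartupMessages(library(DESeq2))

# Simulated stand-in data; pretend the first 10 genes are the housekeeping set
set.seed(2)
dds <- makeExampleDESeqDataSet(n = 200, m = 8)
hk <- rownames(dds)[1:10]   # hypothetical housekeeping genes

dds <- DESeq(dds, quiet = TRUE)
res <- results(dds)

# MA plot of all genes, then overlay the housekeeping genes in blue
plotMA(res, ylim = c(-3, 3))
with(res[hk, ], points(baseMean, log2FoldChange, col = "dodgerblue", pch = 19))
```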
Sorry, forgot to add: for heatmaps and PCA, you can work with the normalized output of RUV.
See the plotPCA function and the normCounts accessor (for the normalizedCounts slot) in the RUVSeq vignette:
https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf
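A short sketch in the style of the RUVSeq vignette, assuming the counts are wrapped in a SeqExpressionSet: after RUVg(), normCounts() returns the RUV-normalized counts and plotPCA() draws the PCA from them. The data and the housekeeping set are simulated (hypothetical).

```r
suppressPackageStartupMessages(library(RUVSeq))

# Simulated stand-in data (hypothetical)
set.seed(3)
condition <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(100 * 12, mu = 200, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
hk <- rownames(counts)[1:10]   # hypothetical housekeeping genes

# Wrap the counts in a SeqExpressionSet and run RUVg with the housekeeping genes
set <- newSeqExpressionSet(counts,
                           phenoData = data.frame(condition,
                                                  row.names = colnames(counts)))
set1 <- RUVg(set, hk, k = 1)   # adds W_1 to pData(set1)

norm <- normCounts(set1)       # RUV-normalized counts for heatmaps/PCA
plotPCA(set1, col = as.integer(condition), cex = 1.2)
```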
To make a heatmap on variance-stabilized normalized counts, there are two approaches that I think should be roughly equivalent. Normally I would say to estimate the factors of unwanted variation, apply vst() to the raw counts, use limma's removeBatchEffect() to remove the factors from the transformed data, and then re-assign the result to the DESeqTransform object. I can provide code for this if you like, but you can probably find it by searching the site for "vst removeBatchEffect". Alternatively, you could round the normalized counts output from RUV, create a DESeqDataSet from them, and then apply vst(). I think these would give about the same result.
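A sketch of the second approach (rounded RUV output, then a variance-stabilizing transformation), with simulated data and a hypothetical housekeeping set. One assumption worth flagging: vst() fits its transformation on a subsample of nsub = 1000 genes by default and, as far as I know, stops with an error when there are fewer rows. For that reason, this sketch calls varianceStabilizingTransformation() directly, which suits a ~500-gene panel.

```r
suppressPackageStartupMessages({
  library(RUVSeq)
  library(DESeq2)
})

# Simulated stand-in data (hypothetical)
set.seed(4)
condition <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(100 * 12, mu = 200, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
hk <- rownames(counts)[1:10]   # hypothetical housekeeping genes

ruv <- RUVg(counts, cIdx = which(rownames(counts) %in% hk), k = 1)

# Round (and floor at 0) so DESeq2 accepts the RUV-normalized values as counts
norm <- round(pmax(ruv$normalizedCounts, 0))
dds_norm <- DESeqDataSetFromMatrix(norm,
                                   data.frame(condition, row.names = colnames(norm)),
                                   design = ~ condition)

# Full VST instead of vst(), because the panel has fewer than 1000 genes
vsd_norm <- varianceStabilizingTransformation(dds_norm, blind = FALSE)
mat <- assay(vsd_norm)

# Row-scaled heatmap of the 30 most variable genes
top <- head(order(apply(mat, 1, var), decreasing = TRUE), 30)
heatmap(mat[top, ], scale = "row")
```

This object is only for visualization; differential expression still runs on the original counts with the factors in the design.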
As always thank you so much! I will give this a try and let you know how it goes.
One last question: do you do anything at all with the “positive” and “negative” outputs from the Nanostring? Do you still keep them in the data set?
I use only the housekeeping genes as controls. I didn't keep any other non-endogenous genes in the dataset.
I think non-endogenous genes are useful for deciding which samples to throw out entirely, but I think they probably introduce more noise and artifact than provide any benefit when they are used for normalization. I have limited experience, but I could see that known associations and particular known distributions among the donors pop out only after using RUV on the endogenous genes, with the housekeeping set as control genes. This is the whole RMA story again but with counts.
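A base-R sketch of that filtering step. It assumes a hypothetical probe table with a NanoString code-class column, using the labels "Endogenous", "Housekeeping", "Positive" and "Negative". The positive and negative controls are looked at only for sample QC. What goes forward are the raw counts of the endogenous and housekeeping genes, with no background subtraction.

```r
# Hypothetical probe annotation and simulated raw counts (6 samples)
set.seed(5)
probes <- data.frame(
  name  = c(paste0("gene", 1:8), paste0("hk", 1:3),
            paste0("POS_", LETTERS[1:3]), paste0("NEG_", LETTERS[1:3])),
  class = c(rep("Endogenous", 8), rep("Housekeeping", 3),
            rep("Positive", 3), rep("Negative", 3))
)
raw <- matrix(rpois(nrow(probes) * 6, 100), nrow = nrow(probes),
              dimnames = list(probes$name, paste0("s", 1:6)))

# QC only: per-sample negative-control background, e.g. to flag samples to drop;
# it is NOT subtracted from the counts
neg_bg <- colMeans(raw[probes$class == "Negative", , drop = FALSE])

# Keep the raw counts of endogenous + housekeeping genes for RUV/DESeq2
keep <- probes$class %in% c("Endogenous", "Housekeeping")
counts <- raw[keep, , drop = FALSE]
hk <- probes$name[probes$class == "Housekeeping"]   # control genes for RUVg
```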
Sounds reasonable. Thanks so much!! I will let you know how it goes :)
Hi Michael, thanks for this reply! I've been searching for a while for how to incorporate DESeq2 differential expression into NanoString data. I'm a little confused as to how to perform the final step of the removeBatchEffect() method. I assumed that the RUVg factors are passed to removeBatchEffect() as the covariates argument, but "A: How do I extract read counts from DESeq2" suggests that they may be passed as the batch argument.
This is a summary of my steps (I can provide sample data, if needed):
rmBatchCounts and newRmBatchCounts are quite similar, and they are more similar to each other than when I use batch as the argument when making rmBatchCounts.
As an aside, I'm using the steps outlined in your GitHub post/the RUVSeq vignette for the actual differential expression analysis, but I would like to be able to plot heatmaps/PCA of the batch-adjusted counts, as Casey initially mentioned.
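The covariates/batch question comes down to the two argument types. In limma's removeBatchEffect(), batch is treated as a factor (categorical batches), and covariates holds numeric variables. RUV factors are continuous. If they are passed as batch, as.factor() turns them into one level per distinct value, which here means one level per sample. The base-R sketch below shows what adjusting for a continuous covariate amounts to. Each gene is fitted against the condition design plus the covariate, and only the covariate part is subtracted. It is meant as an illustration of the idea, not limma's exact code, and the data are simulated (hypothetical).

```r
set.seed(6)
condition <- factor(rep(c("A", "B"), each = 4))
design <- model.matrix(~ condition)

# Simulated data (hypothetical): a continuous RUV-style factor w shifts all 5 genes
w <- rnorm(8)
y <- matrix(rnorm(5 * 8), nrow = 5) + outer(rep(2, 5), w)

# Fit y ~ design + covariate per gene (least squares via QR),
# then subtract only the covariate contribution, keeping the condition effect
remove_covariate <- function(y, design, covar) {
  X <- cbind(design, covar)
  beta <- t(qr.coef(qr(X), t(y)))                      # genes x coefficients
  b_cov <- beta[, -seq_len(ncol(design)), drop = FALSE]
  y - b_cov %*% t(as.matrix(covar))
}
adj <- remove_covariate(y, design, w)

# Passing w as a "batch" would instead create one factor level per sample
nlevels(as.factor(w))
```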
Any help is greatly appreciated!
I'll tell you the way that I used RUV and DESeq2 for differential expression. I added the factors estimated by RUV into the DESeq2 design formula, in front of the biological condition, e.g. ~ W_1 + condition for a single factor.
For making heatmaps of VST data where a batch effect is removed, you can do:
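A sketch of that recipe, assuming RUVg factors from hypothetical housekeeping genes in simulated data. It applies the VST to the raw counts, calls limma's removeBatchEffect() with the factors as covariates and the condition in the design (so the condition effect is preserved), and writes the result back into the DESeqTransform object. It uses varianceStabilizingTransformation() rather than vst(), because vst() subsamples 1000 genes by default and a ~500-gene panel is smaller than that.

```r
suppressPackageStartupMessages({
  library(RUVSeq)
  library(DESeq2)
  library(limma)
})

# Simulated stand-in data (hypothetical)
set.seed(7)
condition <- factor(rep(c("A", "B", "C", "D"), each = 3))
counts <- matrix(rnbinom(100 * 12, mu = 200, size = 10), nrow = 100,
                 dimnames = list(paste0("gene", 1:100), paste0("s", 1:12)))
hk <- rownames(counts)[1:10]   # hypothetical housekeeping genes

# ruv$W is already W_1 and W_2 cbind-ed together (samples x k matrix)
ruv <- RUVg(counts, cIdx = which(rownames(counts) %in% hk), k = 2)
factors <- ruv$W

# VST of the raw counts
dds <- DESeqDataSetFromMatrix(counts,
                              data.frame(condition, row.names = colnames(counts)),
                              design = ~ condition)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Remove the RUV factors (covariates) while protecting the condition (design)
mat <- assay(vsd)
mat <- removeBatchEffect(mat, covariates = factors,
                         design = model.matrix(~ condition))
assay(vsd) <- mat

top <- head(order(apply(mat, 1, var), decreasing = TRUE), 30)
heatmap(mat[top, ], scale = "row")
```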
Where factors is a matrix of the factors of unwanted variation, cbind-ed together.
Perfect, thanks so much for the response!
Hi, a follow-up question to this: do you still import the raw counts, or the expression counts after normalization with the housekeeping, positive and negative control genes?
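This is answered above in this thread: give DESeq2 the original counts for the samples that pass QC, without subtracting the negative controls beforehand.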