Question

DESeq2 on NanoString Data

4

casey.rimland ▴ 170

@caseyrimland-14915

Last seen 7.5 years ago

University of Cambridge, National Insti…

I was wondering if there has been any more consensus recently on using DESeq2 to perform analyses of NanoString data? I have a NanoString dataset with 506 endogenous genes for four sample groups and was looking for the best way to analyze the data.

What if I just wanted to use DESeq2 to be able to normalize the count data from the Nanostring using VST to make PCA plots and heat maps for instance?

Thanks!

deseq2 nanostring • 7.5k views

ADD COMMENT • link updated 5 months ago by Michael Love 43k • written 7.7 years ago by casey.rimland ▴ 170

Michael Love · Answer 1 · 2018-06-06

2

Michael Love 43k

@mikelove

Last seen 1 day ago

United States

I’ve used it lately on NanoString data across hundreds of samples, and it works well: it recovers the expected genes. I use it in combination with RUV, controlling for housekeeping genes.

ADD COMMENT • link 7.7 years ago Michael Love 43k

0

Thanks for the quick reply! I have not used RUV before so will have to go take a look at that. Do you use RUV before DESeq()? Or is it your method for normalization before things like PCAs/heatmaps and then you still just give the raw Nanostring counts to DESeq() as you would with RNA-Seq data? I've been only working with RNA-Seq data recently, but a labmate has NanoString data we are trying to analyze now.

Do you have any opinions on the NanoStringDiff package?

ADD REPLY • link updated 7.7 years ago by Michael Love 43k • written 7.7 years ago by casey.rimland ▴ 170

1

Here's an example of how to use RUV with DESeq2 (usually I would link directly to the Bioconductor landing page, but it seems to be down right now, so I link to my GitHub page):

https://github.com/mikelove/rnaseqGene/blob/master/vignettes/rnaseqGene.Rmd#using-ruv-with-deseq2

The approach is to calculate factors of unwanted variation. You can use control genes (this is what I did with the housekeeping genes on the panel) or use all the genes, but the latter is probably not a good idea with NanoString.

Then you supply the factors to DESeq2 at the front of the design formula, with the biological condition at the end. Importantly, DESeq2 runs on the original counts.

Note: I do not recommend any kind of subtracting of the negative control counts, although this is suggested in the Nanostring documentation. Just provide DESeq2 with the original counts for the samples that pass QC.

I have obtained nice results so far using this approach: the results make sense and everything checks out in quality control plots, such as an MA plot with the housekeeping genes highlighted.
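As a rough illustration of the steps described above, here is a minimal sketch and not the exact code used in that analysis. It assumes simulated counts from DESeq2's makeExampleDESeqDataSet() in place of a real panel, a hypothetical housekeeping set (simply the first 15 rows), and k = 1 factor of unwanted variation. The column name W1 is just a convenient label for the factor that RUVg stores as W_1.

```r
library(RUVSeq)   # also attaches EDASeq and Biobase
library(DESeq2)

set.seed(1)
# Simulated stand-in for a NanoString panel: 200 genes x 12 samples,
# with a two-level 'condition' factor in colData.
dds <- makeExampleDESeqDataSet(n = 200, m = 12)
raw <- counts(dds)

# ASSUMPTION: pretend the first 15 genes are the panel's housekeeping genes.
housekeeping <- rownames(raw)[1:15]

# Estimate one factor of unwanted variation from the housekeeping genes.
set <- newSeqExpressionSet(raw, phenoData = as.data.frame(colData(dds)))
set <- RUVg(set, cIdx = housekeeping, k = 1)

# DESeq2 still runs on the original counts; the RUV factor goes at the
# front of the design and the biological condition at the end.
dds$W1 <- pData(set)$W_1
design(dds) <- ~ W1 + condition
dds <- DESeq(dds)
res <- results(dds)
```

With real data, the choice of k and of the control genes would need checking against diagnostic plots rather than being fixed up front.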

ADD REPLY • link 7.7 years ago Michael Love 43k

0

Thanks Michael. To this end, do you routinely use RUV, or only when you detect a batch effect? It looks like it works in a similar fashion to SVA, which I have seen overcorrect in the past. Just wanted to pick your brain on this.

ADD REPLY • link 5 months ago thkapell ▴ 10

0

Yes, if I discern a batch from EDA including looking at PCs, sample correlations, etc.

It is close to SVA; we discuss both in the workflow I linked to.

ADD REPLY • link 5 months ago Michael Love 43k

1

Sorry, forgot to add: for heatmaps and PCA, you can work with the normalized output of RUV.

See the plotPCA and normCounts functions, as used in the RUVSeq vignette:

https://bioconductor.org/packages/release/bioc/vignettes/RUVSeq/inst/doc/RUVSeq.pdf

To make a heatmap on variance-stabilized, normalized counts, there are two approaches that I think should be roughly equivalent. Normally I would say to estimate the factors of unwanted variation, apply vst() to the raw counts, use limma's removeBatchEffect to remove the factors from the transformed data, and then re-assign the result to the DESeqTransform object. I can provide code for this if you like, but you can probably find it by searching the site for "vst removeBatchEffect". Alternatively, you could round the normalized counts output from RUV, create a DESeqDataSet from these, and then apply vst(). I think these would give about the same result.
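A minimal sketch of the first approach, under stated assumptions: the counts are simulated with makeExampleDESeqDataSet(), and W is a random placeholder standing in for the factors that RUV would estimate (one column per factor, one row per sample). varianceStabilizingTransformation() is called directly because vst() subsamples rows (nsub = 1000 by default) and is not intended for panels with only a few hundred genes. Passing the condition design to removeBatchEffect is an optional choice here, made so that the biological signal is protected while the factor is removed.

```r
library(DESeq2)
library(limma)

set.seed(1)
# Simulated stand-in for a small panel: 200 genes x 12 samples.
dds <- makeExampleDESeqDataSet(n = 200, m = 12)

# ASSUMPTION: placeholder for the RUV factor(s); in practice this would be
# the W_1, W_2, ... columns estimated by RUV, cbind-ed into a matrix.
W <- matrix(rnorm(ncol(dds)), ncol = 1,
            dimnames = list(colnames(dds), "W1"))

# Variance-stabilize the raw counts.
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Remove the unwanted factor from the transformed data and re-assign the
# result to the DESeqTransform object.
mm <- model.matrix(~ condition, as.data.frame(colData(dds)))
assay(vsd) <- removeBatchEffect(assay(vsd), covariates = W, design = mm)

# The adjusted object can then go to PCA or heatmap functions.
p <- plotPCA(vsd, intgroup = "condition")
```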

ADD REPLY • link 7.7 years ago Michael Love 43k

0

As always thank you so much! I will give this a try and let you know how it goes.

One last question: do you do anything at all with the “positive” and “negative” outputs from the Nanostring? Do you still keep them in the data set?

ADD REPLY • link 7.7 years ago casey.rimland ▴ 170

1

I use housekeeping genes only. I didn't keep any non-endogenous in the dataset.

I think non-endogenous genes are useful for deciding which samples to throw out entirely, but I think they probably introduce more noise and artifact than provide any benefit when they are used for normalization. I have limited experience, but I could see that known associations and particular known distributions among the donors pop out only after using RUV on the endogenous genes, with the housekeeping set as control genes. This is the whole RMA story again but with counts.
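One way to read this in code, as a sketch only: drop the positive and negative control probes before analysis, and keep the endogenous and housekeeping probes so the housekeeping set can serve as control genes. The probe table, the counts, and the CodeClass labels below are all made up for illustration; real annotation may be labelled differently.

```r
set.seed(1)
# ASSUMPTION: hypothetical probe annotation with a CodeClass column.
probes <- data.frame(
  Name      = c("GeneA", "GeneB", "GeneC", "HK1", "HK2", "POS_A", "NEG_A"),
  CodeClass = c("Endogenous", "Endogenous", "Endogenous",
                "Housekeeping", "Housekeeping", "Positive", "Negative")
)

# Made-up raw counts: 7 probes x 4 samples.
counts <- matrix(rpois(7 * 4, lambda = 100), nrow = 7,
                 dimnames = list(probes$Name, paste0("s", 1:4)))

# Keep endogenous and housekeeping probes; drop positive/negative controls.
keep <- probes$CodeClass %in% c("Endogenous", "Housekeeping")
counts_kept <- counts[keep, , drop = FALSE]

# Names of the control genes to hand to RUV.
housekeeping <- probes$Name[probes$CodeClass == "Housekeeping"]
```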

ADD REPLY • link 7.7 years ago Michael Love 43k

0

Sounds reasonable. Thanks so much!! I will let you know how it goes :)

ADD REPLY • link 7.7 years ago casey.rimland ▴ 170

0

Hi Michael, thanks for this reply! I've been searching for how to apply DESeq2 differential expression to NanoString data for a while. I'm a little confused as to how to perform the final step of the removeBatchEffect() method. I assumed that the RUVg factors are passed to removeBatchEffect() as the covariates argument, but the answer to "How do I extract read counts from DESeq2" suggests that they may be passed as the batch argument.

This is a summary of my steps (I can provide sample data, if needed):

# Create DESeqDataSet and SeqExpressionSet objects from raw data

DDS = DESeqDataSetFromMatrix(countData = myData, colData = myMeta, design = myDesign)

SES = newSeqExpressionSet(counts = as.matrix(myData), phenoData = myMeta)

# Apply vst() to raw counts 

VSD = varianceStabilizingTransformation(DDS)

### Method 1 - removeBatchEffect() 

#Estimate factors of unwanted variation 

normSES = RUVg(SES, housekeepingGenes, k = 1) 

# Remove factors 

rmBatchCounts = removeBatchEffect(assay(VSD), covariates = normSES$W_1)  

### Method 2 - RUV norm counts 

# Make new DESeqDataSet using normalized RUV counts 

newDDS = DESeqDataSetFromMatrix(countData = normCounts(normSES), colData = pData(normSES), design = ~ W_1 + Treatment) 

# Apply vst to this dataset 

newVSD = varianceStabilizingTransformation(newDDS) 

newRmBatchCounts = assay(newVSD)

rmBatchCounts and newRmBatchCounts are pretty similar to each other, and more similar than if I were to use batch as the argument when making rmBatchCounts.

As an aside, I'm using the steps outlined in your github post/the RUVSeq vignette for the actual differential expression analysis, but I would like to be able to plot heatmaps/PCA of the batch-adjusted counts, as Casey initially mentioned.

Any help is greatly appreciated!

ADD REPLY • link 7.5 years ago wes ▴ 10

0

I'll tell you the way that I used RUV and DESeq2 for differential expression. I added the factors estimated by RUV into the DESeq2 design formula, e.g.:

design(dds) <- ~ W1 + W2 + condition

For making heatmaps of VST data where a batch effect is removed, you can do:

assay(vsd) <- removeBatchEffect(assay(vsd), covariates=factors)

Where factors is a matrix of the factors of unwanted variation, cbind-ed together.

ADD REPLY • link 7.5 years ago Michael Love 43k

0

Perfect, thanks so much for the response!

ADD REPLY • link 7.5 years ago wes ▴ 10

0

Hi, a follow-up question to this: do you still import the raw counts, or the expression counts after normalization with housekeeping, positive and negative control genes?