Question: DESeq2 on NanoString Data
casey.rimland (University of Cambridge, National Institutes of Health, Chapel Hill School of Medicine) wrote, 10 weeks ago:

I was wondering if there has been any more consensus recently on using DESeq2 to perform analyses of NanoString data? I have a NanoString dataset with 506 endogenous genes for four sample groups and was looking for the best way to analyze the data.

What if I just wanted to use DESeq2 to normalize the NanoString count data with the VST, for instance to make PCA plots and heatmaps?


modified 10 weeks ago by Michael Love • written 10 weeks ago by casey.rimland
Michael Love (United States) wrote, 10 weeks ago:

I've used it lately on NanoString data across hundreds of samples and it works well: it recovers the expected genes. I use it in combination with RUV, with the housekeeping genes as control genes.

written 10 weeks ago by Michael Love

Thanks for the quick reply! I have not used RUV before so will have to go take a look at that. Do you use RUV before DESeq()? Or is it your method for normalization before things like PCAs/heatmaps and then you still just give the raw Nanostring counts to DESeq() as you would with RNA-Seq data? I've been only working with RNA-Seq data recently, but a labmate has NanoString data we are trying to analyze now.

Do you have any opinions on the NanoStringDiff package? 

modified 10 weeks ago by Michael Love • written 10 weeks ago by casey.rimland

Here's an example of how to use RUV with DESeq2 (usually I would link directly to the Bioconductor landing page, but it seems to be down right now, so I link to my GitHub page):

The approach is to calculate factors of unwanted variation. You can use control genes (this is what I did with the housekeeping genes on the panel) or use all the genes, but the latter is probably not a good idea with NanoString.

Then you supply the factors to DESeq2 at the front of the design formula, with the biological condition at the end. Importantly, DESeq2 runs on the original counts.

Note: I do not recommend any kind of subtracting of the negative control counts, although this is suggested in the Nanostring documentation. Just provide DESeq2 with the original counts for the samples that pass QC.

I have obtained nice results so far using this approach: the results make sense and everything checks out in quality control plots, such as MA plots with the housekeeping genes highlighted.
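A minimal sketch of the approach described above, run on simulated counts rather than real NanoString data. The object names (`cts`, `coldata`, `housekeeping`), the simulated values and the choice of k = 1 are assumptions for illustration only; the calls follow the pattern shown in the RUVSeq vignette.

```r
library(RUVSeq)   # provides RUVg(); loads EDASeq for newSeqExpressionSet()
library(DESeq2)

set.seed(1)

# Simulated stand-in for a small panel: 200 genes x 12 samples (hypothetical data)
n_genes <- 200
n_samples <- 12
gene_means <- 2^runif(n_genes, 5, 10)
cts <- matrix(rnbinom(n_genes * n_samples, mu = rep(gene_means, n_samples), size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 6)),
                      row.names = colnames(cts))

# Hypothetical housekeeping genes, used as the control genes for RUVg
housekeeping <- rownames(cts)[1:10]

# Estimate one factor of unwanted variation from the control genes
set <- newSeqExpressionSet(cts, phenoData = coldata)
set <- RUVg(set, cIdx = housekeeping, k = 1)

# DESeq2 is given the original counts; the RUV factor goes first in the
# design and the biological condition goes last
dds <- DESeqDataSetFromMatrix(countData = counts(set),
                              colData = pData(set),
                              design = ~ W_1 + condition)
dds <- DESeq(dds)
res <- results(dds)
```

With more than one factor (k > 1), the extra columns (`W_2`, and so on) would be added to the front of the design in the same way.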

written 10 weeks ago by Michael Love

Sorry, forgot to add: for heatmaps and PCA, you can work with the normalized output of RUV.

See the plotPCA and normCounts functions used with RUVSeq:
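A rough sketch of what that could look like, again on simulated counts. The data, the object names and the colours are assumptions for illustration; `plotPCA()` and `normCounts()` are called on the SeqExpressionSet returned by `RUVg()`, and the heatmap uses base R's `heatmap()` on log2-transformed normalized counts as one possible choice.

```r
library(RUVSeq)   # loads EDASeq, which supplies plotPCA() and normCounts()

set.seed(1)

# Simulated stand-in for a small panel (hypothetical data)
n_genes <- 200
n_samples <- 12
gene_means <- 2^runif(n_genes, 5, 10)
cts <- matrix(rnbinom(n_genes * n_samples, mu = rep(gene_means, n_samples), size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 6)),
                      row.names = colnames(cts))
housekeeping <- rownames(cts)[1:10]   # hypothetical control genes

set <- RUVg(newSeqExpressionSet(cts, phenoData = coldata),
            cIdx = housekeeping, k = 1)

# PCA of the samples, coloured by condition
plotPCA(set, col = c("steelblue", "tomato")[coldata$condition], cex = 1.2)

# RUV-normalized counts, here log2-transformed for a simple heatmap
norm <- normCounts(set)
heatmap(log2(norm + 1))
```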

To make a heatmap of variance-stabilized, normalized counts, there are two approaches that I think should be roughly equivalent. Normally I would say to estimate the factors of unwanted variation, apply vst() to the raw counts, use limma's removeBatchEffect to remove the factors from the transformed data, then re-assign the result to the DESeqTransform object. I can provide code for this if you like, but you can probably find it by searching the site for "vst removeBatchEffect". Alternatively, you could round the normalized counts output from RUV, create a DESeqDataSet from these, and then apply vst(). I think these would give about the same result.

written 10 weeks ago by Michael Love

As always thank you so much! I will give this a try and let you know how it goes. 

One last question: do you do anything at all with the “positive” and “negative” outputs from the Nanostring? Do you still keep them in the data set?

written 10 weeks ago by casey.rimland

I use housekeeping genes only. I didn't keep any non-endogenous probes in the dataset.

I think non-endogenous probes are useful for deciding which samples to throw out entirely, but when they are used for normalization they probably introduce more noise and artifacts than benefit. I have limited experience, but I could see that known associations and particular known distributions among the donors pop out only after using RUV on the endogenous genes, with the housekeeping set as control genes. This is the whole RMA story again but with counts.

modified 10 weeks ago • written 10 weeks ago by Michael Love

Sounds reasonable. Thanks so much!! I will let you know how it goes :)

modified 10 weeks ago • written 10 weeks ago by casey.rimland

Hi Michael, thanks for this reply! I've been searching for a while for how to apply DESeq2 differential expression to NanoString data. I'm a little confused about how to perform the final step of the removeBatchEffect() method. I assumed that the RUVg factors are passed to removeBatchEffect() as the covariates argument, but the answer to "How do I extract read counts from DESeq2" suggests that they may instead be passed as the batch argument.

This is a summary of my steps (I can provide sample data, if needed):

# Create DESeqDataSet and SeqExpressionSet from raw data

DDS = DESeqDataSetFromMatrix(countData = myData, colData = myMeta, design = myDesign)

SES = newSeqExpressionSet(counts = as.matrix(myData), phenoData = myMeta)

# Apply vst() to raw counts 

VSD = varianceStabilizingTransformation(DDS)

### Method 1 - removeBatchEffect() 

#Estimate factors of unwanted variation 

normSES = RUVg(SES, housekeepingGenes, k = 1) 

# Remove factors 

rmBatchCounts = removeBatchEffect(assay(VSD), covariates = normSES$W_1)  

### Method 2 - RUV norm counts 

# Make new DESeqDataSet using normalized RUV counts 

newDDS = DESeqDataSetFromMatrix(countData = normCounts(normSES), colData = pData(normSES), design = ~ W_1 + Treatment) 

# Apply vst to this dataset 

newVSD = varianceStabilizingTransformation(newDDS) 

newRmBatchCounts = assay(newVSD)

rmBatchCounts and newRmBatchCounts are pretty similar to each other, and they are more similar than when I pass the factors as the batch argument instead when making rmBatchCounts.

As an aside, I'm using the steps outlined in your github post/the RUVSeq vignette for the actual differential expression analysis, but I would like to be able to plot heatmaps/PCA of the batch-adjusted counts, as Casey initially mentioned.

Any help is greatly appreciated!

modified 24 days ago • written 24 days ago by wes0

I'll tell you the way that I used RUV and DESeq2 for differential expression. I added the factors estimated by RUV into the DESeq2 design formula, e.g.:

design(dds) <- ~ W1 + W2 + condition

For making heatmaps of VST data where a batch effect is removed, you can do:

assay(vsd) <- removeBatchEffect(assay(vsd), covariates=factors)

Where factors is a matrix of the factors of unwanted variation, cbind-ed together.
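Put together, the heatmap/PCA route might look like the sketch below. It uses simulated counts, and the object names, the simulated values and k = 2 are assumptions for illustration. It calls varianceStabilizingTransformation() rather than vst() because, as far as I know, vst() subsamples 1000 rows by default (its nsub argument) and complains when a panel has fewer genes than that.

```r
library(RUVSeq)
library(DESeq2)
library(limma)

set.seed(1)

# Simulated stand-in for a small panel (hypothetical data)
n_genes <- 200
n_samples <- 12
gene_means <- 2^runif(n_genes, 5, 10)
cts <- matrix(rnbinom(n_genes * n_samples, mu = rep(gene_means, n_samples), size = 5),
              nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)),
                              paste0("s", seq_len(n_samples))))
coldata <- data.frame(condition = factor(rep(c("A", "B"), each = 6)),
                      row.names = colnames(cts))
housekeeping <- rownames(cts)[1:10]   # hypothetical control genes

# Estimate two factors of unwanted variation from the control genes
set <- RUVg(newSeqExpressionSet(cts, phenoData = coldata),
            cIdx = housekeeping, k = 2)

# Matrix of factors, one column per factor (the cbind-ed "factors" object)
factors <- as.matrix(pData(set)[, c("W_1", "W_2")])

# Variance stabilizing transformation of the original counts
dds <- DESeqDataSetFromMatrix(countData = cts,
                              colData = pData(set),
                              design = ~ W_1 + W_2 + condition)
vsd <- varianceStabilizingTransformation(dds, blind = FALSE)

# Remove the factors from the transformed data and store the result
# back in the DESeqTransform object
assay(vsd) <- removeBatchEffect(assay(vsd), covariates = factors)

# PCA coordinates of the adjusted data (returnData = TRUE skips the plot)
pca <- plotPCA(vsd, intgroup = "condition", returnData = TRUE)
```

limma's removeBatchEffect() also takes a design argument for the conditions to be preserved; the thread does not say whether to use it here, so the sketch leaves it at its default.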

modified 22 days ago • written 22 days ago by Michael Love

Perfect, thanks so much for the response!

written 22 days ago by wes0