How to access normalized data in the NanoStringDiff package?

casey.rimland
@caseyrimland-14915
University of Cambridge, National Insti…

I am trying to use NanoStringDiff for differential expression analysis of a NanoString dataset with 506 endogenous genes. Is there a way to output the normalized data that NanoStringDiff uses to run the differential expression LRT tests? I have been able to run the differential expression analyses correctly (I hope!), but now I would like to access the normalized data to use for PCA plots, heatmaps, etc. I tried assay(exprs), but it just gave me the raw counts. Thanks!

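    # NanoStringDiff; pData() and exprs() are Biobase accessors
    library(NanoStringDiff)

    # `dir` is the folder containing the CSV file (defined elsewhere in the script)
    path <- file.path(dir, "nanostring_R.csv")
    designs <- data.frame(group = c("WT_IL13", "WT_IL13", "WT_IL13", "WT_CTRL", "WT_CTRL", "WT_CTRL",
                                    "RA1_IL13", "RA1_IL13", "RA1_IL13", "RA1_CTRL", "RA1_CTRL", "RA1_CTRL"))

    # Create a NanoString dataset
    nanostringdata <- createNanoStringSetFromCsv(path = path, header = TRUE, designs = designs)

    # Run DE analysis: one design column per group, no intercept
    pheno <- pData(nanostringdata)
    group <- pheno$group
    design.full <- model.matrix(~0 + group)
    design.full  # print to check the column order

    # Estimate the normalization factors used by the model
    NanoStringData_Norm <- estNormalizationFactors(nanostringdata)

    # Get results for pairwise contrasts
    result_WT <- glm.LRT(NanoStringData_Norm, design.full, contrast = c(0, 0, -1, 1))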
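
To double-check which groups contrast = c(0, 0, -1, 1) compares, here is a small base-R sketch using the same group labels (no NanoStringDiff needed). With ~0 + group, model.matrix() creates one indicator column per level of group, and the levels come out in sorted order, so the contrast reads as WT_IL13 minus WT_CTRL. This assumes glm.LRT() matches the contrast vector to the columns of design.full in order.

```r
# Same group labels as above, three replicates each, in the same order
designs <- data.frame(group = rep(c("WT_IL13", "WT_CTRL", "RA1_IL13", "RA1_CTRL"), each = 3))
group <- designs$group

# ~0 + group: one indicator column per group level, no intercept column
design.full <- model.matrix(~0 + group)
print(colnames(design.full))

# Label each contrast weight with the design column it multiplies
contrast <- c(0, 0, -1, 1)
names(contrast) <- colnames(design.full)
print(contrast)
```

If the groups were relabelled or reordered, reading the named contrast like this is a quick way to confirm that the +1 and -1 still sit on the intended columns.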

@james-w-macdonald-5106
United States

I don't think there is a direct accessor, but this is what is done to the data prior to fitting any model:

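    # NanoStringData must be the object returned by estNormalizationFactors();
    # on the raw NanoStringSet the factors have not been estimated yet
    c = positiveFactor(NanoStringData)        # positive-control factor, one per sample
    d = housekeepingFactor(NanoStringData)    # housekeeping factor, one per sample
    k = c * d                                 # combined scaling factor per sample
    lamda_i = negativeFactor(NanoStringData)  # background (negative-control) level per sample
    Y = exprs(NanoStringData)                 # raw counts: genes in rows, samples in columns
    Y_n = sweep(Y, 2, lamda_i, FUN = "-")     # subtract each sample's background
    Y_nph = sweep(Y_n, 2, k, FUN = "/")       # divide each sample by its scaling factor
    Y_nph[Y_nph <= 0] = 0.1                   # floor non-positive values so log() is defined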

And then

    Y_nph <- log(Y_nph)

will give you data that you can plot.
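
To see the arithmetic in isolation, here is a small sketch of the same steps on a plain matrix, so it runs without a NanoStringSet. normalize_counts() is a hypothetical helper, and the counts and factors are made-up toy values; in practice the three factor vectors would come from positiveFactor(), housekeepingFactor() and negativeFactor() after estNormalizationFactors(). For gene g and sample i it computes log((Y_gi - lamda_i) / (c_i * d_i)), with non-positive values replaced by 0.1 before taking the log.

```r
# Hypothetical helper mirroring the steps above on a plain count matrix.
# Each factor argument must hold one value per sample (column of Y).
normalize_counts <- function(Y, pos_factor, hk_factor, neg_factor, floor_value = 0.1) {
  n <- ncol(Y)
  # Fail early if any factor vector does not match the number of samples
  stopifnot(length(pos_factor) == n, length(hk_factor) == n, length(neg_factor) == n)
  k <- pos_factor * hk_factor                # combined scaling factor per sample
  Y_n <- sweep(Y, 2, neg_factor, FUN = "-")  # subtract background per sample
  Y_nph <- sweep(Y_n, 2, k, FUN = "/")       # rescale each sample
  Y_nph[Y_nph <= 0] <- floor_value           # floor non-positive values
  log(Y_nph)                                 # natural log, as in the answer
}

# Toy data: 3 genes x 2 samples with made-up factors
Y <- matrix(c(120, 40, 10,
              260, 90, 15),
            nrow = 3,
            dimnames = list(paste0("gene", 1:3), c("s1", "s2")))
logY <- normalize_counts(Y, pos_factor = c(1, 2), hk_factor = c(1, 1),
                         neg_factor = c(20, 20))
print(round(logY, 3))
```

For the PCA plots mentioned in the question, prcomp(t(logY)) works on the result, since prcomp() expects samples in rows; the same matrix can go straight into a heatmap function. The length check guards against factor vectors that do not have one entry per sample.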

Thank you!

I just gave the code a try, and I got stuck on this step with a warning message:

    Y_n = sweep(Y, 2, lamda_i, FUN = "-")

    Warning message:
    In max(cumDim[cumDim <= lstats]) :
      no non-missing arguments to max; returning -Inf

Is there anything I might be doing wrong? The code runs through, but the final log(Y_nph) contains nothing but NA.

That warning comes from a check in sweep() that the length of lamda_i is compatible with the dimensions of the matrix you are sweeping over. So there appears to be a problem with either your Y matrix or whatever you are getting for lamda_i. Take a look at those objects and see what's up.
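
Here is a minimal base-R reproduction of that warning, assuming that negativeFactor() hands back an empty numeric vector when estNormalizationFactors() has not been run yet (I have not confirmed what the accessor returns in that case). With a zero-length STATS, the length check inside sweep() calls max() on an empty vector, which raises the warning, and array() then pads the zero-length vector with NA, so every entry of Y_n ends up NA.

```r
# Toy 2 x 3 count matrix: genes in rows, samples in columns
Y <- matrix(c(100, 200, 300, 400, 500, 600), nrow = 2)

# Assumption: an empty factor vector instead of one value per sample
lamda_i <- numeric(0)

# Capture the warning message instead of printing it
caught <- NULL
Y_n <- withCallingHandlers(
  sweep(Y, 2, lamda_i, FUN = "-"),
  warning = function(w) {
    caught <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(caught)           # the max() warning reported above
print(all(is.na(Y_n)))  # TRUE: the empty STATS was padded with NA
```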

I was running it before calling estNormalizationFactors. It's fixed now and I have the output. Thank you bunches!

Hello,

I am in a similar situation to the one above.

To get normalized data for plotting, I tried NanoStringDataNormalization, but the normalized data it returns do not look consistent with the logFC values reported by glm.LRT.

I then found this thread and, starting from the same raw data, compared two normalized matrices: one from the code above (run after estNormalizationFactors, without the final log transformation) and one from NanoStringDataNormalization. The two are quite different.

Which one should I use?

P.S. I really appreciate your package, though.

You cannot reproduce the log fold changes from a generalized linear model 'by hand'. In other words, there is no formula you can plug the data into to get the results the GLM provides. The GLM parameters are estimated with an iterative procedure that you won't be able to replicate, and the 'normalized' data we are talking about are just rough estimates that are useful for plotting.
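
A toy illustration of one reason the plotted values will not line up with GLM fold changes: a log-link GLM estimates the log of each group's mean, whereas averaging log-normalized values gives the mean of the logs. The sketch uses base R's glm() with a Poisson family purely as a simplified stand-in for a count GLM fitted by iteratively reweighted least squares; it is not NanoStringDiff's model, and the four values are made up.

```r
# Made-up values for one gene, two samples per group
y <- c(10, 100, 50, 50)
grp <- factor(c("A", "A", "B", "B"))

# "By hand" on the log scale: difference of group means of the log values
by_hand <- mean(log(y[grp == "B"])) - mean(log(y[grp == "A"]))

# Simplified stand-in for a count GLM: Poisson with log link, fitted iteratively
fit <- glm(y ~ grp, family = poisson(link = "log"))
glm_lfc <- unname(coef(fit)["grpB"])

print(c(by_hand = by_hand, glm = glm_lfc))
print(fit$iter)  # number of IRLS iterations glm() used
```

Here the two estimates even have opposite signs (roughly 0.46 versus -0.10), because group A's mean is pulled up by one large value that the log scale shrinks. A heatmap of log-normalized values and a table of GLM fold changes answer slightly different questions.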