I am trying to use NanoStringDiff for differential expression analysis of a NanoString dataset with 506 endogenous genes. Is there a way to output the normalized data that NanoStringDiff uses to run the differential expression LRT tests? I have been able to run the differential expression analyses correctly (I hope!), but I would now like to access the normalized data to use for PCA plots, heatmaps, etc. I tried assay(exprs) but it just gave me the raw counts. Thanks!
library(NanoStringDiff)
#Load data
path <- file.path(dir, "nanostring_R.csv")
designs <- data.frame(group=c("WT_IL13", "WT_IL13", "WT_IL13", "WT_CTRL", "WT_CTRL", "WT_CTRL", "RA1_IL13", "RA1_IL13", "RA1_IL13", "RA1_CTRL", "RA1_CTRL", "RA1_CTRL"))
#Create a Nanostring dataset
nanostringdata <- createNanoStringSetFromCsv(path = path, header = TRUE, designs = designs)
#Run DE analysis
pheno <- pData(nanostringdata)
group <- pheno$group
design.full <- model.matrix(~0+group)
design.full
NanoStringData_Norm <- estNormalizationFactors(nanostringdata)
#Get Results for pairwise contrasts
result_WT <- glm.LRT(NanoStringData_Norm,design.full,contrast=c(0,0,-1,1))
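One way to double-check which comparison a contrast vector encodes is to look at the column order of the design matrix. The sketch below uses only base R and rebuilds the group labels from the post. With `~0+group` the columns follow the factor levels, which are sorted alphabetically by default, so `c(0,0,-1,1)` should correspond to WT_IL13 minus WT_CTRL. The helper `describe_contrast` is a made-up name for illustration, not part of NanoStringDiff.

```r
# Group labels as in the post above (three replicates per condition).
group <- factor(rep(c("WT_IL13", "WT_CTRL", "RA1_IL13", "RA1_CTRL"), each = 3))

# No-intercept design: one column per group, ordered by factor level.
design.full <- model.matrix(~0 + group)
colnames(design.full)

# Hypothetical helper: spell out a two-group contrast vector as "A - B".
describe_contrast <- function(contrast, design) {
  lv <- sub("^group", "", colnames(design))
  paste(lv[contrast > 0], "-", lv[contrast < 0])
}

describe_contrast(c(0, 0, -1, 1), design.full)
```

If the printed column order differs from what you expected, adjust the contrast vector rather than the data.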
Thank you!
I just gave the code a try and I got stuck on this step with a warning message:
Y_n = sweep(Y, 2, lamda_i, FUN = "-")
Warning message:
In max(cumDim[cumDim <= lstats]) :
no non-missing arguments to max; returning -Inf
Is there anything I might be doing wrong? The code runs through, but the final log(Y_nph) contains only NA values.
That warning comes from a check in sweep that makes sure the length of lamda_i is reasonable for the dimensions of the matrix you are sweeping over. So there appears to be a problem with either your Y matrix or whatever you are getting for lamda_i. You need to take a look at those objects and see what's up.
I was trying to run it before calling the estNormalizationFactors. Fixed it now and have the output. Thank you bunches!
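For anyone hitting the same warning: it can be reproduced in base R by sweeping with a zero-length vector. That is presumably what the background factor looks like before estNormalizationFactors has been run, though this is an assumption, since the package internals are not shown in this thread. The toy matrix, the factor values, and the floor of 0.1 below are all made up; the variable names simply mirror the ones quoted above.

```r
# Toy count matrix: 4 genes x 3 samples (made-up numbers).
Y <- matrix(c(50, 60, 55,
              200, 180, 220,
              12, 15, 9,
              900, 850, 1000),
            nrow = 4, byrow = TRUE)

# Made-up per-sample factors, one value per column of Y.
lamda_i <- c(10, 12, 8)      # background level (hypothetical values)
k       <- c(1.0, 0.9, 1.1)  # combined scaling factor (hypothetical values)

# Subtract the background per sample, then divide by the scaling factor.
Y_n   <- sweep(Y, 2, lamda_i, FUN = "-")
Y_nph <- sweep(Y_n, 2, k, FUN = "/")
Y_nph[Y_nph <= 0] <- 0.1     # arbitrary floor so the log is defined
ok <- log(Y_nph)

# Same step with an empty factor vector: sweep() warns and the result is
# all NA. suppressWarnings() is used here only to keep the output quiet.
bad <- suppressWarnings(sweep(Y, 2, numeric(0), FUN = "-"))
all(is.na(bad))
```

Checking `length(lamda_i) == ncol(Y)` before sweeping is a cheap way to catch this early.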
Hello,
I have a similar situation to the one above.
To get normalized data for plotting, I tried NanoStringDataNormalization, but the normalized data does not look consistent with the logFC reported by glm.LRT. I found this comment and compared two normalized matrices built from the same raw data: one produced by this code (without the final log transformation) after estNormalizationFactors, and one produced by NanoStringDataNormalization. The two are quite different. Which one should I use?
P.S. I really appreciate your package, though.
You cannot generate log fold changes you get from a generalized linear model 'by hand'. In other words, there is no formula that you can plug data into, in order to get the results the GLM will provide. The parameters for the GLM are estimated using an iterative procedure that you won't be able to replicate, and the 'normalized' data we are talking about are just gross estimates that are useful for plotting.
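A toy illustration of why the two numbers need not agree. It uses a plain Poisson GLM from base R, not NanoStringDiff's model, and the counts and per-sample factors are made up. The only point is that a log ratio of averaged "normalized" values and a GLM coefficient fitted with the same factors as an offset are different quantities.

```r
# Made-up counts for one gene in six samples, with hypothetical
# per-sample normalization factors.
counts <- c(120, 95, 150, 310, 280, 400)
size   <- c(1.0, 0.8, 1.3, 0.9, 1.1, 1.4)
group  <- factor(rep(c("ctrl", "trt"), each = 3))

# "By hand": divide by the factor, average per group, take the log2 ratio.
norm    <- counts / size
by_hand <- log2(mean(norm[group == "trt"]) / mean(norm[group == "ctrl"]))

# GLM: the factors enter as an offset and the coefficient is fitted by
# iteratively reweighted least squares. Convert from natural log to log2.
fit     <- glm(counts ~ group + offset(log(size)), family = poisson())
glm_lfc <- unname(coef(fit)["grouptrt"]) / log(2)

c(by_hand = by_hand, glm = glm_lfc)
```

The gap is small in this toy case. With a more elaborate model it can be larger, which is why the normalized matrix is best treated as a plotting aid only.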