Search
Question: msmsTests, NA values in test.results output
0
gravatar for laural710
5 weeks ago by
laural7100
laural7100 wrote:

Hi All

I'm strugglin with the analysis of DEP in a protein dataset i have. I have managed to analyse to this point in R using variants of MSGF etc and quantified via SI. Having run the samples via DAPAR and Prostar, i have knowldege of the number (or at least the region) of DEP to expect in this particular comparison (eg. CTRL to 0.01 mg/L). Using msmsTests, i create the hypothesis tests (H0 and H1), subset the data set to only these two groups and then run msms.EdgeR() with div = to the colSums().

This all runs fine but when i get to the portion or adding FDR to the data via test.results, something weird happens where instead of getting a dataset (so to speak) of protein id, raw spectral counts for control condition, raw spectral conts for comparitive condition, logfold change (LFC.AV) (logfc, LR, p.value etc), the second column (which is the raw spectral counts for the ctrl condition is always NA irrespective of which concentration i choose. I''m unsure why this is the case, when the msms.EdgeR runs ok with the comparisons. Has this happened to anyone else? Is there a particular reason this might occur that i can check to see if my data has been entered incorrectly?

I'm scratching my head over what would cause this. I did wonder whether it had anything to do with my estimates of dispersion used in the msms.EdgeR model build but i'm unsure how to proceed now.

Any help would be much appreciated. I can of course provide the code but i am hoping someone else has stumbled across this.

Many thanks :) 

ADD COMMENTlink modified 4 weeks ago by Laurent Gatto1.0k • written 5 weeks ago by laural7100
Do you have any `NA` values in your original count data by any chance?
ADD REPLYlink written 4 weeks ago by Laurent Gatto1.0k

No, this is what is frustrating me. As part of the preparation of my experimental MSnSet, i remove any lines with NA (pNA=1/3 and then impute any remaining NA values with knn). When combining the data set, i use imputation QRILC to ensure that there are no NA values and then check with is.na(exprs(e2)) || exprs(e2) < 0) which is always False. 

ADD REPLYlink written 4 weeks ago by laural7100
It would probably be helpful to provide code, detailed output and possibly example data. Also, don't forget to add the output of `sessionInfo()`.
ADD REPLYlink written 4 weeks ago by Laurent Gatto1.0k

Many thanks for your quick reply. I've attached the code that i have used but i'm unsure if it is in the correct format? Any help would be greatly appreicated

ADD REPLYlink written 4 weeks ago by laural7100

BaP_Exp consists of 12 samples, 3 biological replicates of 4 conditions. The data below as been subsetted to two comparisons. BaP_exp is a large MSnSet.

  1. Code for analysis mix1=BaP_exp[,BaP_exp$Group %in% c("Ctrl","5")]

  2. new=pp.msms.data(mix1) pData(new) ##6 samples

  3. H0="y~" H1="y~Group" dispersion=colSums(exprs(new))

check for na values

  1. is.na(exprs(new))|| exprs(new) < 0 #always FALSE

  2. mixed=msms.edgeR(new,H0,H1,div=dispersion, fnm="Group")

returns NA values or more recently error: negative counts not allowed

SessionInfo()

R version 3.5.0 (2018-04-23)

Platform: x86_64-w64-mingw32/x64 (64-bit)

Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:

[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252 LC_MONETARY=English_United Kingdom.1252

[4] LC_NUMERIC=C LC_TIME=English_United Kingdom.1252

attached base packages:

[1] parallel statsgraphic grDevices utils datasets methods base

other attached packages:

[1] msmsTests_1.18.0 msmsEDA_1.18.0 MSnID_1.14.0 MSGFplus_1.14.0 BiocInstaller_1.30.0 MSnbase_2.6.1 
[7] ProtGenerics_1.12.0 BiocParallel_1.14.1 mzR_2.14.0 Rcpp_0.12.19 Biobase_2.40.0 BiocGenerics_0.26.0

loaded via a namespace (and not attached):

Error in x[["Version"]] : subscript out of bounds

In addition: Warning messages:

1: In FUN(X[[i]], ...) :

DESCRIPTION file of package 'IRanges' is missing or broken

2: In FUN(X[[i]], ...) :

DESCRIPTION file of package 'limma' is missing or broken
ADD REPLYlink modified 4 weeks ago by Laurent Gatto1.0k • written 4 weeks ago by laural7100

It's difficult to say much without the actual code and it's output, as it is executed in R. Two things though

  1. there are warnings at the end of your sessionInfo; I would recommend fixing them befoe doing anything else. Re-installing the packages light help. If you have issues or more questions about this step, please ask them in a new question on the forum.

  2. Following exactly what is presented in the msmsTests vignette, your H0 should be defined as H0 = "y~1". I don't know if this is the reason for your issue though.

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Laurent Gatto1.0k

How can i provide the code as its produced in R? Do i just copy it under the line of run code? 

I've never been able to attach code before on this forum and am slightly lost. Sorry for the inconvenience

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by laural7100

I didn't mean to attach it, but run all the commands in your R session, then copy the whole lot and their output and paste it here. For example:

> library(msdata)
> library(MSnbase)
> f <- msdata::proteomics(full.names = TRUE, pattern = "TMT11")
> f
[1] "/home/lgatto/R/x86_64-pc-linux-gnu-library/3.5/msdata/proteomics/MS3TMT11.mzML"
> x <- readMSData(f, mode = "onDisk")
> x
MSn experiment data ("OnDiskMSnExp")
Object size in memory: 0.5 Mb
- - - Spectra data - - -
 MS level(s): 1 2 3
 Number of spectra: 994
 MSn retention times: 45:27 - 47:6 minutes
- - - Processing information - - -
Data loaded [Thu Oct 18 15:37:06 2018]
 MSnbase version: 2.7.8
- - - Meta data  - - -
phenoData
  rowNames: MS3TMT11.mzML
  varLabels: sampleNames
  varMetadata: labelDescription
Loaded from:
  MS3TMT11.mzML
protocolData: none
featureData
  featureNames: F1.S001 F1.S002 ... F1.S994 (994 total)
  fvarLabels: fileIdx spIdx ... spectrum (30 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by Laurent Gatto1.0k

Many thanks for your quick reply. I've ran the code again and am now getting a new error message. I've updated my packages since i first asked my question, but i don't think that is what is going wrong. Library(MSGFplus, MSnbase, MSnID,vsn, imputeLCMD)

> t1=c("20171124VS37.mzML")

> q1=c("20171124VS37.mzid")
> bap_0a=readMSData(t1, mode="onDisk")
> bap_0a=addIdentificationData(bap_0a,q1)
> t2=c("20171124VS38.mzML")
> q2=c("20171124VS38.mzid")
> bap_0b=readMSData(t2, mode="onDisk")
> bap_0b=addIdentificationData(bap_0b,q2)
> bap_0c=readMSData(t3, mode="onDisk") .... (3 MORE SAMPLES)

> nms=c(paste0("bap_",0,c("a","b","c")),
+       paste0("bap_",5,c("a","b","c")))
> tmp <- sapply(nms, function(.bap) {
+   cat("Processing", .bap, "... ")
+   x <- get(.bap, envir = .GlobalEnv)
+   x=quantify(x,method="SI")
+   x <- topN(x, groupBy = fData(x)$DatabaseAccess, n = 3)
+   nPeps <- nQuants(x, fData(x)$DatabaseAccess)
+   x <- combineFeatures(x, fData(x)$DatabaseAccess,
+                        redundancy.handler = "unique",
+                        fun="sum", na.rm = TRUE, CV=FALSE)
+   exprs(x) <- exprs(x) * (3/nPeps)
+   x <- updateFvarLabels(x, .bap)
+   varnm <- sub("bap", "bap", .bap)
+   assign(varnm, x, envir = .GlobalEnv)
+   cat("done\n")
+ })
> ctrl=combine(bap_0a,bap_0b)
> ctrl=combine(ctrl, bap_0c)
> ctrl=filterNA(ctrl, pNA=1/3)
> ctrl=impute(ctrl, method="knn")
> sampleNames(ctrl)=c("CTRL.a","CTRL.b","CTRL.c")
> bap5=combine(bap_5a,bap_5b)
> bap5=combine(bap5,bap_5c)
> bap5=filterNA(bap5,pNA=1/3)
> bap5=impute(bap5, method="knn")
> sampleNames(bap5)=c("BaP.5a","BaP.5b","BaP.5c")
> BaP_exp=combine(ctrl,bap5)
> BaP_exp=impute(BaP_exp, "QRILC")
> BaP_exp
MSnSet (storageMode: lockedEnvironment)
assayData: 2265 features, 6 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: CTRL.a CTRL.b ... BaP.5c (6 total)
  varLabels: sampleNames
  varMetadata: labelDescription
featureData
  featureNames: sp|P56839|PEPM_MYTED sp|P91958|TPM_MYTGA ... tr|U3PWG3|U3PWG3_9BIVA (2265 total)
  fvarLabels: fileIdx.bap_0a spIdx.bap_0a ... CV.20171124VS42.mzML.1.bap_5c (390 total)
  fvarMetadata: labelDescription
experimentData: use 'experimentData(object)'
Annotation:  
- - - Processing information - - -
Combined [1461,3] and [1285,3] MSnSets Thu Oct 18 15:33:34 2018 
Data imputation using QRILC Thu Oct 18 15:33:41 2018 
  Using default parameters 
 MSnbase version: 2.6.1 

 

ADD REPLYlink written 4 weeks ago by laural7100

DEP (load msmsTests & msmsEDA)

pData(BaP_exp)$Group=rep(c("Ctrl","5"),each=3)
> pData(BaP_exp)
             sampleNames Group
CTRL.a 20171124VS37.mzML  Ctrl
CTRL.b 20171124VS38.mzML  Ctrl
CTRL.c 20171124VS39.mzML  Ctrl
BaP.5a 20171124VS40.mzML     5
BaP.5b 20171124VS41.mzML     5
BaP.5c 20171124VS42.mzML     5

> bap_data=pp.msms.data(BaP_exp)
> processingData(BaP_exp)
- - - Processing information - - -
Combined [1461,3] and [1285,3] MSnSets Thu Oct 18 15:33:34 2018 
Data imputation using QRILC Thu Oct 18 15:33:41 2018 
  Using default parameters 
 MSnbase version: 2.6.1 

> H0="y~1" 

> H1="y~Group" 
> disp2=colSums(exprs(bap_data))
> disp2
  CTRL.a   CTRL.b   CTRL.c   BaP.5a   BaP.5b   BaP.5c 
40684.87 40461.76 40564.24 38956.89 39035.39 38819.56 

> is.na(exprs(bap_data)) || exprs(bap_data) < 0
[1] FALSE

 mixed=msms.edgeR(bap_data,H0,H1,div=disp2, facs=facs)
Error in glmLRT(fit, coef = vc) : 
  Need at least two columns for design, usually the first is the intercept column

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] imputeLCMD_2.0       impute_1.54.0        pcaMethods_1.72.0    norm_1.0-9.5         tmvtnorm_1.4-10      gmm_1.6-2           
 [7] sandwich_2.5-0       Matrix_1.2-14        mvtnorm_1.0-8        vsn_3.48.1           MSnID_1.14.0         MSGFplus_1.14.0     
[13] msmsTests_1.18.0     msmsEDA_1.18.0       BiocInstaller_1.30.0 MSnbase_2.6.1        ProtGenerics_1.12.0  BiocParallel_1.14.1 
[19] mzR_2.14.0           Rcpp_0.12.19         Biobase_2.40.0       BiocGenerics_0.26.0 

 

ADD REPLYlink written 4 weeks ago by laural7100
2
gravatar for Laurent Gatto
4 weeks ago by
Laurent Gatto1.0k
United Kingdom
Laurent Gatto1.0k wrote:

I'll attempt an answer here. I would suggest you reformulate your logical operator that checks whether you have any missing or negative values. The double operators && and || can be very confusing. Here's an example showing that they don't necessarily work in cases such as yours.

> x
   F1  F2   F3
P1  1 1.2  1.3
P2  1 1.0  1.0
P3  1 1.0  1.0
P4  1 1.0 -1.0
>  is.na(x) || x < 0
[1] FALSE
> anyNA(x)
[1] FALSE
> any(x < 0)
[1] TRUE

Also, care should be taken in the presence of NAs

> x[4, 3] <- 1
> x[2, 2] <- NA
> x
   F1  F2  F3
P1  1 1.2 1.3
P2  1  NA 1.0
P3  1 1.0 1.0
P4  1 1.0 1.0
>  is.na(x) || x < 0
[1] FALSE
> anyNA(x)
[1] TRUE
> any(x < 0)
[1] NA

You can also use any(..., na.rm = TRUE).

I would suggest to use the following syntax

> x
   F1  F2  F3
P1  1 1.2 1.3
P2  1  NA 1.0
P3  1 1.0 1.0
P4  1 1.0 1.0
> any(is.na(x) | x < 0)
[1] TRUE
> x[2, 2] <- 1
> x[4, 3] <- -1
> x
   F1  F2   F3
P1  1 1.2  1.3
P2  1 1.0  1.0
P3  1 1.0  1.0
P4  1 1.0 -1.0
> any(is.na(x) | x < 0)
[1] TRUE

Re the double form of the logical operators, have a look at the manual page (?& will get you there), which says:

‘&’ and ‘&&’ indicate logical AND and ‘|’ and ‘||’ indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred in ‘if’ clauses.

Hope this help!

ADD COMMENTlink written 4 weeks ago by Laurent Gatto1.0k

Thanks. Turns out it was | operator that seemed to be throwing it off. Many thanks. 

ADD REPLYlink written 4 weeks ago by laural7100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 359 users visited in the last hour