removal of outliers in matrix
@johannes-hanson-1604
Last seen 9.6 years ago
Dear all,

After some work analysing microarray data, I am now facing my first metabolomics dataset. The first problem I encountered is that the structure of the data differs from what I am used to. Because of the alignment of the chromatograms, there are extreme outliers in the dataset. The alignment itself is good (and I don't want to manually adjust 8000 peaks). If I could easily remove the outliers, the rest of the analysis would be easier.

The outliers I want to remove are most often a total lack of signal because the peak is missing. I have five replicates of each treatment, and I am looking for something that removes only the extreme outliers (sample number nine in the example below).

A typical outlier:

Untreated  0.00040016  0.001029071  0.00101226   0.000739958  0.000288475
Treated    5.58151787  4.146639291  4.080655391  0.00120032   4.786810001

The data is structured as a matrix with one line per peak and the replicates as individual columns (much like microarray data).

Thanks for any suggestions on how to continue.

Johannes
@saroj-mohapatra-1446
Last seen 9.6 years ago
Hello Johannes,

If I understand correctly, you have a matrix with variables (metabolites) as rows and sample replicates as columns. For example, for two metabolites:

> my.data
        Con.1 Con.2 Con.3 Con.4 Con.5 Trt.1 Trt.2 Trt.3 Trt.4 Trt.5
Metab.1     0     0     0     0     0  5.58  4.15  4.08  0.00  4.79
Metab.2     0     0     0     0     0  5.58  0.00  4.08  4.08  4.79

The outliers are Trt.4 for Metab.1 and Trt.2 for Metab.2.

I could use a simple rule (flag any value that is more than 1 S.D. below or above the mean) to detect the outliers:

> apply(my.data, 1, function(y) {
+   x <- y[6:10]   # the five treated replicates
+   which(x < (mean(x) - sd(x)) | x > (mean(x) + sd(x)))
+ })
Metab.1 Metab.2
      4       2

This gives you, for each metabolite, the position of the outlier among the five treated replicates.

If you want a new matrix with the outliers removed:

> new.data <- t(apply(my.data, 1, function(y) {
+   x <- y[6:10]
+   sel <- x > (mean(x) - sd(x)) & x < (mean(x) + sd(x))
+   c(y[1:5], x[sel])   # untreated values, then the treated values that pass
+ }))
> new.data
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
Metab.1    0    0    0    0    0 5.58 4.15 4.08 4.79
Metab.2    0    0    0    0    0 5.58 4.08 4.08 4.79

I have assumed that (1) there is exactly one outlier per metabolite, and (2) the replicates are close to each other, except for the outlier. Assumption (1) matters for the code as well: if rows lose different numbers of values, apply() returns a list instead of a matrix and the t() step fails.

HTH,
Saroj
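Not part of the original reply, but in case it helps: a minimal sketch of a variant that sets outliers to NA instead of dropping them, so the matrix keeps its shape even when rows have zero or several outliers. It swaps the mean and S.D. rule for the median and MAD, which the outlier itself distorts less. The helper name mask_outliers, the groups vector and the cutoff k = 3 are assumptions made for illustration, not something from the thread.

```r
# Hypothetical helper: within each replicate group, replace values that lie
# more than k MADs from the group median with NA. Matrix dimensions are kept.
mask_outliers <- function(mat, groups, k = 3) {
  out <- mat
  for (g in unique(groups)) {
    cols <- which(groups == g)
    out[, cols] <- t(apply(mat[, cols, drop = FALSE], 1, function(x) {
      dev <- abs(x - median(x, na.rm = TRUE))
      x[which(dev > k * mad(x, na.rm = TRUE))] <- NA
      x
    }))
  }
  out
}

# The peak from the original question: five untreated, five treated replicates.
my.data <- rbind(
  Metab.1 = c(0.00040016, 0.001029071, 0.00101226, 0.000739958, 0.000288475,
              5.58151787, 4.146639291, 4.080655391, 0.00120032, 4.786810001)
)
colnames(my.data) <- c(paste0("Con.", 1:5), paste0("Trt.", 1:5))
groups <- rep(c("Con", "Trt"), each = 5)   # assumed column layout

masked <- mask_outliers(my.data, groups)
which(is.na(masked), arr.ind = TRUE)       # positions of the masked values
```

Downstream summaries then need na.rm = TRUE, for example rowMeans(masked[, groups == "Trt", drop = FALSE], na.rm = TRUE). With only five replicates any cutoff is a judgment call, so k is worth checking against a few peaks you have inspected by hand.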