probeset level visualization with standard deviation...

0

Entering edit mode

Paul Hammer ▴ 10

@paul-hammer-2669

Last seen 10.1 years ago

hi members, i am looking for a function which plots me average probeset levels with standard deviations (e.g. tissues 1 against tissue 2). Is there any functions which could do this. i work with affymetrix chips which have often three times determination. an example for a better understanding: after using the functions read.exon() and rma() i specify my probeset_ids: > probeset = gene.to.probeset("ensemle_gene_id") > exon_probesets = select.probewise(probeset, filter="exonic") now i would like to have the average values with the standard deviation for these probeset_id log levels to plot e.g. two tissues (here heart and breast) against later. probeset_id heart_A.CEL heart_B.CEL heart_C.CEL breast_A.CEL breast_B.CEL breast_C.CEL 1. 6.5432 6.7833 6.3234 4.6566 5.2121 4.7654 2. 5.8422 6.0813 6.2334 5.2526 5.1236 4.9234 3. 5.8422 6.0813 6.2334 5.2526 5.1236 4.9234 is there any simple way? till now i extract all data from a data.frame given by splicing.index(), calculate average and sd for every probeset_id and tissue, which is quite complicated. thanks paul

• 900 views

ADD COMMENT • link updated 16.6 years ago by Martin Morgan 25k • written 16.6 years ago by Paul Hammer ▴ 10

0

Entering edit mode

Martin Morgan 25k

@martin-morgan-1513

Last seen 11 weeks ago

United States

Hi Paul -- Someone will probably contribute a one line solution to your problem, and be able to provide examples more directly related to the packages you are using. However... If you have a matrix m and an index idx, as you might with > library(Biobase) > data(sample.ExpressionSet) > obj <- sample.ExpresssionSet # save typing later > m <- exprs(obj) > m <- m[apply(m, 1, function(row) all(row>0)),] > m <- log(m) # log expression values, no NA's for simplicity below > idx <- sample(1:10, nrow(m)) # fake 'probe' values then to calculate the mean for each group represented by idx it is actually quite easy: > probeset <- rep(idx, ncol(m)) > meanByProbe <- tapply(m, probeset, mean) and similarly for variances or whatever other measure you'd like. You can think of this as 'flattening' m into a long vector, one column after another, then splitting the vector into groups based on a similarly-lengthed vector created by replicating 'idx', and finally calculating the mean for each group. For the more complicated summary I think you're after, you might (using median rather than mean, since these will be more robust measures of central tendency for non-normal data) > probeset <- rep(idx, ncol(m)) > trt <- rep(obj$type, each=nrow(m)) > meds <- tapply(m, list(probeset, trt), median) > meds Case Control 1 4.857189 4.894641 2 4.584452 4.671403 3 4.491588 4.556983 4 4.835686 4.769879 5 4.305809 4.432940 6 5.006192 5.022188 7 4.727795 4.652226 8 4.994262 5.153690 9 4.878079 5.000073 10 4.717114 4.786241 A measure of dispersion, like the median average deviation, can be calculated in a similar way: > mads <- tapply(m, list(probeset, trt), mad) A simple > plot(meds) gives a scatter plot of this data. I'm a fan of lattice. The equivalent of the above plot command might be > library(lattice) > df <- as.data.frame(mads) > with(df, xyplot(Case~Control)) If you were interested in just one group, I'd suggest a box-and-whiskers plot rather than summarizing the data as mean + standard error, something along the lines of > df <- data.frame(exprs=as.vector(m), probeset=probeset) > with(df, bwplot(probeset~exprs)) Confidence measures in each direction are definitely more complicated, and one might quickly become satisfied with something like > df <- data.frame(exprs=as.vector(m), probeset=probeset, treatment=trt) > with(df, bwplot(probeset~exprs|treatment)) which draws two different bwplot 'panels', one for each treatment, or > with(df, bwplot(probeset~exprs|treatment, + panel=function(...) { + panel.bwplot(...) + panel.abline(v=median(exprs)) + })) which draws the bwplot and then adds a vertical line corresponding to the median over all treatments (providing a useful visual reference for comparison). However, there's a demo that you can follow along with at > file.show(system.file("demo/intervals.R", package = "lattice")) The idea is that lattice displays plots in a 'panel'. This function from the demo prepanel.ci <- function(x, y, lx, ux, subscripts, ...) { x <- as.numeric(x) lx <- as.numeric(lx[subscripts]) ux <- as.numeric(ux[subscripts]) list(xlim = range(x, ux, lx, finite = TRUE)) } sets up the panel so that the x-axis limits are appropriate ('lx', 'ux' are vectors of lower and upper limits for each x point; you'd have to add a your own variables, e.g. ly, uy, to the function signature, and a ylim element to the list for error bars on both points). This function from the demo panel.ci <- function(x, y, lx, ux, subscripts, pch = 16, ...) { x <- as.numeric(x) y <- as.numeric(y) lx <- as.numeric(lx[subscripts]) ux <- as.numeric(ux[subscripts]) panel.abline(h = unique(y), col = "grey") panel.arrows(lx, y, ux, y, col = 'black', length = 0.25, unit = "native", angle = 90, code = 3) panel.xyplot(x, y, pch = pch, ...) } does the actual display of the points (panel.xyplot) and of the confidence interval (panel.arrows, see ?arrows for some hints). You'd add another panel.arrows to display the second set of confidence intervals. Finally, from the demo, with(singer.ucl, xyplot(voice.part ~ median, lx = lower, ux = upper, prepanel = prepanel.ci, panel = panel.ci)) invokes xyplot with the appropriate prepanel and panel functions, and with a data frame (singer.ucl) that, by this point in the example, has columns lower and upper describing where the confidence bounds are. Too much info, I know, but hopefully useful. Martin Paul Hammer <hammer_p at="" molgen.mpg.de=""> writes: > hi members, > > i am looking for a function which plots me average probeset levels with > standard deviations (e.g. tissues 1 against tissue 2). Is there any > functions which could do this. i work with affymetrix chips which have > often three times determination. > > an example for a better understanding: > > after using the functions read.exon() and rma() i specify my probeset_ids: > > > probeset = gene.to.probeset("ensemle_gene_id") > > exon_probesets = select.probewise(probeset, filter="exonic") > > now i would like to have the average values with the standard deviation > for these probeset_id log levels to plot e.g. two tissues (here heart > and breast) against later. > > probeset_id heart_A.CEL heart_B.CEL heart_C.CEL > breast_A.CEL breast_B.CEL breast_C.CEL > 1. 6.5432 6.7833 6.3234 > 4.6566 5.2121 4.7654 > 2. 5.8422 6.0813 6.2334 > 5.2526 5.1236 4.9234 > 3. 5.8422 6.0813 6.2334 > 5.2526 5.1236 4.9234 > > is there any simple way? till now i extract all data from a data.frame > given by splicing.index(), calculate average and sd for every > probeset_id and tissue, which is quite complicated. > > thanks > paul > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793

ADD COMMENT • link 16.6 years ago Martin Morgan 25k

0

Entering edit mode

Juan C Oliveros Collazos ▴ 40

@juan-c-oliveros-collazos-2489

Last seen 10.1 years ago

Dear Paul, Starting from your example with 3 genes and 6 columns, stored in a file named "testTable.txt", try the following code in R: data<-read.table("testTable.txt",sep="\t",header=1,row.names=1); sd_heart<-sd(t(data[1:3])); sd_breast<-sd(t(data[4:6])); avg_heart<-mean(as.data.frame(t(data[1:3]))); avg_breast<-mean(as.data.frame(t(data[4:6]))); write.table(file="result.txt",cbind(avg_heart,sd_heart,avg_breast,sd_b reast),sep="\t",col.names=NA); Now you should have a new file named "result.txt" with the averaged values and standard deviations. To plot these data (including error bars for SD) you can use FIESTA VIEWER: http://bioinfogp.cnb.csic.es/tools/FIESTA This on line tool transforms any text-tabulated file of microarray results into an interactive web site containing interactive plots with error bars and many filtering and visualizing options. Note that FIESTA VIEWER will not work with only 3 genes (as your example). It is designed to be used with the full microarray geneset. I hope that helps, Regards, Juan Carlos Oliveros BioinfoGP, CNB-CSIC, Spain. > hi members, > > i am looking for a function which plots me average probeset levels with > standard deviations (e.g. tissues 1 against tissue 2). Is there any > functions which could do this. i work with affymetrix chips which have > often three times determination. > > an example for a better understanding: > > after using the functions read.exon() and rma() i specify my probeset_ids: > > > probeset = gene.to.probeset("ensemle_gene_id") > > exon_probesets = select.probewise(probeset, filter="exonic") > > now i would like to have the average values with the standard deviation > for these probeset_id log levels to plot e.g. two tissues (here heart > and breast) against later. > > probeset_id heart_A.CEL heart_B.CEL heart_C.CEL > breast_A.CEL breast_B.CEL breast_C.CEL > 1. 6.5432 6.7833 6.3234 > 4.6566 5.2121 4.7654 > 2. 5.8422 6.0813 6.2334 > 5.2526 5.1236 4.9234 > 3. 5.8422 6.0813 6.2334 > 5.2526 5.1236 4.9234 > > is there any simple way? till now i extract all data from a data.frame > given by splicing.index(), calculate average and sd for every > probeset_id and tissue, which is quite complicated. > > thanks > paul > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor >

ADD COMMENT • link 16.6 years ago Juan C Oliveros Collazos ▴ 40

Login before adding your answer.