Question

aggregate_summarizing expression values over entrez gene ids


Vanessa Vermeirssen ▴ 40

@vanessa-vermeirssen-2253

Last seen 9.6 years ago

Hi,

I have a dataframe containing RMA normalized and summarized expression values for affymetrix probesets, av.data. I have looked up the Entrez gene ids for the probesets in the annotation package, entrezids. Multiple probesets map of course to the same entrez id and I would like to combine these data into one row, by averaging the expression values for the same entrez ids over the different experiments.

I tried the function "aggregate" to do this, but somehow it gives an error that the arguments are not of the same length, but they are...??? How can I solve this or is there any other way to do this?

See my code below...

    av.data <- read.table("humanGPL570avdata.txt", row.names = 1, sep = "\t", header = T, na.strings = "NA", fill = T)
    av.data[1:5,1:5]
              X1_Schwann_p1 X1_Schwann_p3 X2_accumbens X2_adipose
    1007_s_at      9.281857      9.340795     9.151775   8.319741
    1053_at        7.000684      6.867318     4.633061   5.101534
    117_at         6.007608      6.124562     5.425565   5.692270
    121_at         6.543294      6.728119     7.651856   7.692947
    1255_g_at      3.077289      2.989938     4.622865   2.955812
              X2_adipose_omental
    1007_s_at           7.909480
    1053_at             4.509407
    117_at              6.298798
    121_at              7.598834
    1255_g_at           3.040816

    probes <- ls(hgu133plus2ENTREZID)
    entrezids <- unlist(mget(probes,hgu133plus2ENTREZID))
    newdata <- data.frame(entrezids,av.data)

    sum <- aggregate(av.data,as.list(entrezids),mean)
    Error in FUN(X[[1L]], ...) : arguments must have same length
    > length(as.list(entrezids))
    [1] 54675
    > dim(av.data)
    [1] 54675 69

    sumdata <- aggregate(newdata,as.list(newdata$entrezids),mean)
    Error in FUN(X[[1L]], ...) : arguments must have same length
    > length(as.list(newdata$entrezids))
    [1] 54675
    > dim(newdata)
    [1] 54675 70

Thank you so much!
Vanessa

--
Vanessa Vermeirssen, PhD
Tel: +32 (0)9 331 38 10, fax: +32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
vamei at psb.ugent.be
http://www.psb.ugent.be

• 994 views

updated 15.5 years ago by James W. MacDonald 65k • written 15.5 years ago by Vanessa Vermeirssen ▴ 40

score 0 · Answer 1 · 2008-11-13

Hi Vanessa,

Vanessa Vermeirssen wrote:
> [...]
> sum <- aggregate(av.data,as.list(entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length

The problem here is you need a list of vectors, each as long as dim(av.data)[1]. What you have given is a list of vectors, each of length one.

The difference is between list() and as.list(). If you use list(entrezids), you will get a list of length one, containing a vector of length 54675. If you use as.list(entrezids) you get a list of length 54675, each item containing one Entrez Gene ID.

Does this make sense?

Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
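To make the list() versus as.list() point concrete, here is a minimal sketch on made-up data. The toy av.data and entrezids below are stand-ins invented for illustration (they are not the poster's objects), and the grouping name "entrez" is likewise just a label chosen here. The sketch assumes entrezids is in the same row order as av.data.

```r
# Toy stand-in for av.data: 5 probesets x 2 samples (values are made up).
av.data <- data.frame(
  s1 = c(1, 3, 5, 7, 9),
  s2 = c(2, 4, 6, 8, 10),
  row.names = c("p1", "p2", "p3", "p4", "p5")
)

# Toy stand-in for entrezids: p1/p2 share a gene, p3/p4 share a gene.
entrezids <- c(p1 = "100", p2 = "100", p3 = "200", p4 = "200", p5 = "300")

# list() wraps the whole vector: one grouping variable with 5 values.
n.list <- length(list(entrezids))
# as.list() splits it: 5 grouping variables with 1 value each,
# which is what triggers "arguments must have same length".
n.aslist <- length(as.list(entrezids))

# Average the probesets that map to the same Entrez ID, per sample.
gene.means <- aggregate(av.data, by = list(entrez = entrezids), FUN = mean)
print(gene.means)
```

With real annotation data some probesets will have no Entrez ID; aggregate() omits rows whose grouping value is NA, so those probesets would silently drop out of the result.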

score 0 · Answer 2 · 2008-11-13

Hi Vanessa,

Have a look at "tapply" and "by". But you could also think a bit more about the rationale for summarizing. The different probesets for the same Entrez gene ID are not replicates, and they are not equivalent. Some may be more valid or useful than others.

An approach that I find useful is to determine the probeset that shows most variability, and then believe that one. Of course, one can also look at the actual mapping of the probes to the transcript and to the gene structure, and make a decision based on that. For important results, this is what I would recommend (besides, of course, wet-lab follow-up).

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL-EBI
http://www.ebi.ac.uk/huber
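As a rough illustration of the two suggestions above, the sketch below uses tapply for a per-gene mean of one sample and then picks, for each Entrez ID, the probeset with the largest variance across samples. The matrix expr, the vector entrezids, and all values are made up for this example, and using the variance as the measure of "variability" is an assumption; other measures (IQR, MAD) would fit the same pattern.

```r
# Toy expression matrix: 5 probesets x 3 samples (values are made up).
expr <- rbind(
  p1 = c(1, 1, 1),
  p2 = c(1, 2, 3),
  p3 = c(0, 5, 10),
  p4 = c(4, 5, 6),
  p5 = c(2, 2, 4)
)
colnames(expr) <- c("s1", "s2", "s3")

# Toy probeset-to-gene mapping, in the same row order as expr.
entrezids <- c(p1 = "100", p2 = "100", p3 = "200", p4 = "200", p5 = "300")

# tapply: mean of one sample within each Entrez ID.
s1.means <- tapply(expr[, "s1"], entrezids, mean)

# Variance of each probeset across samples.
probe.var <- apply(expr, 1, var)

# For each gene, the row index of its most variable probeset.
rows.by.gene <- split(seq_along(probe.var), entrezids)
best <- sapply(rows.by.gene, function(i) i[which.max(probe.var[i])])

# One row per gene, keeping the chosen probeset's own values.
selected <- expr[best, , drop = FALSE]
print(selected)
```

Unlike averaging, this keeps the measured values of a single probeset per gene, so the result can still be traced back to a specific probeset ID via rownames(selected).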