aggregate: summarizing expression values over Entrez gene IDs
@vanessa-vermeirssen-2253
Hi,

I have a data frame, av.data, containing RMA-normalized and summarized expression values for Affymetrix probesets. I have looked up the Entrez gene IDs for the probesets in the annotation package (entrezids). Multiple probesets of course map to the same Entrez ID, and I would like to combine these data into one row per Entrez ID by averaging the expression values of those probesets in each experiment.

I tried the function aggregate to do this, but it gives an error saying the arguments are not the same length, even though they are...??? How can I solve this, or is there another way to do this? See my code below:

    av.data <- read.table("humanGPL570avdata.txt", row.names = 1, sep = "\t", header = T, na.strings = "NA", fill = T)
    av.data[1:5,1:5]
              X1_Schwann_p1 X1_Schwann_p3 X2_accumbens X2_adipose
    1007_s_at      9.281857      9.340795     9.151775   8.319741
    1053_at        7.000684      6.867318     4.633061   5.101534
    117_at         6.007608      6.124562     5.425565   5.692270
    121_at         6.543294      6.728119     7.651856   7.692947
    1255_g_at      3.077289      2.989938     4.622865   2.955812
              X2_adipose_omental
    1007_s_at           7.909480
    1053_at             4.509407
    117_at              6.298798
    121_at              7.598834
    1255_g_at           3.040816

    probes <- ls(hgu133plus2ENTREZID)
    entrezids <- unlist(mget(probes,hgu133plus2ENTREZID))
    newdata <- data.frame(entrezids,av.data)

    sum <- aggregate(av.data,as.list(entrezids),mean)
    Error in FUN(X[[1L]], ...) : arguments must have same length
    > length(as.list(entrezids))
    [1] 54675
    > dim(av.data)
    [1] 54675    69

    sumdata <- aggregate(newdata,as.list(newdata$entrezids),mean)
    Error in FUN(X[[1L]], ...) : arguments must have same length
    > length(as.list(newdata$entrezids))
    [1] 54675
    > dim(newdata)
    [1] 54675    70

Thank you so much!
Vanessa

--
Vanessa Vermeirssen, PhD
Tel: +32 (0)9 331 38 10
Fax: +32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
vamei at psb.ugent.be
http://www.psb.ugent.be
@james-w-macdonald-5106
Hi Vanessa,

Vanessa Vermeirssen wrote:
> sum <- aggregate(av.data,as.list(entrezids),mean)
> Error in FUN(X[[1L]], ...) : arguments must have same length

The problem here is that you need a list of vectors, each as long as dim(av.data)[1] (the number of rows). What you have given is a list of vectors, each of length one.

The difference is between list() and as.list(). If you use list(entrezids), you get a list of length one, containing a vector of length 54675. If you use as.list(entrezids), you get a list of length 54675, each item containing one Entrez Gene ID. Does this make sense?
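To make the list() versus as.list() difference concrete, here is a minimal sketch on hypothetical toy data (the probeset names, Entrez IDs and values are made up, not taken from the poster's file). It prints the lengths each call produces, reproduces the error, and then passes list() to aggregate() so that rows sharing an ID are averaged.

```r
# Hypothetical toy stand-ins for av.data and entrezids: five probesets on two
# arrays, plus the (made-up) Entrez Gene ID that each probeset maps to.
av.data <- data.frame(
  X1_Schwann_p1 = c(9.28, 7.00, 6.01, 6.54, 3.08),
  X2_accumbens  = c(9.15, 4.63, 5.43, 7.65, 4.62),
  row.names = c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at")
)
entrezids <- c("780", "5982", "780", "3310", "5982")

# as.list() makes one list element per ID, each of length one ...
length(as.list(entrezids))    # 5
lengths(as.list(entrezids))   # 1 1 1 1 1
# ... while list() makes a single element holding the whole ID vector.
length(list(entrezids))       # 1
lengths(list(entrezids))      # 5

# Passing as.list() as the grouping argument reproduces the error.
err <- tryCatch(aggregate(av.data, as.list(entrezids), mean),
                error = function(e) conditionMessage(e))
print(err)

# Passing list() gives one grouping vector as long as nrow(av.data), so
# aggregate() returns one row per Entrez ID holding the column means.
gene.means <- aggregate(av.data, by = list(entrezid = entrezids), FUN = mean)
print(gene.means)
```

With the real data it may also be worth checking that entrezids follows the same row order as av.data (looking up mget(rownames(av.data), hgu133plus2ENTREZID) rather than ls(hgu133plus2ENTREZID) is one way to ensure that), and keeping in mind that aggregate() leaves out rows whose grouping value is NA.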
Best,
Jim

--
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-0646
734-936-8662
@wolfgang-huber-3550
Hi Vanessa,

Have a look at "tapply" and "by". But you could also think a bit more about the rationale for summarizing. The different probesets for the same Entrez gene ID are not replicates, and they are not equivalent; some may be more valid or useful than others. An approach that I find useful is to determine the probeset that shows the most variability, and then believe that one. Of course, one can also look at the actual mapping of the probes to the transcript and to the gene structure, and make a decision based on that. For important results, this is what I would recommend (besides, of course, wet-lab follow-up).

Best wishes
Wolfgang

--
Wolfgang Huber
EMBL-EBI
http://www.ebi.ac.uk/huber
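Both ideas above can be sketched on a hypothetical toy matrix (expr, ids and all values are made up for illustration). Part (1) averages per Entrez ID with tapply(), applied column by column. Part (2) instead keeps, for each ID, the single probeset with the largest row variance across arrays; using the variance as the measure of "most variability" is an assumption of this sketch, not necessarily the exact criterion meant above.

```r
# Hypothetical toy matrix: rows are probesets, columns are arrays.
expr <- matrix(c(9.28, 7.00, 6.01, 6.54, 3.08,   # arrayA
                 9.15, 4.63, 5.43, 7.65, 4.62,   # arrayB
                 8.32, 5.10, 5.69, 7.69, 2.96),  # arrayC
               ncol = 3,
               dimnames = list(c("p1_at", "p2_at", "p3_at", "p4_at", "p5_at"),
                               c("arrayA", "arrayB", "arrayC")))
# Made-up Entrez Gene ID for each row of expr.
ids <- c("780", "5982", "780", "3310", "5982")

# (1) Averaging: tapply() groups one column's values by ID and takes the mean;
# apply() repeats that for every array, giving a gene-by-array matrix.
gene.mean <- apply(expr, 2, function(col) tapply(col, ids, mean))
print(gene.mean)

# (2) Selecting instead of averaging: for each ID, keep the row index of the
# probeset whose values vary most across arrays (row variance as the measure).
row.var <- apply(expr, 1, var)
best <- sapply(split(seq_len(nrow(expr)), ids),
               function(i) i[which.max(row.var[i])])
gene.best <- expr[best, , drop = FALSE]
rownames(gene.best) <- names(best)
print(gene.best)
```

Selecting one probeset keeps real measured values rather than an average of probesets that, as noted above, are not replicates; checking the probe-to-transcript mapping for the selected probesets is still advisable for important results.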
