unique values for repeated geneIDs
@hernando-martinez-4124
Last seen 10.4 years ago
Hello everyone,

My name is Hernando, and I am new to R. I have a small problem that maybe you can help me with. I have been looking through the packages with no success, although it shouldn't be very difficult to solve.

I have a text file containing a list of genes, with expression values for each gene across a set of microarray experiments. For example:

geneID  sample1  sample2  ....
gene1   45       58       ....
gene1   43       63       ....
gene2   32       21       ....
......  .....    ......   ....

In this list, some genes are repeated, but with different values (as in the example). These repetitions come from different probes targeting the same gene.

What I want is a new text file in which each gene appears only once, with one of three possibilities for the expression values of repeated genes:

- The average: each value (for each column, i.e. each sample) is the average of the original values (in the example, gene1 should be 44 in sample 1 and 60.5 in sample 2).
- The median, instead of the average.
- The highest value.

I would prefer the median or the average, but I don't know whether getting the highest values is easier.

I have seen the function "findLargest" in the "genefilter" package, but it works with probes, and my files have already been converted to geneIDs.

I hope you can help me, or let me know of any function or package to start with.

Many thanks

--
Hernando Martínez Vergara
Microarray • 1.2k views
@sean-davis-490
Last seen 5 months ago
United States
On Fri, Jun 11, 2010 at 8:12 AM, Hernando Martínez <hernybiotec@gmail.com> wrote:

> What I want is a new text file, but with each gene appearing only once, and
> with three possibilities for the expression values of repeated genes:
> [...]

Hi, Hernando. Have a look at the aggregate() function.

Sean
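As a rough illustration of that suggestion, here is a minimal sketch on the three example rows from the question. The data frame name `expr` and the result names `by_mean`, `by_median` and `by_max` are made up for this example; only aggregate() itself comes from the reply.

```r
# Toy data mirroring the example in the question (object names are illustrative).
expr <- data.frame(
  geneID  = c("gene1", "gene1", "gene2"),
  sample1 = c(45, 43, 32),
  sample2 = c(58, 63, 21)
)

# Collapse repeated geneIDs: one summary value per gene for each sample column.
# Only FUN changes between the three options asked about.
by_mean   <- aggregate(expr[, -1], by = list(geneID = expr$geneID), FUN = mean)
by_median <- aggregate(expr[, -1], by = list(geneID = expr$geneID), FUN = median)
by_max    <- aggregate(expr[, -1], by = list(geneID = expr$geneID), FUN = max)

print(by_mean)
```

Since the three variants differ only in the FUN argument, getting the highest value should be no easier or harder than getting the mean or median.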
Thank you very much, Sean. I have been working with the aggregate() function and it is exactly what I need. However, there is still one painful detail that I cannot get rid of, and I hope you can help me with this too.

I am using this text file as a test:

A   B   C   D
d1  2   23  2
d1  4   22  2
d1  5   24  2
d2  10  7   2
d2  20  8   3
d1  7   23  2
d3  2   14  30
d3  4   14  50
d2  30  8   4
d4  12  13  15
d5  1   5   90
d2  40  7   3
d6  34  2   5

If I type:

> data<-read.table("test.txt",sep="\t")
> agr<-aggregate(data[2:4], by=list(data$V1), FUN=mean)

I get 21 warning messages and all the values are "NA", including those for the header row (B, C, and D). However, if I remove the A, B, C, D line from the file and type the same commands, it works perfectly well and I get what I wanted.

The problem is that the real datasets I need to work with are very large, and it is difficult to remove and re-add the headers without the risk of doing something wrong. Is there any command or parameter that I should pass to the function to solve this issue?

Thank you so much,
Hernando

--
Hernando Martínez Vergara
Hernando;

> I have this text file:
>
> A   B   C   D
> d1  2   23  2
> d1  4   22  2
> d1  5   24  2
> d2  10  7   2
> d2  20  8   3
> d1  7   23  2
> d3  2   14  30
> d3  4   14  50
> d2  30  8   4
> d4  12  13  15
> d5  1   5   90
> d2  40  7   3
> d6  34  2   5
>
> > data<-read.table("test.txt",sep="\t")
> > agr<-aggregate(data[2:4], by=list(data$V1), FUN=mean)
>
> I get 21 warning messages and all the values are "NA", including
> header B, C, and D. However, if I remove A,B,C,D from the previous
> file, and type the same commands, it works perfectly fine, getting
> what I wanted.

Add header=TRUE to the read.table command:

data<-read.table("test.txt",sep="\t", header=TRUE)
agr<-aggregate(data[2:4], by=list(data$A), FUN=mean)

Try help(read.table) to learn more about the available options.

Brad
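For completeness, here is a hedged end-to-end sketch of the whole round trip: reading with header=TRUE, collapsing by median, and writing a new text file as asked in the original question. Without header=TRUE, the "A B C D" line is read as a data row, so the value columns are no longer numeric. That would be consistent with the 21 warnings (7 groups times 3 columns). The temporary file paths and the names `infile`, `outfile` and the `A =` label in `by` are assumptions made for this example, and the write.table() step is not part of the replies above.

```r
# Recreate the tab-separated test file from the thread in a temporary location
# (a stand-in for "test.txt"; the path is illustrative only).
infile <- tempfile(fileext = ".txt")
writeLines(c(
  "A\tB\tC\tD",
  "d1\t2\t23\t2",  "d1\t4\t22\t2",   "d1\t5\t24\t2",
  "d2\t10\t7\t2",  "d2\t20\t8\t3",   "d1\t7\t23\t2",
  "d3\t2\t14\t30", "d3\t4\t14\t50",  "d2\t30\t8\t4",
  "d4\t12\t13\t15", "d5\t1\t5\t90",  "d2\t40\t7\t3",
  "d6\t34\t2\t5"
), infile)

# header = TRUE makes the first line the column names, so B, C and D stay numeric.
data <- read.table(infile, sep = "\t", header = TRUE)

# Naming the list element keeps "A" as the grouping column name in the result.
agr <- aggregate(data[2:4], by = list(A = data$A), FUN = median)

# Write the collapsed table back out as a tab-separated text file.
outfile <- tempfile(fileext = ".txt")
write.table(agr, outfile, sep = "\t", quote = FALSE, row.names = FALSE)

print(agr)
```

Swapping FUN = median for mean or max gives the other two summaries without touching the rest of the code.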