clustering genes in GO categories


Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Germany

Hi everybody,

I was wondering whether there is a package to cluster a list of genes into different GO categories. My problem is as follows. I have a list of genes in a tab-delimited file:

    id  flybasename_gene  flybase_gene_id  entrezgene  GOMF
    1616608_a_at  Gpdh  FBgn0001128  33824  carboxylesterase activity  hydrolase activity  3',5'-cyclic-nucleotide phosphodiesterase activity  protein binding
    1622892_s_at  CG33057  FBgn0053057  318833  nucleotide binding  protein binding  ATP binding  chaperone binding  ammonium transmembrane transporter activity
    1622892_s_at  mkg-p  FBgn0035889  38955  nucleotide binding  protein binding  ATP binding  chaperone binding  ammonium transmembrane transporter activity
    1622893_at  IM3  FBgn0040736  50209  aminopeptidase activity  metalloexopeptidase activity  hydrolase activity  manganese ion binding
    1622894_at  CG15120  FBgn0034454  37248  protein binding

I would like to group the genes by the GO categories listed in the last columns. The GO categories take up more than one column, and the number of columns differs from line to line, depending on the depth of the annotation for each gene.

Is there a way of transforming the table so that the first column holds my GO categories and each line then lists the gene IDs for that category? (Which ID type is used is not important, as I can change it as I wish.) I would like to have something like this:

    GO  genes
    protein binding  FBgn0001128 FBgn0053057 FBgn0035889 etc.
    ammonium transmembrane transporter activity  FBgn0053057 FBgn0035889
    hydrolase activity  FBgn0040736 FBgn0001128

I would appreciate any kind of help or ideas.

Thanks,
Assa

Annotation GO Annotation GO • 1.4k views

ADD COMMENT • link updated 15.1 years ago by James W. MacDonald 68k • written 15.1 years ago by Assa Yeroslaviz ★ 1.5k


James W. MacDonald 68k

@james-w-macdonald-5106

United States

Hi Assa,

I don't think you need a package for that. A call to tapply() followed by a call to do.call() should get you where you want to go.

Say you read your table into R and call it 'dat'.

    thelist <- tapply(1:nrow(dat), dat$GOMF, function(x) dat[x, 3])

Then you will have a list, with the names being the GOMF values and the list items being all the gene IDs. Collapsing that to a matrix is difficult because you will have different numbers of columns. So you can either collapse all the list items using commas, or write directly to a file. Collapsing with commas is easy:

    commalist <- lapply(thelist, paste, collapse = ",")
    avector <- do.call("c", commalist)
    names(avector) <- names(commalist)

Or you could just write out to a file using something like

    con <- file("mydata.txt", "w")
    for(i in seq(along = commalist))
        cat(names(commalist)[i], "\t", commalist[[i]], "\n", sep = "", file = con)
    close(con)

All untested, so you might have to fiddle a bit to get the results you want.

Best,

Jim

James W. MacDonald, M.S.
Biostatistician
Douglas Lab
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
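A minimal sketch of the grouping-and-collapsing idea above on toy data. It assumes the GO terms of each gene sit in a single GOMF column joined by ':' (an assumption about how the file was read in), uses split() and vapply() in place of tapply() and do.call(), and writes to a temporary file; the name out_file is purely illustrative.

```r
# Toy stand-in for 'dat'; gene IDs come from the question, while the
# ':'-joined GOMF strings are an assumption made for illustration.
dat <- data.frame(
  flybase_gene_id = c("FBgn0053057", "FBgn0035889", "FBgn0034454"),
  GOMF = c("nucleotide binding:protein binding",
           "nucleotide binding:protein binding",
           "protein binding"),
  stringsAsFactors = FALSE
)

# Group the gene IDs by the (unsplit) GOMF string.
thelist <- split(dat$flybase_gene_id, dat$GOMF)

# Collapse every group into one comma-separated string.
avector <- vapply(thelist, paste, character(1), collapse = ",")

# Write one line per GOMF value: the name, a tab, then the gene IDs.
out_file <- tempfile(fileext = ".txt")  # hypothetical output location
writeLines(paste(names(avector), avector, sep = "\t"), out_file)
```

Note that this groups by the whole GOMF string, so genes only end up together when their complete annotation strings are identical.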

ADD COMMENT • link 15.1 years ago James W. MacDonald 68k


Hi James,

Thanks for this idea, but unfortunately it wasn't exactly what I needed. This kind of transformation I was able to do on my own. The problem is that I would like to split the third column into single GO categories.

This is what I have so far, after applying the tapply command:

    "carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding"  FBgn0001128
    aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding  FBgn0040736
    nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity  FBgn0053057,FBgn0035889
    protein binding  FBgn0034454

What I need is to split the first column (in the original file, the third column) into separate names (in this column they are separated by ':') and to attach ALL the matching IDs to EACH GO category, to get something like:

    carboxylesterase activity  FBgn0001128 ....
    hydrolase activity  FBgn0001128 FBgn0040736 ....
    3',5'-cyclic-nucleotide phosphodiesterase activity  FBgn0001128 ....
    protein binding  FBgn0001128 FBgn0034454 FBgn0053057 FBgn0035889 ....
    nucleotide binding  FBgn0053057 FBgn0035889 ....
    ATP binding  FBgn0053057 FBgn0035889 ....
    chaperone binding  FBgn0053057 FBgn0035889 ....
    ammonium transmembrane transporter activity  FBgn0053057 FBgn0035889 ....
    aminopeptidase activity  FBgn0040736 ....
    metalloexopeptidase activity  FBgn0040736 ....
    manganese ion binding  FBgn0040736 ....
    ....

I would appreciate any help on that subject.

THX
Assa

ADD REPLY • link 15.1 years ago Assa Yeroslaviz ★ 1.5k


Hi Assa,

OK, I see your point. This is still pretty easy.

    lst <- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) dat[x,"GOMF"])
    lst2 <- lapply(lst, function(x) unlist(strsplit(x, ":")))
    unlst <- cbind(rep(names(lst2), sapply(lst2, length)), unlist(lst2, use.names = FALSE))
    done <- tapply(1:nrow(unlst), unlst[,2], function(x) unlst[x,1])

There are assuredly other more elegant ways to do this, but this should suffice.

Best,

Jim
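The split-then-regroup steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the code from the post: it assumes GOMF is a character column with terms joined by ':', and it uses split() for the final regrouping instead of tapply(). The names terms and long are illustrative only.

```r
# Toy data; gene IDs and GO terms are taken from the question, but the
# ':'-joined layout of GOMF is an assumption.
dat <- data.frame(
  flybase_gene_id = c("FBgn0001128", "FBgn0040736", "FBgn0034454"),
  GOMF = c("hydrolase activity:protein binding",
           "aminopeptidase activity:hydrolase activity",
           "protein binding"),
  stringsAsFactors = FALSE
)

# Step 1: one character vector of GO terms per row.
terms <- strsplit(dat$GOMF, ":", fixed = TRUE)

# Step 2: long format, repeating each gene ID once per term it carries.
long <- data.frame(
  gene = rep(dat$flybase_gene_id, lengths(terms)),
  go   = unlist(terms),
  stringsAsFactors = FALSE
)

# Step 3: regroup so that each GO term maps to its gene IDs.
done <- split(long$gene, long$go)
```

With this toy input, done[["hydrolase activity"]] holds both FBgn0001128 and FBgn0040736, which is the shape of result asked for.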

ADD REPLY • link 15.1 years ago James W. MacDonald 68k


Hi James,

Thanks for the help, but unfortunately I get an error message when running the second line:

    > list <- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) dat[x,"GOMF"])
    > lst2 <- lapply(list, function(x) unlist(strsplit(x, ":")))
    Error in strsplit(x, ":") : non-character argument
    > str(list)
    List of 13369
     $ FBgn0000008: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: NA
     $ FBgn0000014: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: 3330 NA
     $ FBgn0000015: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: 2546 880
     $ FBgn0000017: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: NA 35
     $ FBgn0000018: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: NA
     $ FBgn0000022: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: 893
     $ FBgn0000024: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: 2546
     $ FBgn0000028: Factor w/ 3814 levels "\"1,3-beta-glucan synthase activity:transferase activity:transferase activity, transferring glycosyl groups\"",..: NA

I tried to convert the factors in the data.frame to characters, but it still gives me the same error.

    list1 <- data.frame(lapply(list, as.character), stringsAsFactors=FALSE)

Is there a way of converting the lines to characters?

THX
Assa

ADD REPLY • link 15.0 years ago Assa Yeroslaviz ★ 1.5k


Hi again,

OK, I solved it. To be honest, it wasn't that difficult. I just added as.character():

    > lst2 <- lapply(list, function(x) unlist(strsplit(as.character(x), ":")))

Assa

ADD REPLY • link 15.0 years ago Assa Yeroslaviz ★ 1.5k


Hello James and Bioconductor users,

It starts to look better now. Here is a short summary of my script:

    # changedGenes.sub is the complete data from the file FB_simulated_contrasts for Luke
    dat <- changedGenes.sub
    lst <- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) dat[x, "bioProc"])
    lst2 <- lapply(lst, function(x) unlist(strsplit(as.character(x), ":")))
    unlst <- cbind(rep(names(lst2), sapply(lst2, length)), unlist(lst2, use.names = FALSE))
    done <- tapply(1:nrow(unlst), unlst[,2], function(x) unlst[x,1])

The result I get is a list of lists:

    > str(done)
    List of 103
     $ : chr [1:4] "FBgn0010359" "FBgn0021800" "FBgn0031420" "FBgn0034345"
     $ actin cytoskeleton organization : chr "FBgn0000318"
     $ actin filament organization : chr "FBgn0000318"
     $ adenosine to inosine editing : chr "FBgn0044510"
     $ adult behavior : chr "FBgn0044510"
     $ adult locomotory behavior : chr "FBgn0044510"
     $ antimicrobial humoral response : chr "FBgn0000318"
     $ apoptosis : chr "FBgn0016977"
     $ apposition of dorsal and ventral imaginal disc-derived wing surfaces : chr "FBgn0034326"
     $ asymmetric cell division : chr "FBgn0052484"
     ...

Unfortunately I can't find a way of converting this list of lists into an exportable table/file to work with. What I would like to have is an object in the same form as this list of lists, but as a data.frame with two columns, like this (this is a *hypothetical object*, which I couldn't generate until now):

    > done
    GO category                        Gene_IDs
    no category                        : chr [1:4] "FBgn0010359" "FBgn0021800" "FBgn0031420" "FBgn0034345"
    actin cytoskeleton organization    : chr "FBgn0000318"
    actin filament organization        : chr "FBgn0000318"
    adenosine to inosine editing       : chr "FBgn0044510"
    adult behavior                     : chr "FBgn0044510"
    adult locomotory behavior          : chr "FBgn0044510"
    antimicrobial humoral response     : chr "FBgn0000318"
    apoptosis                          : chr "FBgn0016977"
    apposition of dorsal and ventral imaginal disc-derived wing surfaces : chr "FBgn0034326"
    asymmetric cell division           : chr "FBgn0052484"

Just using as.data.frame doesn't convert it; it still stays a list, which is not exportable. I tried to convert the list of lists using:

    > done.df <- do.call('rbind', lapply(names(done), function(.name){data.frame(done[[.name]], Name=.name)}))

But I get an error message saying that the arguments have differing numbers of rows:

    Error in data.frame(done[[.name]], Name = .name) :
      arguments imply differing number of rows: 0, 1

I would like to know if there is a way of exporting a list of lists into a table, or of converting it into a data.frame.

Thanks for any help

Assa
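For readers who want to reproduce the situation described in this post, here is a minimal sketch on a toy table. It is not the script from the thread: it swaps the two tapply() calls for strsplit() plus split(), which avoids tapply() simplifying its result when every gene has only one row. The table `dat`, its five rows, and the shortened GO strings are assumptions modelled on the example in the original question.

```r
# Toy table (hypothetical): one row per gene, GO terms joined with ":".
dat <- data.frame(
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0035889",
                      "FBgn0040736", "FBgn0034454"),
  GOMF = c("hydrolase activity:protein binding",
           "nucleotide binding:protein binding",
           "nucleotide binding:protein binding",
           "hydrolase activity:manganese ion binding",
           "protein binding"),
  stringsAsFactors = TRUE   # factors, as in the data discussed in the thread
)

# Split each annotation string into single GO terms.
# as.character() is needed because strsplit() rejects factors.
terms_per_gene <- strsplit(as.character(dat$GOMF), ":", fixed = TRUE)

# One row per (GO term, gene) pair.
long <- data.frame(
  GO   = unlist(terms_per_gene),
  FBgn = rep(as.character(dat$flybase_gene_id), lengths(terms_per_gene)),
  stringsAsFactors = FALSE
)

# Named list: one character vector of gene IDs per GO term.
done <- split(long$FBgn, long$GO)
str(done)
```

The resulting `done` has the same shape as the object shown in the post (a named list of character vectors of unequal length), so it can be used to try out the conversions suggested in the replies.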

ADD REPLY • link 15.0 years ago Assa Yeroslaviz ★ 1.5k


Hi Assa,

On 1/24/2011 4:52 AM, Assa Yeroslaviz wrote:
> lst<- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) dat[x,"bioProc"])
> lst2<- lapply(lst, function(x) unlist(strsplit(as.character(x), ":")))
> unlst<- cbind(rep(names(lst2), sapply(lst2, length)), unlist(lst2, use.names = FALSE))
> done<- tapply(1:nrow(unlst), unlst[,2], function(x) unlst[x,1])

The only way you will be able to get this into a data.frame is if you have a consistent number of columns. Since an arbitrary number of FlyBase genes can be associated with a particular GO term, you have to collapse each list item to length one.

This is easy enough to do: just collapse each item to a single string, separated by commas.

    done <- lapply(done, paste, collapse = ",")
    out <- data.frame(GO = names(done), FBgn = unlist(done))

Best,

Jim
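The collapse-then-export idea can be sketched end to end as follows. This is only an illustration under stated assumptions: the two-element `done` list is made-up toy data, the output path is a temporary file rather than a real file name, and vapply() is used in place of lapply() plus unlist() so that each collapsed item is checked to be a single string.

```r
# Toy input (hypothetical): GO term -> FlyBase gene IDs.
done <- list(
  "hydrolase activity" = c("FBgn0001128", "FBgn0040736"),
  "protein binding"    = c("FBgn0001128", "FBgn0053057",
                           "FBgn0035889", "FBgn0034454")
)

# Collapse every list item to one comma-separated string.
collapsed <- vapply(done, paste, character(1), collapse = ",")

# Two columns of equal length, so data.frame() no longer complains.
out <- data.frame(GO = names(collapsed), FBgn = unname(collapsed),
                  stringsAsFactors = FALSE)

# Write a tab-delimited file; tempfile() stands in for a real path.
outfile <- tempfile(fileext = ".txt")
write.table(out, file = outfile, sep = "\t", quote = FALSE, row.names = FALSE)
```

Using a tab as the field separator keeps the comma-separated gene IDs together in one column when the file is opened in a spreadsheet.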

ADD REPLY • link 15.0 years ago James W. MacDonald 68k


On 01/24/2011 06:37 AM, James W. MacDonald wrote: > Hi Assa, > > On 1/24/2011 4:52 AM, Assa Yeroslaviz wrote: >> Hello James and Bioconductor users, >> >> It starts to look better now. here is a short summary of my script: >> dat<- changedGenes.sub# changedGenes.sub is the complete data from the >> file >> FB_simulated_contrasts for Luke >> >> lst<- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) >> dat[x,"bioProc"]) >> >> lst2<- lapply(lst, function(x) unlist(strsplit(as.character(x), ":"))) >> unlst<- cbind(rep(names(lst2), sapply(lst2, length)), unlist(lst2, >> use.names = FALSE)) >> >> done<- tapply(1:nrow(unlst), unlst[,2], function(x) unlst[x,1]) > > The only way you will be able to get this into a data.frame is if you > have a consistent number of columns. Since you can have an arbitrary > number of Flybase genes associated with a particular GO term, you have > to collapse each list item to length one. > > This is easy enough to do, just collapse to a single string, separated > by commas. > > done <- lapply(done, paste, collapse = ",") > out <- data.frame(GO = names(done), FBgn = unlist(done)) > > Best, > > Jim > > >> >> The result I get is a list of lists: >>> str(done) >> List of 103 >> $ >> : chr [1:4] "FBgn0010359" "FBgn0021800" "FBgn0031420" "FBgn0034345" >> $ actin cytoskeleton >> organization : >> chr >> "FBgn0000318" >> $ actin filament >> organization >> : >> chr "FBgn0000318" Jumping in in the middle so perhaps not understanding, but... 
To create a flat data frame that contains these data in a 'denormalized' form you might len <- sapply(done, length) data.frame(Term=rep(names(done), len), unlist(done, use.names=FALSE) use.names=FALSE is an efficiency that likely does not make a difference in the current situation; it might be necessary to first filter out elements that are not NULL, e.g., Filter(Negate(is.null), done) Martin >> $ adenosine to inosine >> editing : chr >> "FBgn0044510" >> $ adult >> behavior >> : chr "FBgn0044510" >> $ adult locomotory >> behavior >> : chr >> "FBgn0044510" >> $ antimicrobial humoral >> response : chr >> "FBgn0000318" >> $ >> apoptosis >> : chr "FBgn0016977" >> $ apposition of dorsal and ventral imaginal disc-derived wing >> surfaces : chr "FBgn0034326" >> $ asymmetric cell >> division : >> chr "FBgn0052484" >> ... >> >> Unfortunately I can't find a way of converting this list of lists into an >> exportable table/file to work with. >> What I would like to have is the same is in the same form as this list of >> lists, but as a data.frame with two columns. >> like that (this is a *hypothetical object,* which I couldn't generate >> until >> now) : >>> done >> GO >> category >> Gene_IDs >> no category >> >> : chr [1:4] "FBgn0010359" "FBgn0021800" "FBgn0031420" "FBgn0034345" >> actin cytoskeleton >> organization : >> chr >> "FBgn0000318" >> actin filament >> organization >> : >> chr "FBgn0000318" >> adenosine to inosine >> editing : chr >> "FBgn0044510" >> adult >> behavior >> : chr "FBgn0044510" >> adult locomotory >> behavior >> : chr >> "FBgn0044510" >> antimicrobial humoral >> response : chr >> "FBgn0000318" >> apoptosis >> : chr "FBgn0016977" >> apposition of dorsal and ventral imaginal disc-derived wing >> surfaces : chr "FBgn0034326" >> asymmetric cell >> division : >> chr "FBgn0052484" >> >> Just using as.data.frame can't convert it as it still stays a list, >> which is >> not exportable. 
>> I tried to convert the list of lists using: >>> done.df<- do.call('rbind', lapply(names(done), >> function(.name){data.frame(done[[.name]], Name=.name)})) >> >> But I get the error message that I have different length of rows. >> Error in data.frame(done[[.name]], Name = .name) : >> arguments imply differing number of rows: 0, 1 >> >> I would like to know if there is a way of exporting a list of lists >> into a >> table, or to convert it into a data.frame. >> >> Thanks for any help >> >> Assa >> >> >> On Mon, Jan 17, 2011 at 16:56, Assa Yeroslaviz<frymor at="" gmail.com=""> wrote: >> >>> Hi again, >>> >>> ok. I solved it. well to be honest, it wasn't that difficult. I just >>> added >>> >>>> lst2<- lapply(list, function(x) unlist(strsplit(*as.character(x)*, >>> ":")) >>> >>> Assa >>> >>> >>> On Mon, Jan 17, 2011 at 16:42, Assa Yeroslaviz<frymor at="" gmail.com=""> wrote: >>> >>>> >>>> Hi James, >>>> >>>> thanks for the help, but unfortunately I get an error message when >>>> running >>>> the second line >>>> >>>>> list<- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) >>>> dat[x,"GOMF") >>>> >>>>> lst2<- lapply(list, function(x) unlist(strsplit(x, ":")) >>>> >>>> Error in strsplit(x, ":") : non-character argument >>>> >>>>> str(list) >>>> List of 13369 >>>> $ FBgn0000008: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: NA >>>> $ FBgn0000014: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: 3330 NA >>>> $ FBgn0000015: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: 2546 880 >>>> $ FBgn0000017: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: NA 35 
>>>> $ FBgn0000018: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: NA >>>> $ FBgn0000022: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: 893 >>>> $ FBgn0000024: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: 2546 >>>> $ FBgn0000028: Factor w/ 3814 levels "\"1,3-beta-glucan synthase >>>> activity:transferase activity:transferase activity, transferring >>>> glycosyl >>>> groups\"",..: NA >>>> >>>> I tried to convert the factor of the data.frame into characters, but it >>>> still give me the same error. >>>> list1<- data.frame(lapply(list, as.character), stringsAsFactors=FALSE) >>>> >>>> Is there a way of converting the lines to characters? >>>> >>>> THX >>>> Assa >>>> >>>> >>>> >>>> >>>> Hi Assa, >>>>> >>>>> OK, I see your point. This is still pretty easy. >>>>> >>>>> lst<- tapply(1:nrow(dat), dat$flybase_gene_id, function(x) >>>>> dat[x,"GOMF") >>>>> >>>>> lst2<- lapply(lst, function(x) unlist(strsplit(x, ":")) >>>>> >>>> >>>> >>>> >>>>> unlst<- cbind(rep(names(lst2), sapply(lst2, length)), unlist(lst2, >>>>> use.names = FALSE)) >>>>> >>>>> done<- tapply(1:nrow(unlst), unlst[,2], function(x) unlst[x,1]) >>>>> >>>>> There are assuredly other more elegant ways to do this, but this >>>>> should >>>>> suffice. >>>>> >>>>> Best, >>>>> >>>>> Jim >>>>> >>>>> >>>>> >>>>> >>>>> On 1/12/2011 7:28 AM, Assa Yeroslaviz wrote: >>>>> >>>>>> Hi James, >>>>>> >>>>>> thanks for this idea, but unfortunately it wasn't exactly what I >>>>>> needed. >>>>>> This kind of transformation I was able to do on my own. Ye problem >>>>>> is, >>>>>> that >>>>>> I would like to split the third column into single GO categories. 
>>>>>>
>>>>>> This is what I have so far, after applying the tapply command:
>>>>>> "carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding" FBgn0001128
>>>>>> aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding FBgn0040736
>>>>>> nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity FBgn0053057,FBgn0035889
>>>>>> protein binding FBgn0034454
>>>>>>
>>>>>> What I need is to split the first column (or, in the original file, the
>>>>>> third column) into separate names (in this column they are separated by
>>>>>> ':') and to attach ALL the matching IDs to EACH of the GO categories,
>>>>>> to get something like:
>>>>>> carboxylesterase activity FBgn0001128 ....
>>>>>> hydrolase activity FBgn0001128 FBgn0040736 .....
>>>>>> 3',5'-cyclic-nucleotide phosphodiesterase activity FBgn0001128 ....
>>>>>> protein binding FBgn0001128 FBgn0034454 FBgn0053057 FBgn0035889 ....
>>>>>> nucleotide binding FBgn0053057 FBgn0035889 ...
>>>>>> ATP binding FBgn0053057 FBgn0035889 ....
>>>>>> chaperone binding FBgn0053057 FBgn0035889 ....
>>>>>> ammonium transmembrane transporter activity FBgn0053057 FBgn0035889 ....
>>>>>> aminopeptidase activity FBgn0040736 ....
>>>>>> metalloexopeptidase activity FBgn0040736 ....
>>>>>> manganese ion binding FBgn0040736 ....
>>>>>> ....
>>>>>>
>>>>>> I would appreciate any help on that subject.
>>>>>>
>>>>>> THX
>>>>>> Assa
>>>>>>
>>>>>> On Thu, Jan 6, 2011 at 22:09, James MacDonald <jmacdon at med.umich.edu> wrote:
>>>>>>
>>>>>>> Hi Assa,
>>>>>>>
>>>>>>> I don't think you need a package for that. A call to tapply() followed
>>>>>>> by a call to do.call() should get you where you want to go.
>>>>>>>
>>>>>>> Say you read your table into R, and call it 'dat'.
>>>>>>>
>>>>>>> thelist <- tapply(1:nrow(dat), dat$GOMF, function(x) dat[x, 3])
>>>>>>>
>>>>>>> Then you will have a list, with the names being the GOMF and the list
>>>>>>> items being all the gene IDs. Collapsing that to a matrix is difficult
>>>>>>> because you will have different numbers of columns. So you can either
>>>>>>> collapse all the list items using commas, or directly write out to a
>>>>>>> file. Collapsing with commas is easy:
>>>>>>>
>>>>>>> commalist <- lapply(thelist, paste, collapse = ",")
>>>>>>> avector <- do.call("c", commalist)
>>>>>>> names(avector) <- names(commalist)
>>>>>>>
>>>>>>> or you could just write out to a file using something like
>>>>>>>
>>>>>>> con <- file("mydata.txt", "w")
>>>>>>>
>>>>>>> for(i in seq(along = commalist)) cat(names(commalist)[i], commalist[[i]],
>>>>>>>     "\n", sep = "\t", file = con)
>>>>>>>
>>>>>>> close(con)
>>>>>>>
>>>>>>> All untested, so you might have to fiddle a bit to get the results you
>>>>>>> want.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Jim
>>>>>>>
>>>>>>> James W. MacDonald, M.S.
>>>>>>> Biostatistician
>>>>>>> Douglas Lab
>>>>>>> 5912 Buhl
>>>>>>> 1241 E. Catherine St.
>>>>>>> Ann Arbor MI 48109-5618
>>>>>>> 734-615-7826
>>>>>>>
>>>>>>>> Assa Yeroslaviz 01/06/11 1:02 PM >>>
>>>>>>>
>>>>>>> Hi, everybody,
>>>>>>>
>>>>>>> I was wondering whether there is a package to cluster a list of genes
>>>>>>> into different GO categories.
>>>>>>>
>>>>>>> My problem is as follows:
>>>>>>> I have a list of genes (a tab-delimited file):
>>>>>>> id flybasename_gene flybase_gene_id entrezgene GOMF
>>>>>>>
>>>>>>> 1616608_a_at Gpdh FBgn0001128 33824 carboxylesterase activity hydrolase activity 3',5'-cyclic-nucleotide phosphodiesterase activity protein binding
>>>>>>> 1622892_s_at CG33057 FBgn0053057 318833 nucleotide binding protein binding ATP binding chaperone binding ammonium transmembrane transporter activity
>>>>>>> 1622892_s_at mkg-p FBgn0035889 38955 nucleotide binding protein binding ATP binding chaperone binding ammonium transmembrane transporter activity
>>>>>>> 1622893_at IM3 FBgn0040736 50209 aminopeptidase activity metalloexopeptidase activity hydrolase activity manganese ion binding
>>>>>>> 1622894_at CG15120 FBgn0034454 37248 protein binding
>>>>>>>
>>>>>>> I would like to group the genes into the various GO categories, which
>>>>>>> are listed here in the last columns. The GO categories take up more than
>>>>>>> one column, and the number is not the same on each line, depending on
>>>>>>> the depth of the annotation for each gene.
>>>>>>> Is there a way of transforming the table so that I have a list of my GO
>>>>>>> categories in the first column and then, on each line, a list of gene
>>>>>>> IDs (which ID type is used is not important, as I can change them as I
>>>>>>> wish)?
>>>>>>> I would like to have something like this:
>>>>>>> GO genes
>>>>>>> protein binding FBgn0001128 FBgn0053057 FBgn0035889 etc.
>>>>>>> ammonium transmembrane transporter activity FBgn0053057 FBgn0035889
>>>>>>> hydrolase activity FBgn0040736 FBgn0001128
>>>>>>>
>>>>>>>
>>>>>>> I would appreciate any kind of help or ideas
>>>>>>>
>>>>>>> Thanks
>>>>>>> Assa
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>
>>>>>>> **********************************************************
>>>>>>> Electronic Mail is not secure, may not be read every day, and should
>>>>>>> not be used for urgent or sensitive issues
>>>>>>>
>>>>> --
>>>>> James W. MacDonald, M.S.
>>>>> Biostatistician
>>>>> Douglas Lab
>>>>> University of Michigan
>>>>> Department of Human Genetics
>>>>> 5912 Buhl
>>>>> 1241 E. Catherine St.
>>>>> Ann Arbor MI 48109-5618
>>>>> 734-615-7826
>

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
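
The two steps discussed in the quoted thread (splitting the ':'-separated GOMF strings into single GO terms, then turning the resulting list into a table) can be sketched in base R as follows. This is only a minimal sketch under stated assumptions, not the code from the thread: the toy `dat` is copied from the example rows in the question, the names `long`, `go_to_genes` and `out_file` are made up for illustration, and the guess that unannotated genes (NA or empty GOMF) are behind the "differing number of rows: 0, 1" error is unconfirmed.

```r
# Toy data copied from the example in the question. stringsAsFactors = FALSE
# keeps GOMF as character, so strsplit() does not fail on a factor.
dat <- data.frame(
  flybase_gene_id = c("FBgn0001128", "FBgn0053057", "FBgn0035889",
                      "FBgn0040736", "FBgn0034454"),
  GOMF = c(
    "carboxylesterase activity:hydrolase activity:3',5'-cyclic-nucleotide phosphodiesterase activity:protein binding",
    "nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity",
    "nucleotide binding:protein binding:ATP binding:chaperone binding:ammonium transmembrane transporter activity",
    "aminopeptidase activity:metalloexopeptidase activity:hydrolase activity:manganese ion binding",
    "protein binding"),
  stringsAsFactors = FALSE
)

# Assumption: genes with NA or empty GOMF carry no information here and are
# a plausible (unconfirmed) source of empty list elements, so drop them first.
dat <- dat[!is.na(dat$GOMF) & dat$GOMF != "", ]

# One character vector of GO terms per row of dat.
terms <- strsplit(as.character(dat$GOMF), ":", fixed = TRUE)

# Long format: one (GO term, gene ID) pair per row, duplicates removed.
long <- unique(data.frame(
  GO   = unlist(terms),
  gene = rep(dat$flybase_gene_id, lengths(terms)),
  stringsAsFactors = FALSE
))

# Named list: GO term -> character vector of gene IDs.
go_to_genes <- split(long$gene, long$GO)

# Two-column table: one row per GO term, gene IDs collapsed with spaces.
# Collapsing avoids the ragged-row problem that breaks rbind().
done.df <- data.frame(
  GO    = names(go_to_genes),
  genes = vapply(go_to_genes, paste, character(1), collapse = " "),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Hypothetical output path; replace with a real file name as needed.
out_file <- file.path(tempdir(), "go_to_genes.txt")
write.table(done.df, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

If a long two-column table (one row per GO term and gene) is more convenient than collapsed strings, `long` already has that shape and can be written out directly with `write.table()`.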

15.0 years ago Martin Morgan 25k
