Print to txt file in specific way
b.nota ▴ 370
@bnota-7379
Netherlands

Hello,

I have an object with GO numbers (values), and their genes (ind) which is made from the getgo function from goseq:

all23kGos <- stack(getgo(assayed.genes,'hg19','geneSymbol'))

sorted<-all23kGos[order(all23kGos$values),]

> tail(sorted,10)
            values    ind
1755978 GO:2001300 ALOX15
227119  GO:2001301 ALOX12
1755979 GO:2001301 ALOX15
227120  GO:2001302 ALOX12
1755980 GO:2001302 ALOX15
227121  GO:2001303 ALOX12
1755981 GO:2001303 ALOX15
227122  GO:2001304 ALOX12
227123  GO:2001306 ALOX12
895887  GO:2001311   ACP6

Now I would like to print these genes to a tab-delimited text file, in such a way that every row contains all genes from one GO term, separated by tabs. In the end, every GO term should have its own row containing all of its genes.

Is this possible with an easy one-liner, or do I have to write a complicated for loop for this?

Thanks in advance!

Ben

Tags: goseq, sink, cat, print
Mike Smith ★ 6.6k
@mike-smith
EMBL Heidelberg

You can do this using dplyr and grouping by the GO value, e.g.

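library(dplyr)

# collapse the genes of each GO term into a single tab-separated string
reshaped <- group_by(sorted, values) %>% summarise(genes = paste(ind, collapse = "\t"))

# write one GO term per row, without quotes, row names or a header line
write.table(reshaped, file = "/tmp/tmp.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

For the ten rows shown in the question, /tmp/tmp.txt then contains:
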
GO:2001300    ALOX15
GO:2001301    ALOX12    ALOX15
GO:2001302    ALOX12    ALOX15
GO:2001303    ALOX12    ALOX15
GO:2001304    ALOX12
GO:2001306    ALOX12
GO:2001311    ACP6
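
If you would rather not load dplyr, the same grouping can be sketched in base R with split(). This is only a minimal sketch on toy data that mimics the structure of `sorted`; the output path is a temporary file chosen for illustration, not something from the original post.

```r
# Toy data mirroring the structure of `sorted` in the question
sorted <- data.frame(
  values = c("GO:2001300", "GO:2001301", "GO:2001301", "GO:2001311"),
  ind    = c("ALOX15", "ALOX12", "ALOX15", "ACP6"),
  stringsAsFactors = FALSE
)

# One character vector of genes per GO term
# (as.character() guards against `ind` being a factor, as stack() returns)
genes.by.go <- split(as.character(sorted$ind), sorted$values)

# One line per GO term: the term first, then its genes, all tab-separated
go.lines <- paste(names(genes.by.go),
                  vapply(genes.by.go, paste, character(1), collapse = "\t"),
                  sep = "\t")

# Hypothetical output location; replace with your own file name
out.file <- tempfile(fileext = ".txt")
writeLines(go.lines, out.file)
```

Building the lines yourself and using writeLines() avoids write.table() altogether, which can be convenient because the rows have different numbers of fields.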

Thank you, that's a great solution. Is there an easy way to insert a column with the term description between the GO value and the genes? The descriptions are in another object called newdf:

head(newdf)
     values       term                                                        
[1,] "GO:0048518" "positive regulation of biological process"                 
[2,] "GO:0048519" "negative regulation of biological process"                 
[3,] "GO:0019222" "regulation of metabolic process"                           
[4,] "GO:0009605" "response to external stimulus"                             
[5,] "GO:0048522" "positive regulation of cellular process"                   
[6,] "GO:0051173" "positive regulation of nitrogen compound metabolic process"

 


Sure. Assuming you don't really have the quotes around the GO terms in your new data.frame, you can merge them using left_join()

added.terms <- left_join(reshaped, newdf, by = "values")
## this won't have the columns in the order you'd like, so reorder them
added.terms <- added.terms[,c(1,3,2)]

If you do have the quotes then I think you'll need to strip them before the matching will work.
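
One thing worth checking: the `[1,]` row labels and the quotes in the head(newdf) output are how R prints a character matrix, so newdf may be a matrix rather than a data.frame. If so, the quotes are not part of the values, and converting the matrix first should be enough. The sketch below assumes that situation and uses toy stand-ins for both objects; the gene names and the `term.df` variable are made up for illustration.

```r
library(dplyr)

# Toy stand-ins for the objects in the thread (gene names are invented)
reshaped <- data.frame(
  values = c("GO:0048518", "GO:0048519"),
  genes  = c("GENE1\tGENE2", "GENE3"),
  stringsAsFactors = FALSE
)
newdf <- cbind(
  values = c("GO:0048518", "GO:0048519"),
  term   = c("positive regulation of biological process",
             "negative regulation of biological process")
)

# Assumption: newdf is a character matrix, so turn it into a data.frame first
term.df <- as.data.frame(newdf, stringsAsFactors = FALSE)

# Join on the GO identifier, then put the term between the GO value and the genes
added.terms <- left_join(reshaped, term.df, by = "values")
added.terms <- added.terms[, c("values", "term", "genes")]
```

Selecting the columns by name rather than by position keeps the reordering correct even if the join returns them in a different order.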


At first I got an error suggesting that I add copy = TRUE; after adding that, it worked fine!

Thanks for your help!


Hi, I was wondering where you got that "term" column for each of the GO values. Thank you!


The term column comes from the output you get from the goseq function itself, but only with later versions of goseq.

