Question

Print to txt file in specific way

b.nota ▴ 370

Hello,

I have an object with GO numbers (values) and their genes (ind), which was created with the getgo function from goseq:

all23kGos <- stack(getgo(assayed.genes,'hg19','geneSymbol'))

sorted<-all23kGos[order(all23kGos$values),]

> tail(sorted,10)
            values    ind
1755978 GO:2001300 ALOX15
227119  GO:2001301 ALOX12
1755979 GO:2001301 ALOX15
227120  GO:2001302 ALOX12
1755980 GO:2001302 ALOX15
227121  GO:2001303 ALOX12
1755981 GO:2001303 ALOX15
227122  GO:2001304 ALOX12
227123  GO:2001306 ALOX12
895887  GO:2001311   ACP6

Now I would like to print these genes to a tab-delimited text file in such a way that every row contains all genes from one GO term, separated by tabs. In the end, every GO term should have its own row containing all of its genes.

Is this possible with an easy one-liner, or do I have to write a complicated for loop for this?

Thanks in advance!

Ben

goseq sink cat print • 1.8k views

ADD COMMENT • link updated 9.1 years ago by Mike Smith ★ 6.6k • written 9.1 years ago by b.nota ▴ 370

score 1 · Answer 1 · 2016-01-19

Mike Smith ★ 6.6k

You can do this using dplyr and grouping by the GO value, e.g.

library(dplyr)
reshaped <- group_by(sorted, values) %>% summarise(genes = paste(ind, collapse = "\t"))
write.table(reshaped, file = "/tmp/tmp.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)

GO:2001300    ALOX15
GO:2001301    ALOX12    ALOX15
GO:2001302    ALOX12    ALOX15
GO:2001303    ALOX12    ALOX15
GO:2001304    ALOX12
GO:2001306    ALOX12
GO:2001311    ACP6
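
If you would rather not depend on dplyr, roughly the same result can probably be had with base R's split() and writeLines(). This is only a minimal sketch: the small data frame below is a made-up stand-in for `sorted`, and `out_file` is a hypothetical temporary path, so swap in your own object and file name.

```r
# Toy stand-in for `sorted`, using the column names that stack() produces
sorted <- data.frame(
  values = c("GO:2001300", "GO:2001301", "GO:2001301", "GO:2001311"),
  ind    = factor(c("ALOX15", "ALOX12", "ALOX15", "ACP6")),
  stringsAsFactors = FALSE
)

# One character vector of gene symbols per GO term
genes_by_term <- split(as.character(sorted$ind), sorted$values)

# Build one tab-separated line per term: GO id first, then its genes
rows <- paste(names(genes_by_term),
              vapply(genes_by_term, paste, character(1), collapse = "\t"),
              sep = "\t")

out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(rows, out_file)
```

Because each line is assembled as a plain string, rows with different numbers of genes are not a problem, and there is no header or row name to suppress.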

ADD COMMENT • link 9.1 years ago Mike Smith ★ 6.6k

Thank you, that's a great solution. Is it possible to easily insert a column with the term description, taken from another object called newdf, between the GO value and the genes?

head(newdf)
     values       term                                                        
[1,] "GO:0048518" "positive regulation of biological process"                 
[2,] "GO:0048519" "negative regulation of biological process"                 
[3,] "GO:0019222" "regulation of metabolic process"                           
[4,] "GO:0009605" "response to external stimulus"                             
[5,] "GO:0048522" "positive regulation of cellular process"                   
[6,] "GO:0051173" "positive regulation of nitrogen compound metabolic process"

ADD REPLY • link 9.1 years ago b.nota ▴ 370

Sure. Assuming you don't really have the quotes around the GO terms in your new data.frame, you can merge the two objects using left_join():

added.terms <- left_join(reshaped, newdf, by = "values")
## this won't have the columns in the order you'd like, so reorder them
added.terms <- added.terms[,c(1,3,2)]

If you do have the quotes then I think you'll need to strip them before the matching will work.
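
One thing worth checking first: the head(newdf) printout above, with row labels like [1,] and quoted strings, is how R prints a character matrix, so the quotes may well be display only and newdf may not be a data.frame at all. A minimal sketch of converting it before the join follows. The toy objects and the gene names GENE_A, GENE_B and GENE_C are invented for illustration, and base merge() with all.x = TRUE stands in for left_join() here so the example runs without extra packages.

```r
# Toy stand-ins; in the thread `reshaped` comes from the summarise() call
reshaped <- data.frame(
  values = c("GO:0048518", "GO:0048519"),
  genes  = c("GENE_A\tGENE_B", "GENE_C"),  # hypothetical gene names
  stringsAsFactors = FALSE
)
# A character matrix, which prints with [1,] labels and quoted strings
newdf <- cbind(
  values = c("GO:0048518", "GO:0048519"),
  term   = c("positive regulation of biological process",
             "negative regulation of biological process")
)

# Convert the matrix to a data frame so it can be joined on `values`
newdf_df <- as.data.frame(newdf, stringsAsFactors = FALSE)

# Keep every row of `reshaped`, adding the matching term where one exists
added.terms <- merge(reshaped, newdf_df, by = "values", all.x = TRUE)

# Put the term between the GO id and the genes
added.terms <- added.terms[, c("values", "term", "genes")]
```

The converted newdf_df could equally be passed to left_join() in place of newdf. Note that merge() sorts the result by the key column by default.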

ADD REPLY • link 9.1 years ago Mike Smith ★ 6.6k

At first I got an error message suggesting that I add copy=TRUE; once I did that, it worked fine!

Thanks for your help!

ADD REPLY • link 9.1 years ago b.nota ▴ 370

Hi, I was wondering where you got that "term" column for each of the GO values. Thank you!

ADD REPLY • link 9.0 years ago javier.mendoza • 0

The term column comes from the output you get from the goseq function itself, but only with later versions of goseq.

ADD REPLY • link 9.0 years ago b.nota ▴ 370