how to change file format

weinong han ▴ 270

@weinong-han-1250

Last seen 11.5 years ago

An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20050725/48f382d4/attachment.pl

• 1.1k views

ADD COMMENT • link updated 20.6 years ago by Adaikalavan Ramasamy ★ 1.8k • written 20.6 years ago by weinong han ▴ 270

Uri David Akavia ▴ 80

@uri-david-akavia-1277

Last seen 11.5 years ago

If you have a UNIX system you can use AWK. Assuming that the original file (ORIGINAL) is separated by tabs, I would use something like (in one line):

    cat ORIGINAL | awk -F"\t" '{print $1"\t"$2" - "$3"\t"$4"\t"$5"\t"$6"\t"$7"\t"$8"\t"$9"\t"$10}'

The result is printed to the terminal, so redirect it to a new file if you want to keep it.

If the original file is separated by something else, say commas, replace the -F"\t" with the appropriate separator (-F"," and so forth).

Or you could try using something like Excel. I'm not sure R would be very useful, since I believe it would have to read the entire file into memory, which might be slow.

Yours,
Uri David Akavia

weinong han wrote:
> Dear All,
>
> My question seems not to be fit for the mail list, however, I really need your help. Crouching tigers and Hidden dragons are There!
>
> Now I have a file format including 10 headers (gene, name, description, array1, array2 ... array7):
>
> Gene    Name    Description    Array 1  Array 2  Array 3  Array 4  Array 5  Array 6  Array 7
> Gene 1  Name 1  Description 1  0.2      -0.1     -1.1     0.4      -4       -2       0.2
> Gene 2  Name 2  Description 2  2.3      2.1      -3       1.1      1.2      -1.6     0.1
> Gene 3  Name 3  Description 3  0.1      1.6      1.2      1.5      2.7      0.4      -0.4
> Gene 4  Name 4  Description 4  0.3      -1.5     -1.7     0.2      0.4      2        -2.1
> Gene 5  Name 5  Description 5  1.7      2.3      2.3      2.3      3        -2       2.1
> Gene 6  Name 6  Description 6  0.2      4        4        4        0.2      -3       -4
> Gene 7  Name 7  Description 7  -0.3     1.5      1.5      1.5      -0.2     1.7      3
> Gene 8  Name 8  Description 8  1.4      -0.6     -1.1     -0.3     -3       -3       1.4
>
> I want to get the following file format:
>
> Gene    Name                    Array 1  Array 2  Array 3  Array 4  Array 5  Array 6  Array 7
> Gene 1  Name 1 - Description 1  0.2      -0.1     -1.1     0.4      -4       -2       0.2
> Gene 2  Name 2 - Description 2  2.3      2.1      -3       1.1      1.2      -1.6     0.1
> Gene 3  Name 3 - Description 3  0.1      1.6      1.2      1.5      2.7      0.4      -0.4
> Gene 4  Name 4 - Description 4  0.3      -1.5     -1.7     0.2      0.4      2        -2.1
> Gene 5  Name 5 - Description 5  1.7      2.3      2.3      2.3      3        -2       2.1
> Gene 6  Name 6 - Description 6  0.2      4        4        4        0.2      -3       -4
> Gene 7  Name 7 - Description 7  -0.3     1.5      1.5      1.5      -0.2     1.7      3
> Gene 8  Name 8 - Description 8  1.4      -0.6     -1.1     -0.3     -3       -3       1.4
>
> In the above file format, the first row is a header row, where the names of the
> arrays/experiments are specified from column 3 and on. The second row and on specify
> expression data for each gene, where the first column is the unique identifier of each gene,
> the second column specifies the name and the description of the gene, where the name
> and description are separated by " - " (the surrounding spaces are important), and column 3
> and on specify the expression data for the gene across all experiments.
>
> Thanks much for your help in advance.
> Any suggestions and advice will be much appreciated.
>
> Best Regards
> Han Weinong
> hanweinong at yahoo.com
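A note on the AWK route: applied to every line, the one-liner also joins the two header cells, whereas the requested header keeps a plain "Name" in column 2. The following is only a sketch of one way to handle that, assuming a tab-separated input whose first row is the header. The function name merge_name_description and the file names sample.txt and converted.txt are made up for illustration.

```shell
# Sketch only: merge columns 2 and 3 of a tab-separated file, leaving the
# header cell of column 2 untouched. The function name is hypothetical.
merge_name_description() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        # Row 1 is the header: keep "Name" and drop the "Description" cell.
        second = (NR == 1) ? $2 : $2 " - " $3
        line = $1 OFS second
        # Copy every remaining column, however many arrays there are.
        for (i = 4; i <= NF; i++) line = line OFS $i
        print line
    }' "$1"
}

# Small stand-in for ORIGINAL (two arrays instead of seven).
printf 'Gene\tName\tDescription\tArray 1\tArray 2\n' >  sample.txt
printf 'Gene 1\tName 1\tDescription 1\t0.2\t-0.1\n'  >> sample.txt

merge_name_description sample.txt > converted.txt
cat converted.txt
```

Looping from column 4 to NF avoids hard-coding the number of arrays, so the same sketch should cope with files that have more or fewer than seven.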

ADD COMMENT • link 20.6 years ago Uri David Akavia ▴ 80

Adaikalavan Ramasamy ★ 1.8k

@adaikalavan-ramasamy-675

Last seen 11.5 years ago

If I understand your question, this is probably what you want.

    df <- read.delim( file="lala.txt", row.names=NULL, check.names=FALSE )

This will read in a tab-delimited file. If your file is comma-separated or in another format, see help(read.csv) or help(read.table). Setting check.names=FALSE keeps headers such as "Array 1" from being rewritten as "Array.1". At this point, R will automatically assign rownames from 1,2,...,8 but we can ignore this.

    new <- paste( df[ , "Name"], df[ , "Description"], sep=" - " )
    df <- data.frame( Gene=df[ , "Gene"], Name=new, df[ , -c(1,2,3)], check.names=FALSE )
    write.table( df, file="modified_lala.txt", sep="\t", quote=FALSE, row.names=FALSE )

The combined column is placed second and named "Name", as in the layout you asked for. Hopefully this should do the trick. If it does not, try changing the quote argument or some of the other parameters.

At this point I would strongly recommend you read help(subset) and the Introduction to R (http://cran.r-project.org/doc/manuals/R-intro.html).

Regards,
Adai

On Mon, 2005-07-25 at 22:54 -0700, weinong han wrote:
> I want to get the following file format:
> [...]
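Whichever route is used, the converted file can be sanity-checked from the shell before it goes into downstream software. The sketch below assumes a tab-separated result with a header row. It reports any data row whose field count differs from the header, or whose second column lacks the " - " separator. The function name check_converted and the file name check_sample.txt are made up for illustration.

```shell
# Sketch only: check that every data row has as many tab-separated fields
# as the header and that column 2 contains " - ". The name is hypothetical.
check_converted() {
    awk -F'\t' '
    NR == 1 { want = NF; next }
    NF != want {
        printf "line %d: %d fields, expected %d\n", NR, NF, want
        bad = 1
    }
    index($2, " - ") == 0 {
        printf "line %d: no \" - \" in column 2\n", NR
        bad = 1
    }
    END { exit bad + 0 }' "$1"
}

# Stand-in for modified_lala.txt (two arrays instead of seven).
printf 'Gene\tName\tArray 1\tArray 2\n'              >  check_sample.txt
printf 'Gene 1\tName 1 - Description 1\t0.2\t-0.1\n' >> check_sample.txt

check_converted check_sample.txt && echo "layout looks consistent"
```

The function exits with a non-zero status when it finds a problem, so it can be chained with && in a larger script.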

ADD COMMENT • link 20.6 years ago Adaikalavan Ramasamy ★ 1.8k
