Question: How to automate a process on the command line for 1500 genes?
ChIP-Tease (Germany) wrote, 4.2 years ago:

Hello everybody,

 

I have a problem I need some help with.

When I run a program on the command line for a single gene, it works fine:

program.sh genename_1 ../../../unchanged_file.txt ../../aa_bb_cc_dd_genename_1.gff genename_1.bed

 

But I need to run it about 1500 times on the command line and I don't know how to automate it.

 

I have a folder. Within this folder, there are different files with different endings. I only want to analyze the files which end with .gff.

All the .gff file names start with the same prefix, aa_bb_cc_dd_, followed by the gene name, an underscore, and a number, like this:

aa_bb_cc_dd_Genename_1.gff

aa_bb_cc_dd_Genename_2.gff

aa_bb_cc_dd_Genename_3.gff

aa_bb_cc_dd_otherGenename_1.gff

aa_bb_cc_dd_otherGenename_2.gff

 

There are more than 1500 combinations.

The code which does the job for one file looks like this:

 

program.sh genename_1 ../../../unchanged_file.txt ../../aa_bb_cc_dd_genename_1.gff genename_1.bed

 

Is there any way to do this for all 1500 .gff files in a few steps? I'm very sorry I cannot suggest anything, but I don't have much experience with the command line. I could do something like this in R, but that doesn't help much here.

 

Thanks a lot, Alex

Answer: How to automate a process on the command line for 1500 genes?
Jim Hester (United States) wrote, 4.2 years ago:

Note that you can do this in R as well: the system() function can call any command-line program. Remove the echo from the examples to actually call program.sh.

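# List the .gff files in the current directory
gffs <- list.files(pattern = "\\.gff$", full.names = TRUE)

lapply(gffs, function(file) {
  # Extract the gene name between the shared prefix and the extension
  gene <- gsub(".*aa_bb_cc_dd_(.*)\\.gff$", "\\1", file)
  # echo prints the command instead of running it
  system(sprintf("echo program.sh %s ../../../unchanged_file.txt %s %s.bed", gene, file, gene))
})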

But you can of course do a similar thing in bash:

for file in *.gff; do
  temp=${file##aa_bb_cc_dd_}   # strip the shared prefix
  gene=${temp%.gff}            # strip the extension
  echo program.sh "$gene" ../../../unchanged_file.txt "$file" "$gene.bed"
done
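
The command in the question reads the .gff files from ../../ rather than from the current directory, so a variant of the loop that takes the directory as an argument may be closer to the original call. The following is only a minimal sketch under that assumption: run_all_genes, gff_dir, prefix, and the DRY_RUN switch are names made up for illustration, not part of program.sh.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) one program.sh call per .gff file in a directory.
# run_all_genes, gff_dir, prefix and DRY_RUN are illustrative assumptions.

run_all_genes() {
  local gff_dir=$1              # directory holding the .gff files, e.g. ../..
  local prefix=aa_bb_cc_dd_     # shared file-name prefix from the question
  local path name gene

  for path in "$gff_dir"/"$prefix"*.gff; do
    [ -e "$path" ] || continue  # skip the literal pattern if nothing matched
    name=${path##*/}            # drop the directory part
    name=${name#"$prefix"}      # drop the shared prefix
    gene=${name%.gff}           # drop the extension
    if [ "${DRY_RUN:-1}" = 1 ]; then
      # Default: only print the command so it can be checked first
      echo program.sh "$gene" ../../../unchanged_file.txt "$path" "$gene.bed"
    else
      program.sh "$gene" ../../../unchanged_file.txt "$path" "$gene.bed"
    fi
  done
}
```

Calling run_all_genes ../.. prints the commands; DRY_RUN=0 run_all_genes ../.. would run them once the printed lines look right.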

Thanks a lot, I didn't know that R can call command-line programs. This will be very useful for me.

I guess I will try both ways.

Thanks a lot again!

(Reply written 4.2 years ago by ChIP-Tease)
Answer: How to automate a process on the command line for 1500 genes?
tangming2005 (United States) wrote, 4.2 years ago:

Something like this:

for file in *.gff
do
  command "$file"   # "command" stands for the program to run
done
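
With about 1500 runs it can help to know which files failed instead of scanning the output by hand. One possible way to extend the loop is sketched below, assuming the program signals failure through a non-zero exit status. run_each is a hypothetical helper, and its first argument is whatever command should be run on each file.

```shell
#!/usr/bin/env bash
# Sketch: run a command on every .gff file in a directory and report failures.
# run_each is a hypothetical helper, not part of any existing tool.

run_each() {
  local cmd=$1 dir=$2           # command to run, directory with .gff files
  local file failed=0

  for file in "$dir"/*.gff; do
    [ -e "$file" ] || continue  # nothing matched the pattern
    if ! "$cmd" "$file"; then
      echo "failed: $file" >&2  # report the file name on stderr
      failed=$((failed + 1))
    fi
  done

  [ "$failed" -eq 0 ]           # exit status 0 only if every run succeeded
}
```

For example, run_each some_program . 2> failures.log would collect the failed file names in failures.log, where some_program is a placeholder.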

 Thank you!

(Reply written 4.2 years ago by ChIP-Tease)
Answer: How to automate a process on the command line for 1500 genes?
ChIP-Tease (Germany) wrote, 4.2 years ago:

Hello everybody,

I cannot really make this suggestion work.

for file in *gff;do

  gene=${${file##aa_bb_cc_dd_}%.gff}

  echo program.sh $file ../../../unchanged_file.txt $gene $file $gene.bed

done 

The problem seems to be this part:

gene=${${file##aa_bb_cc_dd_}%.gff}

As I understand it, the ${...} expression removes the pattern written inside the braces from the value of the variable.

Meaning

${file##aa_bb_cc_dd_} on aa_bb_cc_dd_example_gene.gff will give me example_gene.gff

and

gene=${${file##aa_bb_cc_dd_}%.gff} on aa_bb_cc_dd_example_gene.gff should give me example_gene

But it tells me "bad substitution".

This probably means that some character is wrong, but I cannot figure out which one, and I don't really know what to google for to find the rules for variable definitions. Maybe someone has a link or knows what is wrong.

Thanks a lot, Alex


I forgot that bash, unlike zsh, doesn't support nested substitutions. In bash you have to do it in two steps:

for file in *.gff; do
  temp=${file##aa_bb_cc_dd_}   # strip the shared prefix
  gene=${temp%.gff}            # strip the extension
  echo program.sh "$gene" ../../../unchanged_file.txt "$file" "$gene.bed"
done

I have updated my answer accordingly. If it answers your question, please mark it as accepted.
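
For anyone looking for the rules behind this syntax: the Bash manual describes it under "shell parameter expansion". A small sketch of the relevant operators, using the example file name from the question above (the variable names are only illustrative):

```shell
#!/usr/bin/env bash
# Sketch of the prefix/suffix removal operators of bash parameter expansion.
file=aa_bb_cc_dd_example_gene.gff

# Two separate steps, because bash cannot nest one ${...} inside another
no_prefix=${file#aa_bb_cc_dd_}   # remove the prefix  -> example_gene.gff
gene=${no_prefix%.gff}           # remove the suffix  -> example_gene

# '#' and '%' remove the shortest match, '##' and '%%' the longest one
shortest=${file#*_}              # -> bb_cc_dd_example_gene.gff
longest=${file##*_}              # -> gene.gff

echo "$gene"                     # prints example_gene
```

Because the prefix aa_bb_cc_dd_ contains no wildcard, '#' and '##' give the same result for it; the difference only matters for patterns such as *_.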

 

(Reply written 4.2 years ago by Jim Hester)

Hello Jim,

Thanks a lot!

I accepted it. I didn't know until now that I can accept answers. Thanks for that hint as well.

(Reply written 4.2 years ago by ChIP-Tease)