rGADEM not giving desired motif consistently
vinod.acear ▴ 50
@vinodacear-8884
India

Hi, I am trying to discover a desired motif in a set of 251 sequences, but my results are not consistent. In some runs I get the desired motif, but in other runs it disappears. I am now trying to find the motif by supplying a seed motif together with my DNAStringSet object of sequences.

My seed motif, stored in the file motif.txt, is given here:

A    0.4619    0.927    0.8053    0.9305    0.4262    0.6623    0.4405    0.8018    0.7588    0.8912    0.7517    0.8268    0.4834    0.6158
C    0.0148    0.0077    0.0291    0.0148    0.3046    0.0112    0.022    0.0327    0.0148    0.022    0.0148    0.0184    0.0828    0.0291
G    0.1114    0.0148    0.0077    0.0291    0.0935    0.1221    0.0184    0.1078    0.0685    0.0148    0.1543    0.0291    0.1114    0.14
T    0.4119    0.0506    0.1579    0.0255    0.1758    0.2044    0.5192    0.0577    0.1579    0.072    0.0792    0.1257    0.3224    0.2151

Please tell me a possible command to find a motif similar to the seed motif in the DNAStringSet.

@charles-joly-beauparlant-4777

Hi Vinod,

rGADEM uses a genetic algorithm, which can explain why you are getting different results with each run. Even if you give the algorithm a seeded motif, it will generate multiple new motifs by mutation and crossing-over at each iteration. It's possible that the motif you are expecting does not produce the best score with the FASTA sequences you are using, and that in some runs rGADEM is able to find a better motif.

With genetic algorithms, it's a good idea to launch multiple runs to make sure the algorithm did not get stuck on a local fitness peak.

Of course, it's not possible to know for sure without the original files and the parameters used.
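For what it's worth, here is a rough sketch of how a seeded motif and a fixed random seed could be passed together. It assumes rGADEM is installed, that `seqs` is your DNAStringSet, and that motif.txt has the four-row layout shown in the question. `read_seed_pwm` and `run_seeded_gadem` are hypothetical helper names, and I am assuming that `Spwm` accepts a list of 4-row matrices, as the package vignette suggests.

```r
# Hypothetical helper: read a PWM stored as four rows labelled A, C, G, T
# (whitespace-separated, no header), as in the motif.txt shown above.
read_seed_pwm <- function(path) {
  tab <- read.table(path, row.names = 1)
  pwm <- as.matrix(tab)
  colnames(pwm) <- NULL
  # Enforce the A, C, G, T row order
  pwm[c("A", "C", "G", "T"), , drop = FALSE]
}

# Hypothetical wrapper: `seqs` is assumed to be a DNAStringSet.
# `seed` fixes the random number generator; `Spwm` supplies the starting motif.
run_seeded_gadem <- function(seqs, pwm_file = "motif.txt", rng_seed = 1) {
  seed_pwm <- read_seed_pwm(pwm_file)
  rGADEM::GADEM(seqs, seed = rng_seed, verbose = 1, Spwm = list(seed_pwm))
}
```

Calling `run_seeded_gadem(seqs)` twice with the same `rng_seed` and comparing the reported motifs would show whether the differences between runs come from the random seed or from something else.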


Hi Charles,

The GADEM() function has a seed argument, and, according to its man page, "when a seed is specified, the run results are deterministic". This is a good feature that all randomized algorithms in Bioconductor are expected to have in order to allow reproducible research. Are you sure the non-deterministic behavior observed by the OP is not a bug?

Thanks,

H.


Hi Hervé,

There are 2 types of seeds with the GADEM() function: the seed argument you mention, which makes the results deterministic, and the Spwm param, which lets the user supply a motif as a starting point for the genetic algorithm (the other option is to let the GADEM() function generate the starting motifs from the most frequent k-mers in the sequences). Based on the initial question, I assumed Vinod was talking about the Spwm param. If that's not the case, then it's clearly a bug, as you said.


Yes, Vinod is saying that he's using a seeded motif (and is showing the motif). Are you saying that when the user gives the algorithm a seeded motif, the algorithm is not deterministic anymore? Just to clarify, deterministic means that 2 runs with exactly the same input (in particular, the same seed and the same Spwm args) will produce the same output.

H.


Hi Hervé,

What I meant is that if the user *only* provides a seeded motif through the Spwm arg, then it's not deterministic (a seeded motif but no seed). The seed arg should determine whether the algorithm is deterministic, independently of the values of any other args.

In the case of the OP, I assumed the Spwm arg was used without the seed arg, since the results were different after each run. But I could be wrong, and in that case it would be a bug, as you mention in your first comment.


The seed argument has a default value of 1, which means that if the user doesn't supply it, it will be set to 1. Are you saying that when seed=1 the algorithm is not deterministic? Is the value 1 treated in a special way? When I look at the implementation of the GADEM() function, it doesn't seem so: what I see is that the call to set.seed(seed) is skipped only if seed is set to NULL. So it looks like the algorithm is non-deterministic only when the user supplies seed=NULL. As a consequence, if the user *only* provides a seeded motif through the Spwm arg (i.e. a seeded motif but no seed), then the algorithm should be deterministic. Am I missing something?
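To illustrate the pattern I am describing, here is a toy base-R sketch. It is not the rGADEM source: `toy_search` is a made-up stand-in for any randomized routine that has a `seed = 1` default and skips `set.seed()` only when `seed` is NULL.

```r
# Toy stand-in for a randomized routine; NOT the actual GADEM() code.
toy_search <- function(seed = 1) {
  # The RNG is left unseeded only when the caller explicitly passes NULL
  if (!is.null(seed)) set.seed(seed)
  sample(c("A", "C", "G", "T"), 10, replace = TRUE)
}

# With the default seed, two calls return exactly the same draw
run1 <- toy_search()
run2 <- toy_search()
stopifnot(identical(run1, run2))

# With seed = NULL the RNG state is not reset, so repeated calls may differ
run3 <- toy_search(seed = NULL)
run4 <- toy_search(seed = NULL)
```

Under this pattern, a user who never touches the seed argument still gets reproducible output, which is why run-to-run differences with the default would be surprising.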

H.