Question

iupac2meme function missing in universalmotif package

0

Entering edit mode

lessismore ▴ 20

@lessismore

Last seen 19 months ago

Italy

Dear all i'd like to use iupac2meme (https://meme-suite.org/meme/doc/iupac2meme.html). So far i've been using universalmotifs() which has basicall all functions of MEME suite. Apparently except iupac2meme, which is pretty useful. I've tried to use it in bash and i get an output in MEME4 format which cannot be used with other motifs in MEME5 format with the compare_motifs() analysis, which is what i want to accomplish. Thanks in advance for any help.

universalmotif • 1.4k views

ADD COMMENT • link updated 3.0 years ago by snystrom ▴ 10 • written 3.2 years ago by lessismore ▴ 20

score 1 · Accepted Answer · 2021-03-05

1

Entering edit mode

btremblay ▴ 60

@btremblay-15941

Last seen 2.2 years ago

Spain

Sorry for the delay, I thought I had alerts set up. I guess not.

You can create a motif from a consensus sequence using create_motif(). For example:

library(universalmotif)

motif <- create_motif("AAAGNBGG")
motif
#>        Motif name:   motif
#>          Alphabet:   DNA
#>              Type:   PPM
#>           Strands:   +-
#>          Total IC:   12.42
#>       Pseudocount:   0
#>         Consensus:   AAAGNBGG
#>
#>   A A A G    N    B G G
#> A 1 1 1 0 0.25 0.00 0 0
#> C 0 0 0 0 0.25 0.34 0 0
#> G 0 0 0 1 0.25 0.33 1 1
#> T 0 0 0 0 0.25 0.33 0 0

write_meme(motif, "motif.meme")

ADD COMMENT • link 3.1 years ago btremblay ▴ 60

0

Entering edit mode

Is there any way to import also consensus written like this e.g. "AAAGNBGG[AT]". Thanks in advance! Amazing package.

ADD REPLY • link 3.1 years ago lessismore ▴ 20

0

Entering edit mode

Is there any way to import also consensus written like this e.g. "AAAGNBGG[AT]". Thanks in advance! Amazing package.

ADD REPLY • link 3.1 years ago lessismore ▴ 20

1

Entering edit mode

Unfortunately the package does not support regular expressions for consensus sequence input in create_motif(). However in this case, [AT] is the same as W, so you could simply do create_motif("AAAGNBGGW") to achieve the same result.

ADD REPLY • link 3.1 years ago btremblay ▴ 60

0

Entering edit mode

Hi Benjamin, it is still unclear to me if scan_sequences() uses a background to correct the results as in FIMO.

I am trying to replicate an analysis im doing with FIMO on core promoters (DNA seqs) using curated motifs from PlantTFDB. With universalmotifs im getting three times more motifs mapped and i would like to understand how to understand why this happens. Here following the code:

# fimo
fimo --bfile background_model.txt --qv-thresh --max-stored-scores 20000000 --thresh 5e-2 motifs.meme core_promoters.fa

# universalmotif
results <- 
  scan_sequences(motifs = motifs, sequences = Promoters, threshold = 0.8,
  threshold.type = "logodds", RC = TRUE, use.freq = 1, verbose = 0,
  nthreads = 1, motif_pvalue.k = 8, use.gaps = TRUE,
  allow.nonfinite = FALSE, warn.NA = TRUE, calc.pvals = TRUE) %>%
  universalmotif::as.data.frame()

Sorry if i post the question here. Also, it would be cool if you organize a webinar on such a great tool. Just an idea!

ADD REPLY • link 3.1 years ago lessismore ▴ 20

1

Entering edit mode

No worries, happy to help.

scan_sequences() works by converting your motifs, whatever format they be in, into position weight matrices (PWMs) (also known as log-odds matrices) which are constructed taking into account the background frequencies (see my "Introduction to sequence motifs" vignette). So yes, the scanning is background-corrected.

As for why Fimo is getting more hits: I'm not entirely sure. The Fimo documentation says it uses a "dynamic programming algorithm to convert log-odds scores into p-values", which is kind of vague (whereas scan_sequences() only uses log-odds scores; you can generate a scanning log-odds threshold from a p-value, and afterwards calculate a p-value from a motif hit log-odds score in scan_sequences(), but I'm not sure if that's the same thing). I would have to look through the Fimo source code to be sure.

The resulting object from scan_sequences() is rather comprehensive, so you can check the motif hits for yourself if you are curious as to how different thresholds affect motif hit sensitivity, and compare the results at different thresholds by changing the threshold option or by filtering afterwards using the score column (or the calc.pvals P-values as I see you are also calculating). It is not difficult to increase the stringency of scan_sequences() as you wish, so perhaps with the right combination you will be able to replicate your Fimo analysis.

Maybe that was not a very helpful answer, but ultimately I am limited by my meagre understanding of Fimo.

ADD REPLY • link 3.1 years ago btremblay ▴ 60

0

Entering edit mode

Thank you very much! You indeed clarified to me some important points. Ideally that could be cool to compare the scan_sequences() and FIMO outputs for TFs tested with Chip-seq or DAP-seq data.

ADD REPLY • link 3.1 years ago lessismore ▴ 20

0

Entering edit mode

You could try using runFimo from memes to compare them. My guess as to why FIMO detects more matches in your example is you're setting the --thresh argument to 0.05 which I would consider to be rather liberal unless you are scanning a small number of sequences.

ADD REPLY • link 3.0 years ago snystrom ▴ 10