MotifDb missing some FlyFactor Motifs
snystrom ▴ 10
@snystrom-12530
Last seen 2.2 years ago
United States

Hi,

I noticed that MotifDb is missing a few entries from the FlyFactorSurvey data. I first spotted the issue when comparing the motifs shipped with the MEME Suite against those in MotifDb. Two groups of motifs are missing: motifs for proteins that do have alternate entries in MotifDb (e.g. br-PL_SOLEXA vs br-PL_SANGER_5; indeed, most TFs in this category are missing their SOLEXA entries), and motifs for TFs that never appear in MotifDb, although they have entries in FlyFactorSurvey and are assigned to extant Drosophila genes (e.g. chinmo).

I generated a .meme format file of the motifs from TFs with 0 entries in MotifDb, and have provided a reproducible script which grabs entries for both types of missing values (found at this gist).
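For readers without access to the gist, here is a minimal base-R sketch of the kind of comparison described above. It is not the script from the gist. The motif names are hypothetical stand-ins, and the rule that the TF symbol is whatever precedes the first "-" or "_" is an assumption that will not hold for every FlyFactorSurvey name.

```r
# Hypothetical motif names; real lists would come from the MEME Suite
# FlyFactorSurvey file and from MotifDb.
meme_names    <- c("br-PL_SOLEXA", "br-PL_SANGER_5", "chinmo_SOLEXA")
motifdb_names <- c("br-PL_SANGER_5")

# Assumption: the TF symbol is the text before the first "-" or "_".
tf_symbol <- function(motif_names) sub("[-_].*$", "", motif_names)

# Split the motifs absent from the database into the two groups:
# TFs that still have some other entry, and TFs with no entry at all.
classify_missing <- function(reference_names, db_names) {
  missing <- setdiff(reference_names, db_names)
  has_other_entry <- tf_symbol(missing) %in% tf_symbol(db_names)
  list(
    alternate_present = missing[has_other_entry],
    tf_absent         = missing[!has_other_entry]
  )
}

result <- classify_missing(meme_names, motifdb_names)
print(result)
```

With these toy inputs, the br SOLEXA motif lands in the first group and the chinmo motif in the second.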

If there is a rationale for why these values should be excluded, I'd be curious to hear it. Otherwise, I'm happy to help curate metadata for these missing entries if necessary to get them included.

All the best, -Spencer

> devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.6.2 (2019-12-12)
 os       Debian GNU/Linux 10 (buster)
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  C.UTF-8                     
 ctype    C.UTF-8                     
 tz       Etc/UTC                     
 date     2020-05-25                  

─ Packages ─────────────────────────────────────────────────────────────────────────
 package              * version     date       lib source
 ape                    5.3         2019-03-17 [1] CRAN (R 3.6.2)
 assertthat             0.2.1       2019-03-21 [1] CRAN (R 3.6.2)
 backports              1.1.5       2019-10-02 [1] CRAN (R 3.6.2)
 bibtex                 0.4.2.2     2020-01-02 [1] CRAN (R 3.6.2)
 Biobase                2.46.0      2019-10-29 [2] Bioconductor
 BiocGenerics           0.32.0      2019-10-29 [2] Bioconductor
 BiocManager            1.30.10     2019-11-16 [2] CRAN (R 3.6.2)
 BiocParallel           1.20.1      2019-12-21 [2] Bioconductor
 Biostrings             2.54.0      2019-10-29 [2] Bioconductor
 bitops                 1.0-6       2013-08-17 [2] CRAN (R 3.6.2)
 callr                  3.4.0       2019-12-09 [1] CRAN (R 3.6.2)
 cli                    2.0.2       2020-02-28 [1] CRAN (R 3.6.2)
 colorspace             1.4-1       2019-03-18 [1] CRAN (R 3.6.2)
 crayon                 1.3.4       2017-09-16 [1] CRAN (R 3.6.2)
 data.table             1.12.8      2019-12-09 [2] CRAN (R 3.6.2)
 DelayedArray           0.12.2      2020-01-06 [2] Bioconductor
 desc                   1.2.0       2018-05-01 [2] CRAN (R 3.6.2)
 devtools               2.2.1       2019-09-24 [2] CRAN (R 3.6.2)
 digest                 0.6.25      2020-02-23 [1] CRAN (R 3.6.2)
 dotargs                0.0.9000    2020-04-28 [1] local
 dplyr                * 0.8.99.9002 2020-04-30 [1] Github (tidyverse/dplyr@d353ff1)
 dremeR               * 0.0.1.9001  2020-05-25 [1] local
 ellipsis               0.3.0       2019-09-20 [1] CRAN (R 3.6.2)
 fansi                  0.4.1       2020-01-08 [1] CRAN (R 3.6.2)
 fs                     1.3.1       2019-05-06 [1] CRAN (R 3.6.2)
 gbRd                   0.4-11      2012-10-01 [1] CRAN (R 3.6.2)
 generics               0.0.2       2018-11-29 [1] CRAN (R 3.6.2)
 GenomeInfoDb           1.22.0      2019-10-29 [2] Bioconductor
 GenomeInfoDbData       1.2.2       2020-02-19 [2] Bioconductor
 GenomicAlignments      1.22.1      2019-11-12 [2] Bioconductor
 GenomicRanges          1.38.0      2019-10-29 [2] Bioconductor
 ggplot2                3.2.1       2019-08-10 [1] CRAN (R 3.6.2)
 ggseqlogo              0.1         2020-03-10 [1] Github (omarwagih/ggseqlogo@4adc8f2)
 ggtree                 2.0.2       2020-03-16 [1] Bioconductor
 glue                   1.4.0       2020-04-03 [1] CRAN (R 3.6.2)
 gtable                 0.3.0       2019-03-25 [1] CRAN (R 3.6.2)
 IRanges                2.20.2      2020-01-13 [2] Bioconductor
 jsonlite               1.6         2018-12-07 [1] CRAN (R 3.6.2)
 lattice                0.20-38     2018-11-04 [3] CRAN (R 3.6.2)
 lazyeval               0.2.2       2019-03-15 [1] CRAN (R 3.6.2)
 lifecycle              0.2.0       2020-03-06 [1] CRAN (R 3.6.2)
 magrittr             * 1.5         2014-11-22 [1] CRAN (R 3.6.2)
 MASS                   7.3-51.4    2019-03-31 [3] CRAN (R 3.6.2)
 Matrix                 1.2-18      2019-11-27 [3] CRAN (R 3.6.2)
 matrixStats            0.55.0      2019-09-07 [2] CRAN (R 3.6.2)
 memoise                1.1.0       2017-04-21 [2] CRAN (R 3.6.2)
 MotifDb                1.28.0      2019-10-29 [1] Bioconductor
 munsell                0.5.0       2018-06-12 [1] CRAN (R 3.6.2)
 nlme                   3.1-142     2019-11-07 [3] CRAN (R 3.6.2)
 pillar                 1.4.3       2019-12-20 [1] CRAN (R 3.6.2)
 pkgbuild               1.0.6       2019-10-09 [2] CRAN (R 3.6.2)
 pkgconfig              2.0.3       2019-09-22 [1] CRAN (R 3.6.2)
 pkgload                1.0.2       2018-10-29 [2] CRAN (R 3.6.2)
 prettyunits            1.0.2       2015-07-13 [1] CRAN (R 3.6.2)
 processx               3.4.2       2020-02-09 [1] CRAN (R 3.6.2)
 ps                     1.3.0       2018-12-21 [1] CRAN (R 3.6.2)
 purrr                  0.3.4       2020-04-17 [1] CRAN (R 3.6.2)
 R6                     2.4.1       2019-11-12 [1] CRAN (R 3.6.2)
 Rcpp                   1.0.3       2019-11-08 [1] CRAN (R 3.6.2)
 RCurl                  1.98-1.1    2020-01-19 [2] CRAN (R 3.6.2)
 Rdpack                 0.11-1      2019-12-14 [1] CRAN (R 3.6.2)
 remotes                2.1.0       2019-06-24 [2] CRAN (R 3.6.2)
 rlang                  0.4.6       2020-05-02 [1] CRAN (R 3.6.2)
 rprojroot              1.3-2       2018-01-03 [2] CRAN (R 3.6.2)
 Rsamtools              2.2.2       2020-02-11 [2] Bioconductor
 rstudioapi             0.10        2019-03-19 [1] CRAN (R 3.6.2)
 rtracklayer            1.46.0      2019-10-29 [2] Bioconductor
 rvcheck                0.1.7       2019-11-29 [2] CRAN (R 3.6.2)
 S4Vectors              0.24.3      2020-01-18 [2] Bioconductor
 scales                 1.1.0       2019-11-18 [1] CRAN (R 3.6.2)
 sessioninfo            1.1.1       2018-11-05 [2] CRAN (R 3.6.2)
 splitstackshape        1.4.8       2019-04-21 [1] CRAN (R 3.6.2)
 stringi                1.4.3       2019-03-12 [1] CRAN (R 3.6.2)
 stringr                1.4.0       2019-02-10 [1] CRAN (R 3.6.2)
 SummarizedExperiment   1.16.1      2019-12-19 [2] Bioconductor
 testthat               2.3.1       2019-12-01 [2] CRAN (R 3.6.2)
 tibble                 3.0.1       2020-04-20 [1] CRAN (R 3.6.2)
 tidyr                  1.0.2       2020-01-24 [2] CRAN (R 3.6.2)
 tidyselect             1.0.0       2020-01-27 [1] CRAN (R 3.6.2)
 tidytree               0.3.2       2020-03-12 [1] CRAN (R 3.6.2)
 treeio                 1.10.0      2019-10-29 [1] Bioconductor
 universalmotif       * 1.7.0       2020-04-29 [1] Bioconductor
 usethis                1.5.1       2019-07-04 [2] CRAN (R 3.6.2)
 vctrs                  0.2.99.9011 2020-04-30 [1] Github (r-lib/vctrs@b11ba67)
 withr                  2.1.2       2018-03-15 [1] CRAN (R 3.6.2)
 XML                    3.99-0.3    2020-01-20 [2] CRAN (R 3.6.2)
 XVector                0.26.0      2019-10-29 [2] Bioconductor
 yaml                   2.2.0       2018-07-25 [1] CRAN (R 3.6.2)
 zlibbioc               1.32.0      2019-10-29 [2] Bioconductor

[1] /nas/longleaf/home/snystrom/R/x86_64-pc-linux-gnu-library/3.6
[2] /usr/local/lib/R/site-library
[3] /usr/local/lib/R/library
ADD COMMENT

Hi Spencer,

Thanks for this report, and for offering a fix. Perhaps you could go further? If you clone the repo you will see that it contains

 inst/extdata/flyFactorSurvey.RData

which, when loaded, contains two objects: matrices and tbl.md. It would be a big help if you could extend both of these objects to add the missing data, along with explanatory notes, then submit a pull request.
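A rough sketch of what such an edit might look like in base R follows. So that it runs anywhere, it builds a small stand-in .RData file instead of reading the real one. The only thing taken from this thread is that the file holds objects named matrices and tbl.md. Treating matrices as a named list, tbl.md as a data frame keyed by motif name, and the geneSymbol column are all assumptions to check against the actual file.

```r
# Stand-in for inst/extdata/flyFactorSurvey.RData so the sketch runs anywhere.
# The real file's contents may be structured differently.
rdata_path <- tempfile(fileext = ".RData")
local({
  matrices <- list(motifA = matrix(0.25, nrow = 4, ncol = 3,
                                   dimnames = list(c("A", "C", "G", "T"), NULL)))
  tbl.md <- data.frame(geneSymbol = "geneA", row.names = "motifA",
                       stringsAsFactors = FALSE)
  save(matrices, tbl.md, file = rdata_path)
})

# Load into a private environment to avoid overwriting workspace objects.
env <- new.env()
load(rdata_path, envir = env)

# Append one hypothetical motif, keeping names(matrices) and
# rownames(tbl.md) in step.
new_name <- "motifB"
env$matrices[[new_name]] <- matrix(c(1, 0, 0, 0,
                                     0, 1, 0, 0), nrow = 4,
                                   dimnames = list(c("A", "C", "G", "T"), NULL))
env$tbl.md[new_name, "geneSymbol"] <- "geneB"

# Sanity check: every matrix has a metadata row, in the same order.
stopifnot(identical(names(env$matrices), rownames(env$tbl.md)))
```

For a real update, rdata_path would point at the cloned repo's file, and the modified objects would be written back with save() before committing.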

Possible?

- Paul
ADD REPLY

Hi Paul,

Happy to do that. Is the repo hosted on a publicly available version control page? I can't find it under your github account or the Bioconductor github. I can get the source from the .tar.gz from the Bioconductor page, but it'd obviously be easier using a VCS platform.

-Spencer

ADD REPLY

Paul might have a more convenient way for posting issues etc, but worst-case is to

git clone https://git.bioconductor.org/packages/MotifDb

and provide a diff or similar (https://stackoverflow.com/a/15438863/547331, but this is just a naive google)

ADD REPLY

Hi Spencer,

Martin’s first proposal - let’s try that. Sorry I did not anticipate this.

I use (and just updated) an alternate home for MotifDb here:

https://github.com/PriceLab/MotifDb

If you could submit a PR against that repo, I’ll then be sure to echo it up to the bioc master (devel) repo.

Thanks for helping out with this.

- Paul
ADD REPLY

Sounds good. I've got a clone of your version working.

By the way, another issue with these entries is that the FlyBase gene numbers (FBgn) are out of date. Unfortunately, FBgn IDs are not permanent identifiers, yet FlyFactorSurvey uses them to reference specific genes, which is why this issue exists. FlyBase has a nice utility for updating these IDs to their current values, which may also help with grabbing the Entrez ID for genes where it's missing. If you'd like, I can do that as well.
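As a starting point for that, the FBgn IDs can be pulled out of the motif names and written to a text file for pasting into the FlyBase tool. This is only a sketch: the names and IDs below are placeholders, not real gene assignments, and it assumes each motif name embeds its ID in the usual "FBgn" plus seven digits form.

```r
# Placeholder motif names; the embedded IDs are made up for illustration.
motif_names <- c("geneA_SANGER_5_FBgn0000001",
                 "geneB_SOLEXA_FBgn0000002",
                 "geneC_no_id",
                 "geneA_SOLEXA_FBgn0000001")

# Return the unique FBgn IDs found in a character vector.
# Names without an ID are silently dropped.
extract_fbgn <- function(x) {
  hits <- regmatches(x, regexpr("FBgn[0-9]{7}", x))
  unique(hits)
}

fbgn_ids <- extract_fbgn(motif_names)

# One ID per line, ready to paste or upload.
id_file <- tempfile(fileext = ".txt")
writeLines(fbgn_ids, id_file)
```

Names that carry no FBgn ID are worth listing separately, since those are the ones that need manual curation.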

ADD REPLY

Yes, please! Any and all improvements are welcome.

ADD REPLY

Spencer,

It's been a while, 19 months I see. Any chance you were able to update from FlyFactor, so that I can update MotifDb?

- Paul
ADD REPLY

Hi everyone, I am running into a similar problem: the fly motif I'm interested in is not present in MotifDb, although it is available on FlyBase and has a FlyBase ID (FBgn....). The GitHub repo has not been updated with a newer flyFactorSurvey.RData; the file is 8 years old. Where can I find an updated version, or is there a way to include specific motifs in MotifDb?

Thank you.

Regards,

Gunjan

ADD REPLY

Gunjan,

I just pinged Spencer (aka snystrom) to see if he found the time to do the update from FlyFactor we hoped for. Let's see what he says, and then make a plan.

- Paul
ADD REPLY

Thank you. I really appreciate the quick response.

ADD REPLY

We have not heard back from Spencer unfortunately. Gunjan - would you be willing to work with me to update the flybase data in MotifDb?

ADD REPLY

Sure, I could give it a try. Could you guide me through it?

ADD REPLY
Paul Shannon ▴ 470
@paul-shannon-5944
Last seen 22 months ago
United States

Hi Gunjan,

So sorry for my late reply. I did not realize that no email is generated by "ADD REPLY", and I did not think to check back in.

Here is a good place to start. And I will help if you get stuck, or have questions, or if it becomes appropriate for me to do the bulk of the work myself.

Take a look at MotifDb/inst/scripts/import/flyFactorSurvey/import.R. The main function is simply called "run"; it calls 10 other functions, also in import.R.

Don't spend much time on this - but please give it a try just in case it goes smoothly. Email me directly, if you will, at

paul.thurmond.shannon@gmail.com

or better yet: file an issue here: https://github.com/PriceLab/MotifDb

ADD COMMENT

Gunjan and I have been in email contact. It may be that the fly TFs missing from MotifDb do not yet have motifs.

In future I will do my best to update MotifDb with any new PWMs and metadata that anyone provides. If you have such data, I can be reached at paul.thurmond.shannon@gmail.com, or here on bioc support (somewhat less reliably).

ADD REPLY

JASPAR2022 was just added to MotifDb, version 1.37.2. This JASPAR update incorporates some (all?) of the motifs in FlyFactorSurvey. The original motivating question from Gunjan - the motif for fly Elba1 - is now answered: the motif is available in MotifDb as

 Dmelanogaster-jaspar2022-Elba1-MA1837.1 

FWIW, here is the query:

> query(MotifDb, "elba1")
MotifDb object of length 1
| Created from downloaded public sources, last update: 2022-Mar-04
| 1 position frequency matrices from 1 source:
|         jaspar2022:    1
| 1 organism/s
|      Dmelanogaster:    1
Dmelanogaster-jaspar2022-Elba1-MA1837.1 
ADD REPLY

Thanks, Paul, for updating MotifDb with JASPAR2022. It does make a difference. I will try to work more on the missing motifs and get in touch with you once I have their PFMs and metadata ready.

Thanks again for such a quick response and update!

ADD REPLY
