Motif enrichment analysis: Error in transfac format and background frequencies from BSGenome
deepti anand ▴ 50
@deepti-anand-6724
Last seen 7.7 years ago
Hi all,

I am scanning a gene set for all the Mmusculus motifs and comparing their enrichment to the genomic background. I am using the MotifDb package to retrieve motifs and PWMEnrich for the motif enrichment. I am getting errors in the code below.

1). Get all motifs in Mmusculus from MotifDb in TRANSFAC format. In this step I get an error when exporting the motifs in TRANSFAC format. Here is my code:

    > motifs.denovo = query(MotifDb, 'Mmusculus')
    > export(motifs.denovo,con='MotifDBFile',format='transfac')
    Error in cat(list(...), file, sep, fill, labels, append) :
      argument 1 (type 'closure') cannot be handled by 'cat'

2). Convert count matrices into PWMs. In this step the error occurs when getting the background frequencies from the Mmusculus BSgenome. Here is my code:

    > library(BSgenome.Mmusculus.UCSC.mm10)
    > genome = BSgenome.Mmusculus.UCSC.mm10
    > genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")
    Error in pickGenome(organism) :
      Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.

I would appreciate any help.

Dips
BSgenome convert BSgenome PWMEnrich MotifDb • 1.8k views
@robert-stojnic-4721
Last seen 7.7 years ago
Dear Deepti,

If you want to use the mouse MotifDb motifs, you can retrieve them in the correct format for PWMEnrich here:

http://bioconductor.org/packages/2.14/data/experiment/html/PWMEnrich.Mmusculus.background.html

Cheers, Robert
Hi Robert,

Thank you for the suggestion. The backgrounds available in PWMEnrich for mouse use the mm9 assembly (the current one is mm10). Also, I found that it has 329 PWMs, which is fewer than the current MotifDb (528 motifs). That is why I want to create a background with the current mouse genome and use all 528 motifs for enrichment analysis on my gene list.

Could you please tell me how I can export the motifs in 'transfac' format and get the background frequencies from 'BSgenome.Mmusculus.UCSC.mm10'? I would appreciate it.

Dips
Hi Dips,

If you haven't already done so, please first update to the latest version of PWMEnrich (in release this is 3.6.1). I would recommend converting the MotifDb motifs directly into the PFMs that PWMEnrich expects. The only issue here is that MotifDb motifs come from different sources and are not always in the same format (sometimes they are probabilities, sometimes count matrices). Here is some example code to extract the motifs from MotifDb:

    library(MotifDb)
    library(PWMEnrich)

    # extract mouse motifs
    d = values(MotifDb)
    dm.sel = which(d$organism == "Mmusculus")

    # output list of motifs
    motifs = list()
    for(i in dm.sel){
      seq.count = d$sequenceCount[i]
      # fall back to 100 sequences when the count is missing
      if(is.na(seq.count))
        seq.count = 100
      motifs[[length(motifs)+1]] = apply(round(MotifDb[[i]] * seq.count), 1:2, as.integer)
    }

    motif.names = d$geneSymbol[dm.sel]
    motif.ids = d$providerName[dm.sel]
    motif.names[is.na(motif.names)] = motif.ids[is.na(motif.names)]

    names(motifs) = motif.ids

    # get A,C,G,T counts
    prior = getBackgroundFrequencies("mm9")

    # convert to PWMEnrich PWM format
    pwms = PFMtoPWM(motifs, id=motif.ids, name=motif.names, prior.params=prior)

    # create background distributions
    bg = makeBackground(pwms, "mm9")

The last line uses the mm9 promoters that are built into PWMEnrich as the genomic background. If you want to use a different set of promoter sequences (e.g. mm10), you will have to extract them yourself into a DNAStringSet object and pass them like this:

    bg = makeBackground(pwms, bg.seq=your_DNAStringSet_object)

Cheers, Robert
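The conversion step inside the loop above can be tried on its own, without MotifDb or PWMEnrich. The following is a minimal base-R sketch, assuming a made-up 4 x 3 probability matrix (`ppm`) and a hypothetical helper `to_counts()`; neither is part of any package. It scales a probability matrix by a sequence count, falls back to 100 when the count is missing, and stores the result as integers.

```r
# Toy position probability matrix: rows are A, C, G, T; columns are motif positions.
# The values are invented for illustration only.
ppm <- matrix(c(0.70, 0.10, 0.10, 0.10,
                0.25, 0.25, 0.25, 0.25,
                0.00, 0.00, 1.00, 0.00),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: scale probabilities to counts and store them as integers.
to_counts <- function(m, seq.count = 100) {
  # Use 100 sequences when no sequence count is available
  if (is.na(seq.count)) seq.count <- 100
  counts <- round(m * seq.count)
  # Change the storage type in place; dimensions and row names are kept
  storage.mode(counts) <- "integer"
  counts
}

# A missing sequence count triggers the fallback
pfm <- to_counts(ppm, seq.count = NA)
print(pfm)
```

For other matrices the columns of the result may not sum exactly to `seq.count`, because each cell is rounded independently.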
Hi Robert,

Thank you for the example code. I am able to extract all 528 Mmusculus motifs from MotifDb by running the example code you sent. The code below gives me an error when I try to get the A,C,G,T counts using getBackgroundFrequencies():

    > prior = getBackgroundFrequencies("mm9")
    Error in pickGenome(organism) :
      Please pick one of the valid organisms: "dm3" or provide a BSgenome object of the target genome.

I have the updated version of PWMEnrich (3.6.1) installed. Could you please suggest how to proceed with this error? I appreciate your help.

Dips
Hi Deepti,

The problem is described in the error message. You have to select one of the valid organisms (and only "dm3" is valid, so "mm9" will not work), or pass a BSgenome object. You almost had it in the code you sent with your first email:

    > library(BSgenome.Mmusculus.UCSC.mm10)
    > genome = BSgenome.Mmusculus.UCSC.mm10
    > genomic.acgt = getBackgroundFrequencies("BSgenome.Mmusculus.UCSC.mm10")

You see, above you are not passing a BSgenome object but a character string. Try removing the quotes:

    genomic.acgt = getBackgroundFrequencies(BSgenome.Mmusculus.UCSC.mm10)

or, since you previously assigned BSgenome.Mmusculus.UCSC.mm10 to the variable "genome":

    genomic.acgt = getBackgroundFrequencies(genome)

HTH
Diego
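The difference between the package name as a string and the genome object itself can be checked directly. The sketch below is only an illustration under some assumptions: the variable names `genome.name` and `have.pkgs` are made up here, and the PWMEnrich part runs only if PWMEnrich and BSgenome.Mmusculus.UCSC.mm10 are installed. It uses only calls that already appear in this thread.

```r
# The quoted package name is an ordinary character vector, not a genome
genome.name <- "BSgenome.Mmusculus.UCSC.mm10"
print(is.character(genome.name))

# Run the PWMEnrich part only when both packages are available
have.pkgs <- requireNamespace("PWMEnrich", quietly = TRUE) &&
  requireNamespace("BSgenome.Mmusculus.UCSC.mm10", quietly = TRUE)

if (have.pkgs) {
  library(PWMEnrich)
  library(BSgenome.Mmusculus.UCSC.mm10)

  # Unquoted, the same name refers to the BSgenome object
  genome <- BSgenome.Mmusculus.UCSC.mm10
  stopifnot(methods::is(genome, "BSgenome"))

  # Pass the object, not the string, as suggested in the answer above
  prior <- getBackgroundFrequencies(genome)
  print(prior)
} else {
  message("PWMEnrich or BSgenome.Mmusculus.UCSC.mm10 not installed; skipping")
}
```

The resulting `prior` would then take the place of the `getBackgroundFrequencies("mm9")` call in the earlier example, i.e. it would be passed as `prior.params` to `PFMtoPWM()`.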