GOstats and microRNA pipeline using biomaRt


David ▴ 860

@david-3335

Last seen 6.1 years ago

Hi Marc,

Thanks for the tip. I realized that my function was too slow, and I agree with you: getting all the GO ids in one shot is a much better approach.

On 03/31/2011 09:23 PM, Marc Carlson wrote:
> Hi David,
>
> If this was your function you would first of all want to just pass in a
> big vector (with your universe of transcript IDs in it) to get out all
> the data. Then making the GOFrame is just a matter of putting all the
> gene IDs (Entrez gene IDs), all the GO IDs (from any of the three
> ontologies), and the evidence codes into a single data.frame as outlined
> in this document here:
>
> http://www.bioconductor.org/packages/2.7/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
>
> But if it were me, I would attempt to save a little headache when making
> the final table by getting only the data I needed from getBM (and
> since they keep the three ontologies separate, that means I would make
> three calls to getBM). So like this:
>
> getBioProcgoids <- function (id) {
>   getBM(attributes=c(
>     'go_biological_process_id',
>     'go_biological_process_linkage_type',
>     'entrezgene'),
>   filters="ensembl_transcript_id", values=id, mart=mart)
> }
> BioGOs <- getBioProcgoids(
>   yourBigUniverseVectorOfEnsemblTranscriptIDsGoesHere )
>
> Then write separate small functions to get the other two ontologies and
> call them etc.
>
> Then something like this:
>
> myGOFrame <- rbind(BioGOs, CCGOs, MFGOs)
>
> To stick them all together.
>
> Does that help?
>
> Marc
>
> On 03/31/2011 02:47 AM, David martin wrote:
>> Ok thanks,
>> Any idea on how to turn the biomart output into a valid GOFrame input?
>>
>> For example:
>> I wrote this function
>>
>> getgoids <- function (id) {
>>   getBM(attributes=c(
>>     'entrezgene',
>>     'ensembl_transcript_id',
>>     'go_biological_process_id',
>>     'go_biological_process_linkage_type',
>>     'go_cellular_component_id',
>>     'go_cellular_component_linkage_type',
>>     'go_molecular_function_id',
>>     'go_molecular_function_linkage_type'),
>>   filters="ensembl_transcript_id", values=id, mart=mart)
>> }
>> foo
>>
>> How do I turn this into a valid GOFrame object?
>>
>> thanks,
>> david
>>
>> On 03/31/2011 10:10 AM, James F. Reid wrote:
>>> Hi David,
>>>
>>> On 03/30/2011 08:31 PM, David martin wrote:
>>> > Yes, absolutely. A few Ensembl releases ago UTRs tended to be smaller but
>>> > this is getting better now. How would you normalize that based on length?
>>>
>>> I'm afraid that I don't have a simple answer to this; it would need
>>> thinking out, especially with respect to your GO enrichment analysis.
>>> Any ideas from the members of the list?
>>>
>>> Best,
>>> J.
>>>
>>>> On 03/30/2011 07:00 PM, James F. Reid wrote:
>>>>> Hi David,
>>>>>
>>>>> I understand your reasoning for counting the number of miRNA binding
>>>>> sites within the 3' UTR of a predicted target: you are trying to include
>>>>> the 'combinatorial' effect of miRNA targeting.
>>>>> I would try to include the length of each UTR, however (some kind of
>>>>> normalization if you wish), since the longer the UTR, the more chances
>>>>> there are that a miRNA will bind.
>>>>> Does this make sense?
>>>>>
>>>>> Best,
>>>>> J.
>>>>>
>>>>> On 03/30/2011 05:23 PM, David martin wrote:
>>>>>> On 03/30/2011 04:56 PM, Steve Lianoglou wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Mar 30, 2011 at 9:43 AM, David martin <vilanew at gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> I open this new discussion so as not to confuse it with the previous one.
>>>>>>>>
>>>>>>>> The objective here is to look for overrepresented GO terms from microRNA targets.
>>>>>>>> One microRNA can have several targets (genes) and one single gene
>>>>>>>> can be targeted by several microRNAs. The idea is to check, for a
>>>>>>>> specific microRNA, which GO terms are overrepresented.
>>>>>>>>
>>>>>>>> Ok, so let's say my microRNA of interest is miR-A.
>>>>>>>>
>>>>>>>> Step 1: based on my favorite prediction algorithm I have managed to get a
>>>>>>>> list of genes targeted by miR-A. The genes are Ensembl transcripts and, as I
>>>>>>>> said before, miR-A can target the same transcript several times (at different
>>>>>>>> locations), so I need to account for this.
>>>>>>>>
>>>>>>>> miR-A targets ->
>>>>>>>> ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up to 300
>>>>>>>> different transcripts.
>>>>>>>
>>>>>>> I don't get why you'd want to have the same transcript multiple times
>>>>>>> as a target for the miRNA -- if the miRNA targets the same transcript
>>>>>>> in two different locations, you then want to double count the GO terms
>>>>>>> associated with that transcript?
>>>>>>
>>>>>> That's correct. The idea behind that is that a transcript targeted at
>>>>>> different locations is more "likely to be twice targeted" and therefore
>>>>>> the GO terms associated with this transcript have to be replicated. This
>>>>>> sounds good to me but I do not expect that you agree on that.
>>>>>>
>>>>>> I have managed to get all GO ids with a small function.
>>>>>> Basically you input one transcript id at a time in a loop:
>>>>>>
>>>>>> getgoids <- function (id) {
>>>>>>   getBM(attributes=c(
>>>>>>     'go_biological_process_id',
>>>>>>     'go_biological_process_linkage_type',
>>>>>>     'go_cellular_component_id',
>>>>>>     'go_cellular_component_linkage_type',
>>>>>>     'go_molecular_function_id',
>>>>>>     'go_molecular_function_linkage_type'),
>>>>>>   filters="ensembl_transcript_id", values=id, mart=mart)
>>>>>> }
>>>>>>
>>>>>> goid <- vector("list", length(genes)) # genes: vector of all ensembl transcripts
>>>>>> for (i in seq_along(genes))
>>>>>> {
>>>>>>   goid[[i]] <- getgoids(genes[i]) # one query per transcript, e.g. "ENST..."
>>>>>> }
>>>>>>
>>>>>> I agree with you that I might need to add the transcript_id to be able
>>>>>> to use it for the GOstats mapping between transcripts and GO ids.
>>>>>>
>>>>>> Now I want to use that as the universe set for GOstats and do hyperG to
>>>>>> compare with the GO terms for a specific microRNA.
>>>>>>
>>>>>> I guess:
>>>>>>
>>>>>> goframeData = data.frame(frame$go_id, frame$Evidence, frame$gene_id)
>>>>>> # list of all GO ids from all transcripts targeted by all microRNAs
>>>>>>
>>>>>> goFrame = GOFrame(goframeData, organism = "Homo sapiens")
>>>>>> goAllFrame = GOAllFrame(goFrame) # gene id to ALL GO id mapping
>>>>>>
>>>>>> In the GSEAGOHyperGParams call below, can you correct me?
>>>>>> geneSetCollection = list of all GO ids of all transcripts targeted by
>>>>>> all microRNAs
>>>>>> single_mir_transcripts_ids = list of Ensembl transcript ids targeted by
>>>>>> a specific microRNA
>>>>>> universeGeneIds = list of transcript to GO mapping
>>>>>> Is this correct?
>>>>>>
>>>>>> gsc <- GeneSetCollection(goAllFrame, setType = GOCollection())
>>>>>> params <- GSEAGOHyperGParams(name = "My Custom GSEA based annot Params",
>>>>>>   geneSetCollection = gsc, geneIds = single_mir_transcripts_ids,
>>>>>>   universeGeneIds = universe, ontology = "BP", pvalueCutoff = 0.05,
>>>>>>   conditional = FALSE, testDirection = "over")
>>>>>>
>>>>>>>
>>>>>>> Somehow that seems wrong to me -- if the "hit count" of the miRNA to
>>>>>>> the transcript is important to you, one thing you can do is store your
>>>>>>> miR-A vector as its "table()" so the names will be the transcripts,
>>>>>>> and the values will be the number of hits.
>>>>>>>
>>>>>>>> I use biomart to get the corresponding GO ids for these transcripts
>>>>>>>>
>>>>>>>> ....
>>>>>>>> # Select mart database
>>>>>>>> mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
>>>>>>>>
>>>>>>>> # Get GO for a specific transcript
>>>>>>>> # First problem: Biomart will not return GO terms twice for duplicated
>>>>>>>> # transcripts. The example below shows that for transcript
>>>>>>>> # c("ENST00000347770","ENST00000347770") I get the same GO terms as for
>>>>>>>> # transcript c("ENST00000347770").
>>>>>>>> # As I said before, a microRNA can target the same transcript several
>>>>>>>> # times, so twice the number of GO terms would be associated with this
>>>>>>>> # particular microRNA. Can we force biomart to return redundant GO terms?
>>>>>>>
>>>>>>> I'm actually still not sure what you want to do, but if you follow my
>>>>>>> advice above, you can manipulate the data.frame you get from getBM to
>>>>>>> replicate rows (or whatever you're trying to do).
>>>>>>>
>>>>>>> You will also want to add "ensembl_transcript_id" to your vector of
>>>>>>> attributes so you can reassociate the rows in the table that is
>>>>>>> returned to you with the original Ensembl transcripts you are
>>>>>>> querying for, e.g.:
>>>>>>>
>>>>>>> R> gomir <- getBM(attributes=c('ensembl_transcript_id', 'go..', ...),
>>>>>>>      filters='ensembl_transcript_id', values=c("ENST..."), mart=mart)
>>>>>>>
>>>>>>> Hope that helps,
>>>>>>> -steve
>>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
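
For anyone following the thread, here is a minimal sketch of the table-building step discussed above, using base R only. It assumes three data frames shaped like the per-ontology getBM() results Marc describes (the attribute names are the ones quoted in the thread and may differ in other Ensembl releases). Because rbind() on data frames needs matching column names, the ontology-specific columns are first renamed to shared ones, in the go_id / Evidence / gene_id order used in the GOFrame example above. `standardize_go` is a hypothetical helper, not part of GOstats or biomaRt, and the rows below are toy values for illustration, not real query output. The last lines show Steve's table() suggestion for keeping hit counts.

```r
# Hypothetical helper (not a GOstats/biomaRt function): rename the
# ontology-specific columns of one getBM()-style result to shared names
# and drop rows that lack a GO id or a gene id.
standardize_go <- function(df, id_col, evidence_col, gene_col = "entrezgene") {
  out <- data.frame(
    go_id    = as.character(df[[id_col]]),
    Evidence = as.character(df[[evidence_col]]),
    gene_id  = as.character(df[[gene_col]]),
    stringsAsFactors = FALSE
  )
  keep <- !is.na(out$go_id) & out$go_id != "" &
          !is.na(out$gene_id) & out$gene_id != ""
  unique(out[keep, , drop = FALSE])
}

# Toy stand-ins for the three getBM() results (assumed column names).
BioGOs <- data.frame(
  go_biological_process_id = c("GO:0006915", "GO:0006915", ""),
  go_biological_process_linkage_type = c("IEA", "IEA", ""),
  entrezgene = c(7157, 7157, 672),
  stringsAsFactors = FALSE
)
CCGOs <- data.frame(
  go_cellular_component_id = "GO:0005634",
  go_cellular_component_linkage_type = "IDA",
  entrezgene = 7157,
  stringsAsFactors = FALSE
)
MFGOs <- data.frame(
  go_molecular_function_id = c("GO:0003677", "GO:0003677"),
  go_molecular_function_linkage_type = c("IDA", "IDA"),
  entrezgene = c(672, NA),
  stringsAsFactors = FALSE
)

# Stack the three ontologies into one three-column table.
goframeData <- rbind(
  standardize_go(BioGOs, "go_biological_process_id",
                 "go_biological_process_linkage_type"),
  standardize_go(CCGOs, "go_cellular_component_id",
                 "go_cellular_component_linkage_type"),
  standardize_go(MFGOs, "go_molecular_function_id",
                 "go_molecular_function_linkage_type")
)
rownames(goframeData) <- NULL

# With GOstats loaded, the next step would be the call from the thread:
# goFrame <- GOFrame(goframeData, organism = "Homo sapiens")

# Steve's suggestion: keep repeated hits as counts instead of duplicates.
mirA_hits <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089")
hit_counts <- table(mirA_hits)  # names = transcripts, values = number of hits
```

Keeping the hit counts in a separate table leaves the gene and universe vectors passed to the hypergeometric test free of duplicates, while the counts remain available if some weighting is wanted later.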

GOstats biomaRt microRNA miRNA Normalization GO Organism • 1.3k views

13.1 years ago David ▴ 860
