Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)


mauede@alice.it ▴ 870

@mauedealiceit-3511

Last seen 9.6 years ago

Thank you very much. Now I have to push my inquiry a little further; sorry for being pedantic. Do you know, or can you help me find out, how the naming conventions of the BioMart databases correspond to those of miRecords and TarBase?

Thanks to your help I learned how to find the association between a miRNA and a gene 3'UTR region. For instance:

Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385 + . 16.5359 1.687830e-02 ENST00000295228 INHBB

miRNA identifier is: "hsa-miR-130a"
Feature is: "miRNA_target"
Chromosome is: 2
Start is: 120825363
End is: 120825385
TRANSCRIPT_ID is: ENST00000295228
EXTERNAL_NAME is: INHBB

I downloaded the VALIDATED targets from miRecords and saved them as an XLS file. I searched that file for the miRNA identifier "hsa-miR-130a" and found the records below. The target gene names (not unique) are "MCSF", "MAFB", "GAX" and "HOXA5"; the Target gene_Refseq_acc values are the NM_... strings; the Target site_number is known, and equal to 2, only for the first two records.

14697198 Homo sapiens human MCSF NM_000757.3 2 Homo sapiens hsa-miR-130a
14697198 Homo sapiens human MCSF NM_000757.3 2 Homo sapiens hsa-miR-130a
16549775 Homo sapiens human MAFB NM_005461.3 Homo sapiens hsa-miR-130a
17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
17957028 Homo sapiens human HOXA5 NM_019102.2 Homo sapiens hsa-miR-130a

It looks like the miRNA naming convention is the same in the BioMart and miRecords databases. My problem is the apparently different gene naming convention. How can I map the gene identifiers used in the BioMart databases to the gene identifiers used in miRecords? Without such a (hopefully 1-1) mapping I cannot use the information across databases. On the other hand, I cannot see how to get the 3'UTR sequences from miRecords alone.
Any suggestion or comment is more than welcome. Thank you in advance for your attention.

Maura

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson@bbsrc.ac.uk]
Sent: Thu 25/06/2009 8.39
To: mauede@alice.it; Sean Davis
Cc: bioconductor@stat.math.ethz.ch
Subject: RE: [BioC] how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

They are predicted. The only databases of experimentally validated targets are TarBase and miRecords, and when I last looked they had 1300 and 1135 records respectively.

Mick

________________________________
From: bioconductor-bounces@stat.math.ethz.ch on behalf of mauede@alice.it
Sent: Thu 25/06/2009 4:57 AM
To: Sean Davis
Cc: bioconductor@stat.math.ethz.ch
Subject: [BioC] how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Thank you very much. I believe I can use biomaRt functions to get the 3'UTR sequences by providing the chromosome name and the start/end coordinates. However, I am not sure that the text file I downloaded from http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl, namely "arch.v5.txt.homo_sapiens", contains (or points to) the VALIDATED miRNA <-> gene-3'UTR sequences (or their coordinates). Since the prediction code "miRanda" is mentioned, my question is: are the (miRNA, gene-3'UTR-sequence) pairs listed in the files downloadable from http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl experimentally VALIDATED or computationally PREDICTED? At the moment I definitely need the experimentally VALIDATED (miRNA, gene-3'UTR-sequence) pairs. Please correct me if I am mistaken.

Thank you so much,
Maura

-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Wed 24/06/2009 18.28
To: mauede@alice.it
Cc: bioconductor@stat.math.ethz.ch
Subject: Re: [BioC] how to find the validated pair (miRNA, gene-3'UTR-sequence)

On Wed, Jun 24, 2009 at 11:45 AM, <mauede@alice.it> wrote:

> Sorry for my misuse of Biology nomenclature.
> I am still very confused.
>
> My first task (very trivial for you) is to generate a text file containing
> a list of Homo sapiens validated miRNAs (microRNA-identifier, sequence)
> and the related 3'UTR regions (gene-identifier, 3'UTR-sequence).

Hi, Maura. See here:

http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl

If you download the text file for human, it looks like:

Similarity hsa-miR-647 miRanda miRNA_target 2 120824263 120824281 + . 16.3205 3.701400e-06 ENST00000295228 INHBB
Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385 + . 16.5359 1.687830e-02 ENST00000295228 INHBB

From here, you have the miR name, the chromosome (2 in this case), the chromosome start and end positions, and the strand. You can use this to get the sequence from the genome (the fasta sequence for those locations). The transcript name (ENST....) is from the Ensembl database, so there is plenty of information via biomaRt, if necessary, but the HUGO gene symbol is given in the last column. Several of the code snippets you give below give similar information.

If you are concerned about what a specific data source is giving you, you should probably contact that data source directly via email. Most websites offer a "contact us" link.

If this isn't what you need, then perhaps you can show more specifically how this information is not meeting your needs. Know that you may have to do a little bit of programming to get things into exactly the formats that you like.

Sean

> I realize this is just a matter of retrieving all known information. The
> difficulty for me is where to find the pair (miRNA, gene-3'UTR) matching
> information.
> In the following I downloaded a lot of stuff but I do not know how to put
> the pieces together to fulfill my task.
> I think the 3'UTR sequences can be retrieved through function "getSequence"
> from package "biomaRt" ... if only I knew which parameters to pass to such
> a function to achieve my goal.
>
> 1) Dataset "hsSeqs" from package "microRNA" contains 677 miRNA entries
>    ex. hsa-let-7a "UGAGGUAGUAGGUUGUAUAGUU"
>    Are such miRNAs validated?
>    If the answer is "yes", how can I retrieve the corresponding
>    gene-3'UTR regions?
>
> 2) Dataset "hsTargets" from package "microRNA" contains a 709015 x 6 table
>    of miRNA identifiers and apparently some data from the paired gene.
>    ex.  name           target            chrom start       end         strand
>    [1,] "hsa-miR-647"  "ENST00000295228" "2"   "120824263" "120824281" "+"
>    [2,] "hsa-miR-130a" "ENST00000295228" "2"   "120825363" "120825385" "+"
>
>    Again, how can I retrieve the corresponding gene-3'UTR regions from the
>    above data?

Note my answer above. The gene 3'UTR information is there, but you may need to do some calculations, depending on what you want. Also, note that "genes" do not have 3'UTRs; only transcripts have that.

> 3) Dataset "s3utr" from package "microRNA" contains 112 3'UTR entries
>    ex.
>    "CCTGCCCGCCCGCATGGCCAGCCAGTGGCAAGCTGCCGCCCCCACTCTCCGGGCACCGTCTCCTGCCTGTGCGTCCGCCC
>    ACCGCTGCCCTGTCTGTTGCGACACCCTCCCCCCCACATACACACGCAGCGTTTTGATAAATTATTGGTTTTCAACG"
>
>    Where do such 3'UTRs come from? Which (miRNA, gene) do they belong to?
>
> 4) I downloaded the file "mature.fa" (Fasta format sequences of all mature
>    miRNA sequences) from http://microrna.sanger.ac.uk/sequences/ftp.shtml
>    The file contains a number of records starting with the miRNA identifier.
>    ex: hsa-miR-943 miRanda miRNA_target 9885484 9885504 15.6748
>        4.721740e-02 + . URL "http://www.ensembl.org/homo_sapiens/geneview?gene=ENST00000302092"
>        hsa-miR-944 miRanda miRNA_target 9885188 9885209 16.602
>        1.659470e-03 + . URL "http://www.ensembl.org/homo_sapiens/geneview?gene=ENST00000302092"
>
>    Where are the 3'UTR regions indicated in the above records?
>
> 5) I downloaded miRNA Validated Targets from
>    http://mirecords.umn.edu/miRecords/download.php.
>    It generated a huge XLS file with a lot of data.
>    ex: Pubmed_id Target gene_species_scientific Target gene_species_common
>    Target gene_name Target gene_Refseq_acc Target site_number
>    miRNA_species miRNA_mature_ID miRNA_regulation Reporter_target
>    gene/region Reporter link element Test_method_inter Target gene
>    mRNA_level Original description Mutation_target region Post
>    mutation_method Original description_mutation_region Target
>    site_position A Reporter_target site Reporter link element
>    Test_method_inter_site Original description_inter_site Mutation_target site
>    Post mutation_method_site Original description_mutation_site
>    Mutiple site mutation note Additional note
>
>    12808467 Homo sapiens human Hes1 NM_198155.2 1
>    Homo sapiens hsa-miR-23a mutation Western blotting
>    Next, to examine whether expression of the gene for Hes1 is regulated by
>    miR-23, we introduced synthetic miR-23 or mutant miR-23 (Fig. 2a) into
>    undifferentiated NT2 cells. When synthetic miR-23 was introduced at 2 mM
>    into undifferentiated NT2 cells, the intracellular level of Hes1 fell
>    significantly (Fig. 2b). By contrast, in the presence of synthetic mutant
>    miR-23, the level of Hes1 in undifferentiated NT2 cells remained unchanged
>    and similar to that in untreated wild-type NT2 cells (Fig. 2b).
>    801 overexpression by mature miRNA transfection
>    luciferase target site (five copies of the target sequence) activity assay
>    Furthermore, the luciferase activity of LucSTS23 in undifferentiated NT2
>    cells that had been treated with synthetic miR-23 was lower than that in
>    untreated wild-type NT2 cells (Fig. 3c). Yes Luciferase activity assay
>    Furthermore, the luciferase activity of LucSTS23 in undifferentiated NT2
>    cells that had been treated with synthetic miR-23 was lower than that in
>    untreated wild-type NT2 cells (Fig. 3c).
>
> Thank you in advance for helping me out of my misery.
> Maura

_______________________________________________
Bioconductor mailing list
Bioconductor@stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
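The column layout Sean describes for the miRanda target file can be sketched in a few lines of R. This is only an illustration: the two records are copied from the thread, the column names are hypothetical labels chosen here (the download itself is not assumed to carry such a header), and the coordinates are assumed to be inclusive as in GFF-style files.

```r
# Two records copied from the thread; fields are whitespace-separated.
lines <- c(
  "Similarity hsa-miR-647 miRanda miRNA_target 2 120824263 120824281 + . 16.3205 3.701400e-06 ENST00000295228 INHBB",
  "Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385 + . 16.5359 1.687830e-02 ENST00000295228 INHBB"
)

# Hypothetical column labels, chosen for this sketch only.
cols <- c("group", "mirna", "method", "feature", "chrom", "start", "end",
          "strand", "phase", "score", "pvalue", "transcript_id", "symbol")

# Read everything as character first so "+" and "." are kept verbatim.
targets <- read.table(text = lines, col.names = cols,
                      colClasses = "character", stringsAsFactors = FALSE)
targets$start <- as.integer(targets$start)
targets$end   <- as.integer(targets$end)

# Length of each predicted site, assuming inclusive start/end coordinates.
targets$site_length <- targets$end - targets$start + 1L

print(targets[, c("mirna", "chrom", "start", "end", "strand",
                  "transcript_id", "symbol")])
```

The `transcript_id` column holds the Ensembl transcript ID that biomaRt queries can filter on, and `chrom`, `start`, `end` and `strand` locate the predicted site on the genome.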

miRNA Homo sapiens biomaRt • 1.2k views

updated 14.8 years ago by Steve Lianoglou ★ 13k • written 14.8 years ago by mauede@alice.it ▴ 870


Steve Lianoglou ★ 13k

@steve-lianoglou-2771

Last seen 14 months ago

United States

Hi Maura,

On Jun 25, 2009, at 7:01 AM, <mauede at alice.it> wrote:

> Thank you very much.
> Now I have to push my inquiry a little bit further ... sorry for
> being pedantic.
> Do you know, or can you help me find out, the correspondence naming
> convention between the BioMart databases and miRecords and TarBase ?
> Thanks to your help I learnt how to find the association between
> miRNA and gene-3UTR region. For instance:
>
> Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385
> + . 16.5359 1.687830e-02 ENST00000295228 INHBB
>
<snip>
>
> 14697198 Homo sapiens human MCSF NM_000757.3 2 Homo sapiens hsa-miR-130a
> 14697198 Homo sapiens human MCSF NM_000757.3 2 Homo sapiens hsa-miR-130a
> 16549775 Homo sapiens human MAFB NM_005461.3 Homo sapiens hsa-miR-130a
> 17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
> 17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
> 17957028 Homo sapiens human GAX NM_005924.4 Homo sapiens hsa-miR-130a
> 17957028 Homo sapiens human HOXA5 NM_019102.2 Homo sapiens hsa-miR-130a
>
> It looks like miRNAs naming convention is the same for BioMart and
> miRecords databases
> My problem is the apparently different genes naming convention.
> How can I map the gene identifier used in BioMart databases to the
> gene identifiers used in miRecords ?
> Without such *hopefully* 1-1 mapping function I cannot use the
> information across databases.

The gene IDs from your first result (e.g. ENSTXXXX) are Ensembl transcript IDs. The IDs used in your second example (e.g. NM_000757.3) are RefSeq IDs. It seems that the .X in NM_*.3, NM_*.4, etc. is for versioning purposes, so the actual RefSeq accession number for NM_000757.3 is NM_000757. Make sense?

OK, now that we know that, you can use biomaRt (once again!) to create yourself a map of RefSeq <--> transcript IDs.
I don't think you'll get an exact 1-1 mapping as you'd like (ID mapping is rarely that easy, but you might get lucky), so you'll probably need some further processing, but look here:

R> library(biomaRt)
R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924",
                "NM_005924", "NM_005924", "NM_019102")
R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
                                  'ensembl_transcript_id', 'refseq_dna'),
                     filters='refseq_dna', values=refseqs, mart=hmart)
R> gene.map
  hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
2        MAFB ENSG00000204103       ENST00000396967  NM_005461
3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102

That should get you pretty close to where you want to be.

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
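Removing the version suffix by hand does not scale to the full miRecords table, so here is a minimal sketch of doing it in R before building the `refseqs` vector. The helper name `strip_version` is hypothetical (it is not part of biomaRt or any package), and the accessions are the ones quoted in this thread.

```r
# Hypothetical helper: drop a trailing ".<digits>" version from an accession
# and trim stray whitespace (such as the spaces introduced by e-mail wrapping).
strip_version <- function(acc) {
  sub("\\.[0-9]+$", "", trimws(acc))
}

# Versioned accessions as they appear in the miRecords rows quoted above.
acc <- c("NM_000757.3", "NM_000757.3", "NM_005461.3",
         "NM_005924.4", "NM_005924.4", "NM_005924.4", "NM_019102.2")

# Unversioned, de-duplicated accessions, ready to use as filter values.
refseqs <- unique(strip_version(acc))
print(refseqs)
```

De-duplicating after stripping keeps the query small; the mapping table returned by a lookup can later be merged back onto the full miRecords rows.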

14.8 years ago by Steve Lianoglou ★ 13k


One more thing to add:

>> Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385
>> + . 16.5359 1.687830e-02 ENST00000295228 INHBB

> R> library(biomaRt)
> R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924",
>                 "NM_005924", "NM_005924", "NM_019102")
> R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
>                                   'ensembl_transcript_id', 'refseq_dna'),
>                      filters='refseq_dna', values=refseqs, mart=hmart)
>
> R> gene.map
>   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> 1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
> 2        MAFB ENSG00000204103       ENST00000396967  NM_005461
> 3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
> 4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102

Your original Ensembl transcript wasn't included in our result, so instead of giving the `getBM` function a list of RefSeq IDs to look up, we can flip this around and find out which RefSeq ID your "ENST00000295228" transcript points to. Using the same `hmart` object, you can do it like so:

R> getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id',
                      'ensembl_transcript_id', 'refseq_dna'),
         filters='ensembl_transcript_id', values='ENST00000295228',
         mart=hmart)
  hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
1       INHBB ENSG00000163083       ENST00000295228  NM_002193

Note we just had to change the type of ID we are passing to the `filters` parameter.

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
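Once both lookups are done, the "further processing" amounts to joining tables on the shared identifier. The sketch below uses only values that appear in this thread; the data frame names and the helper `fetch_3utr` are hypothetical. `fetch_3utr` wraps biomaRt's `getSequence` with `seqType = "3utr"`; it needs a network connection and a mart object, so it is defined here but not called.

```r
# Validated pairs quoted from miRecords, with RefSeq versions already stripped.
validated <- data.frame(
  mirna  = "hsa-miR-130a",
  symbol = c("MCSF", "MAFB", "GAX", "HOXA5"),
  refseq = c("NM_000757", "NM_005461", "NM_005924", "NM_019102"),
  stringsAsFactors = FALSE
)

# RefSeq <-> Ensembl transcript map, copied from the two getBM results above.
gene_map <- data.frame(
  hgnc_symbol = c("CSF1", "MAFB", "MEOX2", "HOXA5", "INHBB"),
  ensembl_transcript_id = c("ENST00000369802", "ENST00000396967",
                            "ENST00000262041", "ENST00000222726",
                            "ENST00000295228"),
  refseq_dna = c("NM_000757", "NM_005461", "NM_005924", "NM_019102",
                 "NM_002193"),
  stringsAsFactors = FALSE
)

# The miRanda prediction discussed in the thread.
predicted <- data.frame(mirna = "hsa-miR-130a",
                        ensembl_transcript_id = "ENST00000295228",
                        stringsAsFactors = FALSE)

# Attach Ensembl transcript IDs and HGNC symbols to the validated pairs.
validated_ens <- merge(validated, gene_map, by.x = "refseq", by.y = "refseq_dna")

# Flag predictions whose (miRNA, transcript) pair is also in the validated set.
key <- function(d) paste(d$mirna, d$ensembl_transcript_id)
predicted$validated <- key(predicted) %in% key(validated_ens)
print(validated_ens)
print(predicted)

# Hypothetical helper, not run here: fetch 3'UTR sequences for transcripts.
fetch_3utr <- function(transcript_ids, mart) {
  biomaRt::getSequence(id = transcript_ids, type = "ensembl_transcript_id",
                       seqType = "3utr", mart = mart)
}
```

With these rows the INHBB prediction is not flagged as validated. The merge also shows that the miRecords gene names MCSF and GAX correspond to the HGNC symbols CSF1 and MEOX2, which is why joining on accessions is safer than joining on gene names.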

14.8 years ago by Steve Lianoglou ★ 13k
