Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)


mauede@alice.it ▴ 870


It is true that I received a number of answers with examples of data extraction from Ensembl. However, none of them extracts any of the identifiers contained in the file "maturestar" (e.g. >hsa-let-7d* MIMAT0004484 Homo sapiens let-7d* CUAUACGACCUGCUGCCUUUCU), in the file "mature" (e.g. >hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a UGUAAACAUCCUCGACUGGAAG), or in the file "hsa.gff". All three files contain the miRNA identifier plus another identifier whose meaning I do not know.

You may ask why I haven't tried to retrieve all possible attribute values from Ensembl to check whether some relationship can be found. Here is my answer:

> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
Error in value[[3L]](cond) :
  Request to BioMart web service failed. Verify if you are still connected to the internet. Alternatively the BioMart web service is temporarily down.

In fact, I tried to ping the server "www.biomart.org" and it did not respond ... so I deduce that the server really is down at the moment. In any case, I do not know whether any of the files mentioned above contains validated miRNAs.

Best regards,
Maura

-----Original Message-----
From: Sean Davis [mailto:seandavi@gmail.com]
Sent: Sat 27/06/2009 14.23
To: mauede@alice.it
Cc: Steve Lianoglou; bioconductor List
Subject: Re: [BioC] Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

On Sat, Jun 27, 2009 at 1:42 AM, <mauede@alice.it> wrote:
> What is the attribute corresponding to the miR name (e.g. "hsa-miR-130a")?

Hi, Maura. This information does not exist directly via biomaRt. You can use the listAttributes() function to see which attributes are available if you are ever in doubt.

> I have to link the gene information (right now I am only interested
> in the 3'UTR sequence) to the miRNA for which the gene in question is a
> target.

This question has been answered several times for you. You'll want to try those suggestions. At the bottom of emails to this list you will find a link to search the archives, in case you didn't save the emails sent to you earlier.

Sean

> -----Original Message-----
> From: Steve Lianoglou [mailto:mailinglist.honeypot@gmail.com]
> Sent: Thu 25/06/2009 16.02
> To: mauede@alice.it
> Cc: bioconductor List
> Subject: Re: [BioC] Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
>
> One more thing to add:
>
> >> Similarity hsa-miR-130a miRanda miRNA_target 2 120825363 120825385 + . 16.5359 1.687830e-02 ENST00000295228 INHBB
> >
> > R> library(biomaRt)
> > R> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
> > R> refseqs <- c("NM_000757", "NM_000757", "NM_005461", "NM_005924", "NM_005924", "NM_005924", "NM_019102")
> > R> gene.map <- getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id', 'ensembl_transcript_id', 'refseq_dna'), filters='refseq_dna', values=refseqs, mart=hmart)
> >
> > R> gene.map
> >   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> > 1        CSF1 ENSG00000184371       ENST00000369802  NM_000757
> > 2        MAFB ENSG00000204103       ENST00000396967  NM_005461
> > 3       MEOX2 ENSG00000106511       ENST00000262041  NM_005924
> > 4       HOXA5 ENSG00000106004       ENST00000222726  NM_019102
>
> Your original Ensembl transcript wasn't included in our result, so instead of telling the `getBM` function to use a list of RefSeq IDs to get info for, we can flip this around and find out which RefSeq ID your "ENST00000295228" transcript points to. Using the same `hmart` object, you can do it like so:
>
> R> getBM(attributes=c('hgnc_symbol', 'ensembl_gene_id', 'ensembl_transcript_id', 'refseq_dna'), filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)
>
>   hgnc_symbol ensembl_gene_id ensembl_transcript_id refseq_dna
> 1       INHBB ENSG00000163083       ENST00000295228  NM_002193
>
> Note that we just had to change the type of ID we pass to the `filters` parameter.
>
> -steve
>
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor@stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
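Sean's advice to consult listAttributes() is easier to follow with a keyword search, because the attribute list is long. The sketch below assumes, as the biomaRt documentation describes, that listAttributes() returns a data frame with `name` and `description` columns; the helper name `find_attributes` is hypothetical, and whether any miRNA-related attribute turns up depends on the Ensembl release being queried.

```r
# Hypothetical helper: keep rows of a listAttributes()-style data frame whose
# name or description matches a regular expression (case-insensitive).
find_attributes <- function(attr_table, pattern) {
  hit <- grepl(pattern, attr_table$name, ignore.case = TRUE) |
    grepl(pattern, attr_table$description, ignore.case = TRUE)
  attr_table[hit, , drop = FALSE]
}

# Usage against a live mart (requires network access):
# library(biomaRt)
# hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# find_attributes(listAttributes(hmart), "mirna|mirbase")
```

Searching both columns helps because attribute names are terse identifiers while descriptions use the wording shown on the BioMart web pages.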

Tags: miRNA, Biophysics, Homo sapiens, biomaRt, PING



michael watson IAH-C ★ 3.4k


To get those, you will need to download the mature.fa.gz and maturestar.fa.gz files from the miRBase FTP site: ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/. Then you can unzip them and grep for the hsa miRs. They'll be in FASTA format, and whether or not Bioconductor can read them in I have no idea - I use Bioperl for all my sequence handling.

Mick

-----Original Message-----
From: bioconductor-bounces@stat.math.ethz.ch on behalf of mauede@alice.it
Sent: Sun 28/06/2009 2:05 AM
To: Sean Davis
Cc: bioconductor List
Subject: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
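Michael's grep step can also be done inside R, which at the same time answers Maura's question about the unknown identifier: in the headers she quotes, the second field (e.g. MIMAT0000087) is, as far as I know, miRBase's accession for the mature sequence, which stays stable across releases while miRNA names may change. A minimal sketch, assuming headers of the form `>name accession Genus species short-name` with a two-word species name; `mirbase_header_table` is a hypothetical name:

```r
# Hypothetical helper: pull one species' header lines out of a miRBase FASTA file
# (the in-R equivalent of grep '^>hsa-') and split them into their fields.
mirbase_header_table <- function(lines, prefix = "hsa-") {
  hdr <- grep(paste0("^>", prefix), lines, value = TRUE)
  parts <- strsplit(sub("^>", "", hdr), " ", fixed = TRUE)
  data.frame(
    mirna = vapply(parts, `[`, character(1), 1),      # e.g. hsa-miR-30a
    accession = vapply(parts, `[`, character(1), 2),  # e.g. MIMAT0000087
    # Assumes a two-word species name (genus + species) in fields 3 and 4.
    species = vapply(parts, function(p) paste(p[3:4], collapse = " "), character(1)),
    stringsAsFactors = FALSE
  )
}

# Usage, assuming mature.fa has been downloaded and unzipped locally:
# hsa <- mirbase_header_table(readLines("mature.fa"))
```

The accession column is a convenient key to carry through later joins, since it should not change when a miRNA is renamed.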



> They'll be in fasta format, and whether or not Bioconductor can read
> them in I have no idea - I use Bioperl for all my sequence handling.

Yes, Bioconductor can: the Biostrings package provides readFASTA and writeFASTA, which handle this for you.

-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos

Reply by Steve Lianoglou ★ 13k
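Steve's answer refers to readFASTA in Biostrings; I believe that function was later deprecated in favour of readers such as readRNAStringSet(), so code copied from this thread may need adjusting. For readers who only need plain strings, here is a minimal base-R sketch (the function name `read_fasta` is hypothetical) that turns FASTA lines into a named character vector, joining sequences that are wrapped over several lines:

```r
# Hypothetical base-R FASTA reader: returns sequences named by their header lines
# (without the leading ">"), joining sequences that span several lines.
read_fasta <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]               # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                 # record number each line belongs to
  headers <- sub("^>", "", lines[is_header])
  in_seq <- !is_header & record > 0           # ignore any text before the first header
  seq_ids <- factor(record[in_seq], levels = seq_along(headers))
  # One group per header; a header without sequence lines yields "".
  seqs <- vapply(split(lines[in_seq], seq_ids), paste, character(1), collapse = "")
  names(seqs) <- headers
  seqs
}

# Usage: seqs <- read_fasta(readLines("mature.fa"))
```

Using a factor with one level per header keeps records with empty sequences instead of silently dropping them.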


michael watson IAH-C ★ 3.4k


The power of Bioconductor :D So, some code would look like this:

> library(Biostrings)
> mat <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz"))
> matfas <- readFASTA(mat, strip.descs=TRUE)
> matstar <- gzcon(url("ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/maturestar.fa.gz"))
> matstarfas <- readFASTA(matstar, strip.descs=TRUE)

-----Original Message-----
From: Steve Lianoglou [mailto:mailinglist.honeypot@gmail.com]
Sent: Sun 28/06/2009 8:51 AM
To: michael watson (IAH-C)
Cc: mauede at alice.it; Sean Davis; bioconductor List
Subject: Re: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
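Michael's code streams the gzipped files straight from FTP each time it runs. An alternative sketch saves each file locally once and then reads it with gzfile(), which decompresses transparently. The helper names are hypothetical, and the 2009 Sanger FTP path may no longer resolve, since miRBase is now distributed from mirbase.org.

```r
# Hypothetical helper: download `url` to `dest` only if no local copy exists yet.
cached_download <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")  # binary mode keeps the .gz intact on Windows
  }
  dest
}

# Hypothetical helper: read all lines of a gzip-compressed text file.
read_gz_lines <- function(path) {
  con <- gzfile(path, open = "rt")
  on.exit(close(con))
  readLines(con)
}

# Usage (network access needed the first time; URL taken from the thread):
# fa <- cached_download(
#   "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/mature.fa.gz",
#   "mature.fa.gz")
# mature_lines <- read_gz_lines(fa)
```

Keeping a local copy also makes the analysis reproducible against a fixed miRBase release rather than whatever CURRENT points to on a given day.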



michael watson IAH-C ★ 3.4k


Hi Maura,

Well, you can get gene:target information from miRBase and read it in using CORNA or just read.table. You can also get miRNA sequences from miRBase using readFASTA. You can get Ensembl gene sequences using biomaRt. You can read in miRecords data using RODBC. You can then link all of this together using merge(), though I appreciate that some work needs to be done on the list returned by readFASTA. Other than actually doing the work for you, I'm not sure what else we can do.... :)

Mick

-----Original Message-----
From: mauede@alice.it [mailto:mauede@alice.it]
Sent: Sun 28/06/2009 3:35 PM
To: michael watson (IAH-C); Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: Re: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Thank you very much. I just realized that the BioMart server is up and running again. I have now learned that BioMart can extract a lot of data from Ensembl (where I have been told to get the gene information) and can also download the validated miRNA compressed files.

I stress, though, that the main problem I am experiencing is still open. I have to find a piece of data that relates all the gene information I can get by querying Ensembl through BioMart to the downloaded miRNA information, because the miRNA identifier is not available through BioMart .... I wish I were mistaken. However, some other (unique?) miRNA attribute that is available through BioMart is also present in the VALIDATED targets file, which can be downloaded in XLS format from miRecords. This piece of data would allow me to relate the gene 3'UTR string to the targeting miRNA.

The issue is that I do not know how often the miRecords file is updated, and the download has to be performed outside the R environment. Maybe R could handle the download automatically through the R "system" function, and then the XLS file could be processed with the R package "RExcelInstaller" ..... just a speculation ...

Regards,
Maura

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Sun 28/06/2009 10.15
To: Steve Lianoglou
Cc: mauede at alice.it; Sean Davis; bioconductor List
Subject: RE: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
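Michael notes that the list returned by readFASTA needs some work before merge() can be used. Assuming each element is a list with `desc` and `seq` components (the structure the old Biostrings readFASTA returned, as far as I recall), that step could look like the sketch below. `fasta_list_to_df` is a hypothetical name, and the target table is a toy placeholder standing in for miRBase or miRecords data.

```r
# Hypothetical helper: flatten a readFASTA-style list (elements with $desc and $seq)
# into a data frame keyed by miRNA name (first token) and accession (second token).
fasta_list_to_df <- function(fl) {
  desc <- sub("^>", "", vapply(fl, function(r) r$desc, character(1)))
  parts <- strsplit(desc, " ", fixed = TRUE)
  data.frame(
    mirna = vapply(parts, `[`, character(1), 1),
    accession = vapply(parts, `[`, character(1), 2),
    seq = vapply(fl, function(r) r$seq, character(1)),
    stringsAsFactors = FALSE
  )
}

# Toy input mimicking two records from mature.fa (sequences as quoted in the thread).
mature <- list(
  list(desc = "hsa-miR-30a MIMAT0000087 Homo sapiens miR-30a", seq = "UGUAAACAUCCUCGACUGGAAG"),
  list(desc = "hsa-let-7d* MIMAT0004484 Homo sapiens let-7d*", seq = "CUAUACGACCUGCUGCCUUUCU")
)
mirs <- fasta_list_to_df(mature)

# Hypothetical target table; in practice it would come from miRBase targets or miRecords.
targets <- data.frame(mirna = "hsa-miR-30a", transcript = "ENST_HYPOTHETICAL",
                      stringsAsFactors = FALSE)

# Inner join: one row per (target, miRNA) pair with the mature sequence attached.
linked <- merge(targets, mirs, by = "mirna")
```

Joining on the name works only if both sources use the same miRBase release; joining on the accession is more robust when the target table provides it.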



michael watson IAH-C ★ 3.4k


Yes, but what are you trying to do? BioMart has a very complex structure, I admit that; but why do you need/want all those attributes? Which attributes do you need? This works:

library(biomaRt)
hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
getBM(attributes=c("go_molecular_function_description", "go_molecular_function_linkage_type", "ensembl_gene_id", "ensembl_transcript_id"),
      filters='ensembl_transcript_id', values='ENST00000295228', mart=hmart)

It gets the GO molecular function data for the Ensembl human transcript ENST00000295228. If that's what I want to do, then the code is right; if it's not, then the code is wrong. How does the query you specify below relate to your question on microRNAs?

-----Original Message-----
From: mauede@alice.it [mailto:mauede@alice.it]
Sent: Sun 28/06/2009 6:29 PM
To: michael watson (IAH-C); Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: Re: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

Sure. I have to do that. I am just struggling to put all the pieces together. Most of those names mean nothing to me, as I do not have any biology background. Below I am pasting a weird error ... maybe it is clear to you. I am retrieving 10 consecutive attributes at a time to find the ones that I need, if any. So far I have successfully extracted the first 40 attributes from listAttributes(mart), but now ...

> library(biomaRt)
> hmart <- useMart('ensembl', dataset='hsapiens_gene_ensembl')
Checking attributes ... ok
Checking filters ... ok
> getBM(attributes=c("go_molecular_function_description",
+ "go_molecular_function_linkage_type",
+ "clone_based_ensembl_gene_name",
+ "clone_based_ensembl_transcript_name",
+ "clone_based_vega_gene_name",
+ "clone_based_vega_transcript_name",
+ "ccds",
+ "embl",
+ "entrezgene",
+ "ottt"),
+ filters='ensembl_transcript_id',value='ENST00000295228',mart=hmart)
Error in getBM(attributes = c("go_molecular_function_description", "go_molecular_function_linkage_type", :
  Query ERROR: caught BioMart::Exception::Usage: Too many attributes selected for External References

-----Original Message-----
From: michael watson (IAH-C) [mailto:michael.watson at bbsrc.ac.uk]
Sent: Sun 28/06/2009 16.50
To: mauede at alice.it; Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: RE: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)
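The "Too many attributes selected for External References" error suggests that BioMart limits how many external-reference attributes a single query may request. Under that assumption, one workaround is to request the attributes in small groups, always together with the ID used as the filter, and merge the partial tables on that ID. The function name and the default chunk size of 3 are hypothetical choices, not biomaRt settings, and the attribute names in the usage comment come from the 2009 thread and may differ in current Ensembl releases.

```r
# Hypothetical helper: run one getBM()-style query per small group of attributes,
# always including the ID attribute used as filter, then merge the results on that ID.
fetch_in_chunks <- function(attrs, id_attr, ids, chunk_size = 3,
                            query_fun = biomaRt::getBM, ...) {
  chunks <- split(attrs, ceiling(seq_along(attrs) / chunk_size))
  results <- lapply(chunks, function(a) {
    query_fun(attributes = c(id_attr, a), filters = id_attr, values = ids, ...)
  })
  # Outer-join the partial tables so no ID is dropped if one query returns nothing for it.
  Reduce(function(x, y) merge(x, y, by = id_attr, all = TRUE), results)
}

# Usage (network access required; `mart` is passed through `...` to getBM):
# library(biomaRt)
# hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# fetch_in_chunks(c("ccds", "embl", "entrezgene"), "ensembl_transcript_id",
#                 "ENST00000295228", mart = hmart)
```

Attributes with several values per transcript (for example several EMBL IDs) will multiply rows in the merged result, so it is worth checking which attributes are actually needed, as Michael suggests, before fetching them all.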



michael watson IAH-C ★ 3.4k


The CORNA library can read this file directly: corna.sf.net

________________________________
From: mauede at alice.it [mailto:mauede at alice.it]
Sent: Mon 29/06/2009 4:12
To: michael watson (IAH-C); Steve Lianoglou
Cc: Sean Davis; bioconductor List
Subject: Re: [BioC] Re: Re: Re: how to find the VALIDATED pair (miRNA, gene-3'UTR-sequence)

I have preprocessed the FASTA miRNA files. I'd like to find an equivalent way to download and read in the file "http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl/arch.v5.txt.homo_sapiens.zip" without leaving R. Maybe I should download it first using a system call, then unzip it from R, and finally use read.table? I doubt that read.table will work, because the file is not a matrix (its rows do not all have the same number of columns).

Thank you in advance for your help.

Maura
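Maura's last question can be handled without leaving R or calling system(): download.file() fetches the archive, unzip() extracts it, and read.table() copes with rows of unequal length if it is told the maximum number of fields up front, which count.fields() can compute. A sketch, assuming the unzipped target file is tab-delimited with '#' comment lines (I have not checked the actual layout of arch.v5.txt.homo_sapiens); the helper names are hypothetical.

```r
# Hypothetical helper: download a zip archive and extract it; returns the extracted paths.
fetch_and_unzip <- function(url, exdir = tempdir()) {
  zipfile <- file.path(exdir, basename(url))
  download.file(url, zipfile, mode = "wb")
  unzip(zipfile, exdir = exdir)
}

# Hypothetical helper: read a delimited file whose rows have different numbers of fields.
read_ragged_table <- function(file, sep = "\t", comment.char = "#") {
  # The widest row decides how many columns to allocate.
  n <- max(count.fields(file, sep = sep, quote = "", comment.char = comment.char),
           na.rm = TRUE)
  read.table(file, sep = sep, header = FALSE, fill = TRUE, quote = "",
             comment.char = comment.char, col.names = paste0("V", seq_len(n)),
             stringsAsFactors = FALSE)
}

# Usage (network access required; URL taken from the thread):
# files <- fetch_and_unzip("http://microrna.sanger.ac.uk/cgi-bin/targets/v5/download.pl/arch.v5.txt.homo_sapiens.zip")
# targets <- read_ragged_table(files[1])
```

Passing explicit `col.names` matters because read.table otherwise guesses the column count from the first few lines only; `quote = ""` stops apostrophes such as those in "3'" from being read as string delimiters.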

