Re: BioMart error occurred again

mauede@alice.it

I read that message and am asking for some guidelines on querying biomaRt in batch mode. The PDF document available from the biomaRt online pages shows a number of useful ways to extract data, but it does not mention any batch query mode. I thought R CMD BATCH would be the way to do that; if so, it will take a while.

Basically, I am trying to extract the 3' UTR sequence of each target gene transcript listed in the data set hsTargets. I have to save the following to a file: the miRNA identifier and the miRNA sequence, then all of its target gene transcripts with their 3' UTR sequences. So my R script loops over each miRNA identifier, reads all of its target gene transcript identifiers from hsTargets, and submits that ENST list to biomaRt to get the corresponding 3' UTR sequences:

    ## -------------------- GET 3UTR SEQUENCES FOR TARGET GENE TRANSCRIPTS
    # tmp holds the hsTargets rows for the current miRNA
    gene_seq <- getSequence(id = tmp[, "target"], type = "ensembl_transcript_id",
                            seqType = "3utr", mart = hmart)

In addition, to identify the target transcripts in the output file, I also ask biomaRt for some other identifiers of each target, filtering on the ENST ID:

    # j indexes the rows of gene_seq inside a loop
    gene_map <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "refseq_dna",
                                     "ensembl_transcript_id"),
                      filters = "ensembl_transcript_id",
                      values = gene_seq[j, "ensembl_transcript_id"],
                      mart = hmart)

A typical output file looks like the example pasted at the bottom. My question is: how can I rewrite my R script so that it accomplishes this task in batch mode? I hope I won't have to retrieve the 3' UTR sequences for all the target gene transcripts listed in hsTargets together.

Thank you,
Maura

    >hsa-miR-7
    UGGAAGACUAGUGAUUUUGUUGU UGGAAGACUAGUGAUUUUGUUGU
    >GPRC5A|ENSG00000013588|ENST00000014914
    CTCTGTCCTGAA...
    >PSMA4|ENSG00000041357|ENST00000044462
    AATCAGAGATTTTATTACTCATTTGGGGCACCATTTCAGTGTAAAAGCAGTCCTACTCTTCCACACTAGGAAGGCTTTAC
    TTTTTTTAACTGGTGCAGTGGGAAAATA...
    >COPZ2|ENSG00000005243|ENST00000006101
    AGGCTGTGGATTCAAGGCTCCCTGCCCCCCAGATCATTTCCCCAA...
    >PIGB|ENSG00000069943|ENST00000164305
    ACTTTCCTAGATAAATTAACATT...
    >ZNF275|ENSG00000063587|ENST00000095634
    AAACGCCCTGTGGTCCCGCGGGACAGGGACGGAGTCCCCAGAGGGGATGGCAGAGTCAAAGGAGATGAACAGTTTT
    GTAGCGCTTATATATTTTGT...

steffen@stat.Berkeley.EDU

Hi Maura,

By "query in batch" I meant querying multiple IDs at once, not one at a time. There should be a way to convert your script from querying every ID one by one into querying everything in batch and then combining the results in R. For example:

1) Make a vector with all the target transcript IDs in your miRNA set and retrieve the 3' UTRs for all of them at once:

    library(biomaRt)
    hmart <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
    targets <- c("ENST00000014914", "ENST00000044462", "ENST00000006101", "ENST00000164305")
    targets3UTR <- getSequence(id = targets, type = "ensembl_transcript_id",
                               seqType = "3utr", mart = hmart)

2) In a second query, retrieve the gene symbols and Ensembl gene IDs for this set:

    idmap <- getBM(attributes = c("hgnc_symbol", "ensembl_gene_id", "refseq_dna",
                                  "ensembl_transcript_id"),
                   filters = "ensembl_transcript_id", values = targets, mart = hmart)

Then, in a next step, combine the information from targets3UTR and idmap in R. So all you need is two queries to biomaRt. You then loop over the results in R to combine the data.

Let me know if this solves your problem.

Cheers,
Steffen

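The question raises a concern about sending every target transcript ID in a single request. A middle ground between one query per miRNA and one query for everything is to split the unique IDs into fixed-size chunks, query each chunk, and stack the results.

This is a minimal base-R sketch with some assumptions:

* fetch_in_chunks is a hypothetical helper, not part of biomaRt.
* The chunk size is arbitrary.
* fake_fetch is an offline stand-in, assumed to return the same two-column layout as getSequence(), so the example runs without a network connection.

```r
# Hypothetical helper (not part of biomaRt): query IDs in fixed-size chunks
# and stack the per-chunk data frames into one result.
fetch_in_chunks <- function(ids, fetch, chunk_size = 500) {
  ids <- unique(ids)                                  # never ask for the same ID twice
  chunk_index <- ceiling(seq_along(ids) / chunk_size) # 1, 1, ..., 2, 2, ... per chunk
  chunks <- split(ids, chunk_index)
  results <- lapply(chunks, fetch)                    # one request per chunk
  out <- do.call(rbind, results)
  rownames(out) <- NULL
  out
}

# Offline stand-in, assumed to mimic getSequence(..., seqType = "3utr") output.
fake_fetch <- function(x) {
  data.frame(`3utr` = paste0("SEQ_", x), ensembl_transcript_id = x,
             check.names = FALSE, stringsAsFactors = FALSE)
}

ids <- c("ENST00000014914", "ENST00000044462", "ENST00000006101",
         "ENST00000164305", "ENST00000014914")
res <- fetch_in_chunks(ids, fake_fetch, chunk_size = 2)
print(res)

# With the real mart (hmart and hsTargets assumed to be loaded):
# all_targets <- unique(hsTargets[, "target"])
# utrs <- fetch_in_chunks(all_targets, function(x)
#   getSequence(id = x, type = "ensembl_transcript_id", seqType = "3utr", mart = hmart))
```

Each chunk still costs one round trip to the server. Larger chunks therefore mean fewer requests, while smaller ones keep each request light.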