Search
Question: All 3' UTRs for an organism - biomaRt package
0
gravatar for Urska Cvek
10.1 years ago by
Urska Cvek20
Urska Cvek20 wrote:
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071001/ 18f98078/attachment.pl
ADD COMMENTlink modified 10.1 years ago by Steffen500 • written 10.1 years ago by Urska Cvek20
0
gravatar for Steffen
10.1 years ago by
Steffen500
Steffen500 wrote:
Dear Urska, Your errors show that the getSequence function needed extra argument checking to help the user in understanding what is going wrong. This has been added to the new biomaRt version which will be included in the new BioConductor release later this week. Firstly, getSequence always needs the type and seqType arguments. The type argument specifies the type of identifiers that you use as input and also specifies the type of identifiers that will be present in the output so you know to which genes the retrieved sequences belong. Secondly, getSequence uses two ways to specify the sequences you are interested in: a) you give a list of identifiers and getSequence will retrieve the sequences for the corresponding genes (here you'll use the id argument) b) you give a chromosomal location and getSequence will retrieve the sequences of all genes that are located between the given positions. (here you'll use the chromosome, start and end arguments) An additional problem that you encounter is that the data volume you want to retrieve is so large (as you query for all sequences of all genes) that this possibly resulted in a time-out. Here's how I would retrieve the sequences: First retrieve all ensembl gene identifiers in the hsapiens dataset by: ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl") ensemblid = getBM("ensembl_gene_id", mart=ensmart) This should give 31189 unique ensembl gene identifiers. To retrieve the 3UTRs for all these identifiers you could now retrieve them in batches of 5000 at once e.g.: utr3Batch1 = getSequence(id = ensemblid[1:5000,1], type="ensembl_gene_id", seqType="3utr", mart=ensmart) utr3Batch2 = getSequence(id = ensemblid[5001:10000,1], type="ensembl_gene_id", seqType="3utr", mart=ensmart) etc. Note that you'll have to do some cleaning as genes might have multiple 3'UTRs associated with them for alternatively spliced transcripts and others might have no annotated 3'UTR. Cheers, Steffen Urska Cvek wrote: > I am using the latest version of biomaRt (1.10.1) package with R version > 2.5.1, and would like to retrieve all 3' UTR sequences in the Ensembl 46, > NCBI 36 (current versions) for the homo sapiens organism. I have used the > Vignette/documentation and tried: > > >> ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl") >> > Checking attributes and filters ... ok > >> getSequence(type="ensembl", seqType="3utr", mart=ensmart) >> > Error in getSequence(type = "ensembl", seqType = "3utr", mart = ensmart) : > Invalid type argument. Use the listFilters function to select a > valid type argument. > > although both arguments to "type" and "seqType" seem to be correct, it > looks like there is something wrong with the input. > > I also tried > > >> getSequence(type="entrezgene", seqType="3utr", mart=ensmart) >> > > >> utr3_1 = getSequence(chromosome=1, start=1, end=10000000, >> > type="entrezgene", seqType="3utr", mart=ensmart) > >> show(utr3_1) >> > NULL > > When I try the advanced option for retrieval, the command takes a long time > to return, and then returns the following: > >> getBM(attributes=c("ensembl_gene_id","external_gene_id","3utr"), >> > mart=ensmart) > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, > : > line 21 did not have 3 elements > > I must be missing something... Any suggestions how I could get all 3' UTR > sequences for the human organism using this package? > > Thank you in advance, > > U.C. > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD COMMENTlink written 10.1 years ago by Steffen500
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 114 users visited in the last hour