All 3' UTRs for an organism - biomaRt package
1
0
Entering edit mode
Urska Cvek ▴ 20
@urska-cvek-2094
Last seen 9.6 years ago
An embedded and charset-unspecified text was scrubbed... Name: not available Url: https://stat.ethz.ch/pipermail/bioconductor/attachments/20071001/ 18f98078/attachment.pl
• 1.3k views
ADD COMMENT
0
Entering edit mode
Steffen ▴ 500
@steffen-2351
Last seen 9.6 years ago
Dear Urska, Your errors show that the getSequence function needed extra argument checking to help the user in understanding what is going wrong. This has been added to the new biomaRt version which will be included in the new BioConductor release later this week. Firstly, getSequence always needs the type and seqType arguments. The type argument specifies the type of identifiers that you use as input and also specifies the type of identifiers that will be present in the output so you know to which genes the retrieved sequences belong. Secondly, getSequence uses two ways to specify the sequences you are interested in: a) you give a list of identifiers and getSequence will retrieve the sequences for the corresponding genes (here you'll use the id argument) b) you give a chromosomal location and getSequence will retrieve the sequences of all genes that are located between the given positions. (here you'll use the chromosome, start and end arguments) An additional problem that you encounter is that the data volume you want to retrieve is so large (as you query for all sequences of all genes) that this possibly resulted in a time-out. Here's how I would retrieve the sequences: First retrieve all ensembl gene identifiers in the hsapiens dataset by: ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl") ensemblid = getBM("ensembl_gene_id", mart=ensmart) This should give 31189 unique ensembl gene identifiers. To retrieve the 3UTRs for all these identifiers you could now retrieve them in batches of 5000 at once e.g.: utr3Batch1 = getSequence(id = ensemblid[1:5000,1], type="ensembl_gene_id", seqType="3utr", mart=ensmart) utr3Batch2 = getSequence(id = ensemblid[5001:10000,1], type="ensembl_gene_id", seqType="3utr", mart=ensmart) etc. Note that you'll have to do some cleaning as genes might have multiple 3'UTRs associated with them for alternatively spliced transcripts and others might have no annotated 3'UTR. Cheers, Steffen Urska Cvek wrote: > I am using the latest version of biomaRt (1.10.1) package with R version > 2.5.1, and would like to retrieve all 3' UTR sequences in the Ensembl 46, > NCBI 36 (current versions) for the homo sapiens organism. I have used the > Vignette/documentation and tried: > > >> ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl") >> > Checking attributes and filters ... ok > >> getSequence(type="ensembl", seqType="3utr", mart=ensmart) >> > Error in getSequence(type = "ensembl", seqType = "3utr", mart = ensmart) : > Invalid type argument. Use the listFilters function to select a > valid type argument. > > although both arguments to "type" and "seqType" seem to be correct, it > looks like there is something wrong with the input. > > I also tried > > >> getSequence(type="entrezgene", seqType="3utr", mart=ensmart) >> > > >> utr3_1 = getSequence(chromosome=1, start=1, end=10000000, >> > type="entrezgene", seqType="3utr", mart=ensmart) > >> show(utr3_1) >> > NULL > > When I try the advanced option for retrieval, the command takes a long time > to return, and then returns the following: > >> getBM(attributes=c("ensembl_gene_id","external_gene_id","3utr"), >> > mart=ensmart) > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, > : > line 21 did not have 3 elements > > I must be missing something... Any suggestions how I could get all 3' UTR > sequences for the human organism using this package? > > Thank you in advance, > > U.C. > > [[alternative HTML version deleted]] > > _______________________________________________ > Bioconductor mailing list > Bioconductor at stat.math.ethz.ch > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor > >
ADD COMMENT

Login before adding your answer.

Traffic: 693 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6