jiayu wen
Dear list,
About a year ago, I used BioMart to extract 3'UTR sequences for about 7000
genes for my project. These are the commands I used
(my gene list consists of HGNC gene symbols):
> library(biomaRt)
> my_mart = useMart("ensembl", dataset="hsapiens_gene_ensembl")
> seq_3utr = getSequence(id=unique(gene.symbol), type="hgnc_symbol",
      seqType="3utr", mart=my_mart)
> # drop genes for which BioMart has no 3'UTR sequence
> seq_3utr = seq_3utr[seq_3utr[,"3utr"] != "Sequence unavailable",]
> # (step not shown: keep the longest 3'UTR for each unique gene symbol)
> exportFASTA(seq_3utr, file="s3utr.fa")
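The "longest 3'UTR per gene symbol" step is not shown above. One possible way to do it in base R is sketched below, assuming `seq_3utr` is the data frame returned by `getSequence()` with a `"3utr"` sequence column and an `"hgnc_symbol"` column. The helper name `keep_longest_utr` is hypothetical, and ties in length simply keep whichever row came first.

```r
# Hypothetical helper (not part of biomaRt): keep the longest 3'UTR per symbol.
# Assumes columns "3utr" (sequence string) and "hgnc_symbol", as returned by
# getSequence(..., type = "hgnc_symbol", seqType = "3utr").
keep_longest_utr <- function(seqs) {
  # sort rows by sequence length, longest first (order() is stable for ties)
  ord <- order(nchar(seqs[["3utr"]]), decreasing = TRUE)
  seqs <- seqs[ord, , drop = FALSE]
  # duplicated() flags every repeat after the first occurrence,
  # so the first (longest) row for each symbol is kept
  seqs[!duplicated(seqs[["hgnc_symbol"]]), , drop = FALSE]
}

# toy example with made-up sequences
toy <- data.frame(`3utr` = c("ACGT", "ACGTACGT", "AC"),
                  hgnc_symbol = c("A", "A", "B"),
                  check.names = FALSE, stringsAsFactors = FALSE)
longest <- keep_longest_utr(toy)
print(longest)
```

With the real data this would be called as `seq_3utr = keep_longest_utr(seq_3utr)` before `exportFASTA()`.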
As my project has progressed, I now need the genomic coordinates of these
3'UTRs to obtain phastCons conservation scores for some regions within them.
To do this, I first convert the HGNC symbols back to Ensembl gene IDs and
then retrieve the 3'UTR coordinates with getBM, like this:
> library(Biostrings)
> s3utr = readDNAStringSet("s3utr.fa", format="fasta")
> gene_names = names(s3utr)
> hgnc2ensembl = getBM(attributes=c("hgnc_symbol", "ensembl_gene_id"),
      filters="hgnc_symbol", values=gene_names, mart=my_mart)
> s3utr_pos = getBM(attributes=c("ensembl_gene_id", "chromosome_name",
      "strand", "3_utr_start", "3_utr_end"),
      filters="ensembl_gene_id",
      values=unique(hgnc2ensembl$ensembl_gene_id), mart=my_mart)
> # keep only rows that actually have 3'UTR coordinates
> s3utr_pos = s3utr_pos[complete.cases(s3utr_pos),]
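For the phastCons lookup, the coordinates above generally have to be turned into UCSC-style intervals, since Ensembl reports 1-based inclusive positions, strand as 1/-1 and chromosome names without a "chr" prefix, while BED uses 0-based starts and "+"/"-". The sketch below assumes the `s3utr_pos` column names used above; the helper `utr_to_bed` is hypothetical. It also assumes UCSC names are just "chr" plus the Ensembl name, which does not hold for every sequence (e.g. Ensembl `MT` vs UCSC `chrM`), and that both use the same genome assembly.

```r
# Hypothetical helper: Ensembl-style 3'UTR rows -> UCSC BED6 data frame.
# Assumes columns ensembl_gene_id, chromosome_name, strand,
# 3_utr_start and 3_utr_end, as requested from getBM() above.
utr_to_bed <- function(pos) {
  data.frame(
    chrom      = paste0("chr", pos[["chromosome_name"]]),  # naive "chr" prefix
    chromStart = pos[["3_utr_start"]] - 1L,  # BED starts are 0-based
    chromEnd   = pos[["3_utr_end"]],         # BED ends are exclusive: unchanged
    name       = pos[["ensembl_gene_id"]],
    score      = 0L,
    strand     = ifelse(pos[["strand"]] == 1, "+", "-"),
    stringsAsFactors = FALSE
  )
}

# toy example with made-up coordinates
toy_pos <- data.frame(ensembl_gene_id = c("G1", "G2"),
                      chromosome_name = c("1", "X"),
                      strand = c(1L, -1L),
                      `3_utr_start` = c(100L, 500L),
                      `3_utr_end` = c(200L, 650L),
                      check.names = FALSE, stringsAsFactors = FALSE)
bed <- utr_to_bed(toy_pos)
print(bed)
```

The result could then be written out with `write.table(bed, "s3utr.bed", sep="\t", quote=FALSE, row.names=FALSE, col.names=FALSE)` for use with UCSC tools.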
Doing it this way, I only get 3'UTR coordinates for about 5000 of the gene
symbols (the conversion from hgnc_symbol back to ensembl_gene_id alone
loses about 250 genes). I suspected a difference between Ensembl versions,
so I tried to use the Ensembl archive, but it gives me the following
error:
> my_mart = useMart("ensembl_mart_50", dataset="hsapiens_gene_ensembl", archive=TRUE)
Error in value[[3L]](cond) :
Request to BioMart web service failed. Verify if you are still
connected to the internet. Alternatively the BioMart web service is
temporarily down.
In addition: Warning message:
In file(file, "r") : cannot open: HTTP status was '404 Not Found'
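Whatever the cause turns out to be, it may help to list exactly which symbols drop out at each step of the symbol → Ensembl ID → 3'UTR coordinate chain. The sketch below assumes `gene_names`, `hgnc2ensembl` and `s3utr_pos` have the columns used in the commands above; the helper `report_lost_symbols` is hypothetical.

```r
# Hypothetical helper: which gene symbols are lost, and at which step?
report_lost_symbols <- function(symbols, hgnc2ensembl, utr_pos) {
  mapped <- unique(hgnc2ensembl[["hgnc_symbol"]])
  # step 1: symbols that got no Ensembl gene ID in this release
  no_id <- setdiff(symbols, mapped)
  # step 2: mapped symbols none of whose Ensembl IDs have 3'UTR coordinates
  ids_with_utr <- unique(utr_pos[["ensembl_gene_id"]])
  has_utr <- hgnc2ensembl[["hgnc_symbol"]][
    hgnc2ensembl[["ensembl_gene_id"]] %in% ids_with_utr]
  no_utr <- setdiff(intersect(symbols, mapped), has_utr)
  list(no_ensembl_id = no_id, no_utr_coordinates = no_utr)
}

# toy example with made-up IDs
toy_map <- data.frame(hgnc_symbol = c("A", "B", "B"),
                      ensembl_gene_id = c("E1", "E2", "E3"),
                      stringsAsFactors = FALSE)
toy_utr <- data.frame(ensembl_gene_id = "E3", stringsAsFactors = FALSE)
lost <- report_lost_symbols(c("A", "B", "C"), toy_map, toy_utr)
str(lost)
```

With the real objects this would be `report_lost_symbols(gene_names, hgnc2ensembl, s3utr_pos)`; the lost symbols can then be checked by hand (for example, for renamed or withdrawn HGNC symbols).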
Is there any way I can get 3'UTR coordinates for all the genes in my list?
Thanks for any help.
Jean