genes in region of miRNA genes
@iain-gallagher-2532
Last seen 8.7 years ago
United Kingdom
Hi list,

I have some data on the chromosome, start and end points of some microRNAs of interest:

miR          chromosome  start      end
hsa-mir-572  17          10979549   10979643
hsa-mir-583  18          95440598   95440672
hsa-mir-587  19          107338693  107338788
hsa-mir-598  21          10930126   10930222
hsa-mir-599  21          100618040  100618134
hsa-mir-210  3           558089     558198
hsa-mir-141  4           6943521    6943615
hsa-mir-492  4           93752305   93752420
hsa-mir-639  11          14501355   14501452
hsa-mir-663  13          26136822   26136914
hsa-mir-503  24          133508024  133508094

I was hoping to use biomaRt to extract information for genes upstream and downstream of these miRNAs (see script below). I have created a list in the correct form for a multi-filter query using biomaRt, but the following query only retrieves data for chromosome 17. I gather that looping over data is discouraged for biomaRt (presumably to prevent overloading servers), and I was wondering if there is a better way of doing this.

In the following script the allMirs table is the result of:

    allMirs <- "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/genomes/hsa.gff"
    allMirs <- read.table(allMirs)

although I did massage the data outside R to remove some extraneous columns (mainly those full of full stops) and add column names. The 'miRsUpInFlu.txt' table is the one shown above.
    #get miR chromosome coords from biomaRt
    rm(list=ls())
    library(biomaRt)

    #read in list of miRs
    mirs<-read.table('miRsUpInFlu.txt', header=T, sep='\t')
    mirs<-sub('R', 'r', as.character(mirs[,1])) #correct miR labels
    allMirs<-read.table('miRbaseJune2009.txt', header=T, sep='\t')
    mirRow<-which(as.character(allMirs$id) %in% mirs)
    mirsData<-allMirs[mirRow,] #minor miRs are missing (eg * etc etc)
    mirRow<-cbind(as.character(mirsData$id), mirsData[,2], mirsData[,4], mirsData[,5])

    #now we have a dataframe containing the miR id, start and stop
    #we have to extend the start and stop sites by 500000
    #then retrieve genes in these regions
    starts<-as.numeric(mirRow[,3])
    stops<-as.numeric(mirRow[,4])
    limitStarts<-starts-500000 #going 5'
    limitStops<-stops+500000 #going 3'

    #this creates a dataframe in the form we need for list conversion
    vals<-rbind(mirRow[,2], limitStarts, limitStops)

    #the list conversion is required for the biomaRt query because we are using more than one filter
    vals<-as.list(vals)

    #generate query
    db<-useMart('ensembl', dataset='hsapiens_gene_ensembl')
    query<-getBM(c('hgnc_symbol', 'ensembl_transcript_id', 'chromosome_name', 'external_gene_id'),
                 filters=c('chromosome_name', 'start', 'end'), values=vals, mart=db)

Any help would be appreciated.

Thanks

Iain

R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] biomaRt_2.0.0

loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0
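
For anyone reading along, here is a minimal offline sketch of what the list conversion in the script produces. It uses only the first three rows of the table from the post. Calling as.list() on the matrix built by rbind() flattens it column by column, so the first three list elements describe the first miRNA only. That would be consistent with the query returning just chromosome 17, though I have not confirmed how biomaRt 2.0.0 handles the remaining elements. The per-region strings and the 'chromosomal_region' filter in the commented-out call are my assumptions, not something from the post, and the filter should be checked with listFilters() for the mart in use.

```r
# First three rows of the table in the post (values copied as posted).
mirs <- data.frame(
  id    = c("hsa-mir-572", "hsa-mir-583", "hsa-mir-587"),
  chr   = c("17", "18", "19"),
  start = c(10979549L, 95440598L, 107338693L),
  end   = c(10979643L, 95440672L, 107338788L),
  stringsAsFactors = FALSE
)

# Extend each miRNA by 500 kb on either side, clamping at position 1.
flank       <- 500000L
limitStarts <- pmax(mirs$start - flank, 1L)
limitStops  <- mirs$end + flank

# What the script does: rbind() gives a 3 x n character matrix, and
# as.list() then flattens it column by column into 3 * n single values.
flat <- as.list(rbind(mirs$chr, limitStarts, limitStops))
length(flat)        # 9 elements, not 3
unlist(flat[1:3])   # chromosome, start and end of the first miRNA only

# Assumed alternative: one "chr:start:end" string per region.
regions <- sprintf("%s:%d:%d", mirs$chr, limitStarts, limitStops)
print(regions)

# Hypothetical query (needs network access, not run here). The filter name
# 'chromosomal_region' is an assumption; verify it with listFilters(db).
# library(biomaRt)
# db    <- useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# genes <- getBM(attributes = c("hgnc_symbol", "ensembl_transcript_id",
#                               "chromosome_name"),
#                filters = "chromosomal_region", values = regions, mart = db)
```

If a region filter of this kind is available, all windows go to the server in a single request, which avoids looping over miRNAs.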