Question

genes in region of miRNA genes

Iain Gallagher ▴ 930

Hi list,

I have some data on the chromosome, start and end points of some microRNAs of interest:

miR          chromosome  start      end
hsa-mir-572  17          10979549   10979643
hsa-mir-583  18          95440598   95440672
hsa-mir-587  19          107338693  107338788
hsa-mir-598  21          10930126   10930222
hsa-mir-599  21          100618040  100618134
hsa-mir-210  3           558089     558198
hsa-mir-141  4           6943521    6943615
hsa-mir-492  4           93752305   93752420
hsa-mir-639  11          14501355   14501452
hsa-mir-663  13          26136822   26136914
hsa-mir-503  24          133508024  133508094

I was hoping to use biomaRt to extract information for genes upstream and downstream of these miRNAs (see script below). I have created a list in the correct form for a multi-filter query using biomaRt, but the following query only retrieves data for chromosome 17. I gather that looping over data is discouraged for biomaRt (presumably to prevent overloading servers), and I was wondering if there is a better way of doing this.

In the following script, the allMirs table is the result of:

allMirs <- "ftp://ftp.sanger.ac.uk/pub/mirbase/sequences/CURRENT/genomes/hsa.gff"
allMirs <- read.table(allMirs)

although I did massage the data outside R to remove some extraneous columns (mainly those full of full stops) and to add column names. The 'miRsUpInFlu.txt' table is the one shown above.
#get miR chromosome coords from biomaRt
rm(list=ls())
library(biomaRt)

#read in list of miRs
mirs<-read.table('miRsUpInFlu.txt', header=T, sep='\t')
mirs<-sub('R', 'r', as.character(mirs[,1])) #correct miR labels

allMirs<-read.table('miRbaseJune2009.txt', header=T, sep='\t')
mirRow<-which(as.character(allMirs$id) %in% mirs)
mirsData<-allMirs[mirRow,] #minor miRs are missing (eg * etc etc)
mirRow<-cbind(as.character(mirsData$id), mirsData[,2], mirsData[,4], mirsData[,5])

#now we have a table containing the miR id, chromosome, start and stop
#we have to extend the start and stop sites by 500000
#then retrieve genes in these regions
starts<-as.numeric(mirRow[,3])
stops<-as.numeric(mirRow[,4])
limitStarts<-starts-500000 #going 5'
limitStops<-stops+500000 #going 3'

#this creates a table in the form we need for list conversion
vals<-rbind(mirRow[,2], limitStarts, limitStops)

#the list conversion is required for the biomaRt query because we are using more than one filter
vals<-as.list(vals)

#generate query
db<-useMart('ensembl', dataset='hsapiens_gene_ensembl')
query<-getBM(c('hgnc_symbol', 'ensembl_transcript_id', 'chromosome_name', 'external_gene_id'), filters=c('chromosome_name', 'start', 'end'), values=vals, mart=db)

Any help would be appreciated.

Thanks

Iain

R version 2.9.0 (2009-04-17)
x86_64-pc-linux-gnu

locale:
LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] biomaRt_2.0.0

loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0
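
For illustration only, here is a minimal sketch of one way the per-miRNA windows might be sent in a single query rather than a loop. It assumes the Ensembl mart in use offers a 'chromosomal_region' filter taking values of the form chr:start:end (this should be checked with listFilters(db), as it may not be present in every release). The helper name makeRegions and its flank argument are hypothetical and not part of biomaRt. Note also that cbind() on a factor column returns the integer level codes, which may be why the chromosome column above contains values such as 24; wrapping that column in as.character() would keep the real chromosome names.

```r
# Hypothetical helper (not part of biomaRt): build one "chr:start:end"
# string per miRNA, widened by `flank` bases on each side.
makeRegions <- function(chrom, start, end, flank = 500000) {
  lo <- pmax(as.numeric(start) - flank, 1)  # do not go below position 1
  hi <- as.numeric(end) + flank
  # %.0f avoids scientific notation for large coordinates
  sprintf("%s:%.0f:%.0f", as.character(chrom), lo, hi)
}

# Three rows taken from the table in the question (chromosome values as posted)
regions <- makeRegions(c(17, 3, 24),
                       c(10979549, 558089, 133508024),
                       c(10979643, 558198, 133508094))
print(regions)

# Assumed usage (needs network access and the biomaRt package; not run here):
# library(biomaRt)
# db <- useMart('ensembl', dataset = 'hsapiens_gene_ensembl')
# query <- getBM(c('hgnc_symbol', 'ensembl_transcript_id', 'chromosome_name'),
#                filters = 'chromosomal_region', values = regions, mart = db)
```

Encoding each window as one string keeps the chromosome, start and end of a given miRNA together, whereas separate 'chromosome_name', 'start' and 'end' filters are combined across the whole query.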

Tags: biomaRt

Posted 14.9 years ago by Iain Gallagher