David
@david-3335
Hi,
I am opening a new discussion so as not to confuse it with the previous one.
The objective here is to look for overrepresented GO terms among microRNA
targets. One microRNA can have several targets (genes), and a single gene
can be targeted by several microRNAs. The aim is to check, for a specific
microRNA, which GO terms are overrepresented.
OK, so let's say my microRNA of interest is miR-A.
Step 1: based on my favorite prediction algorithm, I have obtained a list of
genes targeted by miR-A. The genes are Ensembl transcripts and, as I said
before, miR-A can target the same transcript several times (at different
locations), so I need to account for this.
miR-A targets ->
(ENST001,ENST001,ENST001,ENST0025,ENST089,ENST099,ENST0099......) up to
300 different transcripts.
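To make the duplication explicit, here is a minimal sketch in R that counts how many times each transcript appears in the target list. The IDs are the placeholder ones from the list above (not real Ensembl identifiers), and the variable names are only illustrative.

```r
# Placeholder target list for miR-A; a repeated ID means several
# predicted sites on the same transcript (IDs are illustrative only)
mirA_targets <- c("ENST001", "ENST001", "ENST001",
                  "ENST0025", "ENST089", "ENST099")

# Number of predicted sites per transcript
site_counts <- table(mirA_targets)
print(site_counts)

# Distinct transcripts only, i.e. the list with the multiplicity dropped
unique_targets <- unique(mirA_targets)
print(unique_targets)
```

Keeping `site_counts` around means the multiplicity is not lost even if a later lookup only works on distinct IDs.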
I use biomaRt to get the corresponding GO IDs for these transcripts
....
library(biomaRt)

# Select the mart database
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

# Get GO terms for a specific transcript.
# First problem: biomaRt will not return GO terms twice for duplicated
# transcripts. The example below shows that for
# c("ENST00000347770","ENST00000347770") I get the same GO terms as for
# c("ENST00000347770").
# As I said before, a microRNA can target the same transcript several
# times, in which case that transcript's GO terms should count twice for
# this particular microRNA.
# Can we force biomaRt to return redundant GO terms?
gomir <- getBM(attributes=c(
'go_biological_process_id',
'go_biological_process_linkage_type',
'go_cellular_component_id',
'go_cellular_component_linkage_type',
'go_molecular_function_id',
# assumed by analogy with the two linkage-type attributes above
'go_molecular_function_linkage_type'),
filters="ensembl_transcript_id",
values=c("ENST00000347770","ENST00000347770"......), mart=mart)
.... I will complete the rest of the pipeline with GOstats once this
first step is sorted out.
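One possible workaround, sketched here under assumptions rather than as a biomaRt feature: request 'ensembl_transcript_id' as an extra attribute so each GO row carries its transcript, then join the result back onto the full (redundant) target list. The data frame below is a hypothetical stand-in for what getBM would return; the transcript IDs are placeholders and the GO IDs are chosen arbitrarily for illustration.

```r
# Redundant target list: one entry per predicted site (placeholder IDs)
mirA_targets <- c("ENST001", "ENST001", "ENST001", "ENST0025", "ENST089")

# Hypothetical stand-in for a getBM() result that includes
# 'ensembl_transcript_id' among its attributes (one row per
# transcript/GO pair, no duplicates)
go_annot <- data.frame(
  ensembl_transcript_id = c("ENST001", "ENST001", "ENST0025", "ENST089"),
  go_id = c("GO:0000001", "GO:0000002", "GO:0000001", "GO:0000003"),
  stringsAsFactors = FALSE
)

# One row per predicted site; merging repeats each transcript's GO rows
# once for every site on that transcript
sites <- data.frame(ensembl_transcript_id = mirA_targets,
                    stringsAsFactors = FALSE)
go_per_site <- merge(sites, go_annot, by = "ensembl_transcript_id")

# GO term counts weighted by the number of sites
go_counts <- table(go_per_site$go_id)
print(go_counts)
```

Whether counting a GO term once per site is appropriate for the downstream overrepresentation test is a separate statistical question; the sketch only shows how the redundancy could be restored after the lookup.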