Hi all,
I'm having a bit of a problem with TxDb.Hsapiens.UCSC.hg19.knownGene. Most of the Entrez identifiers I use work fine with the function: transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")
For example, this one works:
------------------------------------------------------------------------------------------------------------------
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
MYOT <- '9499'  # Entrez Gene ID for MYOT, stored as a character string
transcriptCoordsByGene.GRangesList.MYOT <-
  transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene")[MYOT]
transcriptCoordsByGene.GRangesList.MYOT
#GRangesList object of length 1:
#$9499
#GRanges object with 4 ranges and 2 metadata columns:
#seqnames ranges strand | tx_id tx_name
#<Rle> <IRanges> <Rle> | <integer> <character>
#[1] chr5 [137022410, 137223540] + | 21288 uc011cye.2
#[2] chr5 [137203545, 137223540] + | 21290 uc003lbv.3
#[3] chr5 [137203545, 137223540] + | 21291 uc011cyg.2
#[4] chr5 [137203545, 137223540] + | 21292 uc011cyh.2
---------------------------------------------------------------------------------------------------
However, for some other genes, such as 201625, which is the Entrez ID for the human DNAH12 gene (I checked it with library(org.Hs.eg.db) and against NCBI), I get:
Error: subscript contains out-of-bounds indices
Could you please tell me how I can solve this problem?
Or are there any other packages I can use to extract these genes' data?
I need the data so that I can analyse the motifs using the rGADEM package.
I am a medical student and extremely new to R and Bioconductor.
The entire set of genes (Entrez identifiers) I need to analyse is:
[1] 1002 10233 114798 122481 126792 126820 128344 130827 1428 146845 1493 150483 150572
[14] 159989 183 1852 201625 2167 22824 22885 23676 254956 255101 257177 25992 26576
[27] 266629 283152 283726 285141 29895 3067 340286 340706 3860 387712 389125 389177 4617
[40] 4621 4625 51364 5144 51778 5212 54585 55815 56203 56849 56901 57494 6345
[53] 64102 64446 644890 6588 7042 7060 7138 7273 7322 7337 796 79933 8048
[66] 8091 8125 83450 83657 83894 88 89765 9172 9499
Like I said, some of these work perfectly, others don't.
Any help would be appreciated.
Thank you.
Ah that makes sense... thank you very much
I'll try that now.
I checked all my gene codes. There was only 1 mismatch, which I have already removed, and it still didn't work.
All the other codes do exist in that human genome file.
However, it won't work if I use any code with a "value" higher than 23459,
i.e. any code up to that value works (e.g. 23459, 23458, 1223),
but 23460 and above does not (e.g. 23460, 23461, 124234).
Could you please tell me if there is any way I could correct this problem?
Thank you
That's most likely because you are subsetting your list using integer Entrez Gene IDs. But the names of your list aren't integers, they are character. In other words, if you subset with the bare number 124234 (no quotes),
you are saying 'give me the 124234th list item'. But there aren't 124,234 genes! There are like 22,000 or so. What you want to do is subset with the character string "124234",
which will give you the list item that has that Entrez Gene ID as its name.
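Here is a rough sketch of the difference, using a toy named list in place of the real GRangesList (the values are made up, and a plain list only errors with `[[`, whereas the GRangesList already errors with `[`):

```r
# Toy stand-in for the result of transcriptsBy(); the names play the role
# of Entrez Gene IDs and the values are invented for illustration.
txByGene <- list("88"     = "transcripts of gene 88",
                 "9499"   = "transcripts of gene 9499",
                 "201625" = "transcripts of gene 201625")

nItems <- length(txByGene)  # only 3 items, even though one name is "201625"

# Numeric subscript: asks for the item at POSITION 201625, which does not exist.
byPosition <- tryCatch(txByGene[[201625]],
                       error = function(e) "out of bounds")

# Character subscript: asks for the item whose NAME is "201625".
byName <- txByGene[["201625"]]
```

Small IDs appear to "work" with a numeric subscript only because the position happens to exist, but the item returned is whatever sits at that position, not necessarily the gene with that ID.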
oh wow!
I'm just stupid then I guess....
ok so I removed a single code that wasn't valid and then used as.character to convert everything in my vector to character, and it's now up and running!!
Thank you so much!!
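
For anyone hitting the same error, the fix described in this thread (convert the IDs to character, drop the ones that are not present) could be sketched roughly as follows. `splitKnownIds` is a hypothetical helper written for this illustration, not a function from any package, and the toy list stands in for the real transcriptsBy() result:

```r
# Hypothetical helper: split a vector of IDs into those that are names of `x`
# and those that are not. Works on anything that has names().
splitKnownIds <- function(ids, x) {
  ids <- as.character(ids)      # names are character, so compare as character
  known <- ids %in% names(x)
  list(found = ids[known], missing = ids[!known])
}

# Toy stand-in for transcriptsBy(...); the values are made up.
toy <- list("88" = "tx A", "9499" = "tx B", "201625" = "tx C")

res <- splitKnownIds(c(9499, 201625, 12345), toy)
res$found                 # IDs that are safe to use as subscripts
res$missing               # IDs to drop or investigate
subsetByName <- toy[res$found]
```

With the real data, the same call would take the GRangesList from transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by = "gene") as `x`, since names() on that object returns the Entrez Gene IDs.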