Hi all,
I'm studying Agilent Human Gene Expression one color microarray using Limma. It results significant probes with the topTable function but then I didn't succed to do the conversion of my identifiers with "ids2indices" function? nor later the gene set testing?
Please, find below my code, I'm new to Limma what am I doing wrong??
Thanks,
Best,
Amal
##
>require("TxDb.Hsapiens.UCSC.hg19.knownGene")
>humangenes<-genes(TxDb.Hsapiens.UCSC.hg19.knownGene)
or
>download.file("http://bioinf.wehi.edu.au/software/MSigDB/human_c2_v5p2.rdata","human_c2_v5p2.rdata")
>load("human_c2_v5p2.rdata")
>summary(Hs.c2)
>SystematicName.allgenes <- myresults$genes$SystematicName
#Agilent SystematicName look like:
> SystematicName.allgenes[(1:10)]
[1] "NM_004333" "NM_015987" "NM_024604" "NM_178829"
[5] "ENST00000429990" "NM_001040196" "NM_030961" "NM_033081"
[9] "NM_020188" "NM_012318"
>indices.genes <- ids2indices(humangenes$gene_id, SystematicName.allgenes,remove.empty=TRUE)
>indices.genes #named list()
>length(indices.genes) # it results: [1] 0
or
>genes.indices.Hs <- ids2indices(Hs.c2,SystematicName.allgenes)
>genes.indices.Hs #named list()
>length(genes.indices.Hs) # it results: [1] 0
## The same if I use the probe names as identifiers:
probenames.allgenes <- myresults$genes$ProbeName
#Agilent ProbeName look like:
> probenames.allgenes[(1:10)]
[1] "A_23_P42935" "A_23_P117082" "A_23_P2683" "A_23_P157316" "A_32_P14850"
[6] "A_23_P158596" "A_23_P350107" "A_23_P388190" "A_23_P106544" "A_23_P94998"
If you want to make a comment on my post, please use the ADD COMMENT button, rather than the 'Add your answer' box (which is intended for answers only).
Your question is whether or not this is a loss of information, and it isn't clear if it is. Not every sequence has an Entrez Gene ID, and there can be any number of reasons why not. There are lots of cDNA clones, hypothetical transcripts, etc that were put on the array as speculative content, and those things won't have an Entrez Gene ID until there is more evidence that they are real.
So one could argue that restricting to probes with linked Entrez Gene IDs increases the information by getting rid of noise probes that don't measure anything.
Hello James,
Thanks for your quick reply.
So afterwards, it will be interesting to focus only on probes mapped to Enter Gene IDs without seeking explanations of non-mapping for the others!? Thanks,
Best,
Oh, I don't know. It depends on how much work you want to put in. You could register for Agilent's eArray site, or more easily, just go to GEO and download the annotation table, which has the probe sequences. You could then get all the probe sequences that they don't have annotations for, and then blast against something (just human, human + mouse, nt?) to get the most likely alignment for that probe, and then add that back into your annotations.
Or you could take the sequences, convert to FASTA, and then use an aligner like subread, or bowtie, or whatever you prefer to align against GRCh38, then read the bam file into R, and use summarizeOverlaps to get the genes that each probe aligns to.
You could probably do other things as well. It just depends on how interested you are in seeking explanations for the non-mapping. In the end I imagine you won't gain many extra annotations by doing that, but you may gain experience in how one would do such a thing, which may be enough of a gain for you to do it anyway.
Great! Thanks James. Certainly that always there will be an interest!