Question: Gene set Testing with Limma for Agilent IDs probes?
0
2.4 years ago by
amalbouzid.ing0 wrote:

Hi all,

I'm studying Agilent Human Gene Expression one color microarray using Limma. It results significant probes with the topTable function but then I didn't succed to do the conversion of my identifiers with "ids2indices" function? nor later the gene set testing?

Please, find below my code, I'm new to Limma what am I doing wrong??
Thanks,

Best,

Amal

##

>require("TxDb.Hsapiens.UCSC.hg19.knownGene")

>humangenes<-genes(TxDb.Hsapiens.UCSC.hg19.knownGene)

or

>summary(Hs.c2)

>SystematicName.allgenes <- myresults$genes$SystematicName

#Agilent SystematicName look like:

> SystematicName.allgenes[(1:10)]
[1] "NM_004333"       "NM_015987"       "NM_024604"       "NM_178829"
[5] "ENST00000429990" "NM_001040196"    "NM_030961"       "NM_033081"
[9] "NM_020188"       "NM_012318"

>indices.genes <- ids2indices(humangenes$gene_id, SystematicName.allgenes,remove.empty=TRUE) >indices.genes #named list() >length(indices.genes) # it results: [1] 0 or >genes.indices.Hs <- ids2indices(Hs.c2,SystematicName.allgenes) >genes.indices.Hs #named list() >length(genes.indices.Hs) # it results: [1] 0 ## The same if I use the probe names as identifiers: probenames.allgenes <- myresults$genes$ProbeName #Agilent ProbeName look like: > probenames.allgenes[(1:10)] [1] "A_23_P42935" "A_23_P117082" "A_23_P2683" "A_23_P157316" "A_32_P14850" [6] "A_23_P158596" "A_23_P350107" "A_23_P388190" "A_23_P106544" "A_23_P94998" ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by amalbouzid.ing0 Answer: Gene set Testing with Limma for Agilent IDs probes? 2 2.4 years ago by United States James W. MacDonald50k wrote: Running summary on a list is probably not that great an idea, as you doubtless found out. Instead, it's preferable to first find the class of an object and then query from there. > class(Hs.c2) [1] "list" > length(Hs.c2) [1] 4729 > class(Hs.c2[[1]]) [1] "character" > head(Hs.c2[[1]]) [1] "55902" "2645" "5232" "5230" "5162" "5160" So Hs.c2 is a list of numbers in character format. That's not super useful in and of itself, but you could go to the site where you downloaded these data and see that these are character vectors of Entrez Gene IDs. The help page says the second argument should be a 'character vector of gene identifiers' which you should interpret as 'a character vector of gene identifiers of the same type as in the gene set list you downloaded'. So you need the Entrez Gene IDs for each of your Agilent probes, in character format. This is where things get tricky, because you just have this stupid SystematicName column in the genes data.frame that is pretty useless. The array you are using is the Agilent-026652, if I am not mistaken, so you need the HsAgilentDesign026652.db package (which you can infer by the fact that, um, 026652 is the same for both the array and the annotation package, so you have to do a bit of Sherlock Holmes for that step). You can now annotate your data, assuming that your normalized data are stored in an EList called 'myresults' library(BiocInstaller) biocLite("Hs.AgilentDesign026652.db") library(Hs.AgilentDesign026652.db) ## add Entrez Gene IDs to your EList myresults$genes$ENTREZID <- mapIds(Hs.AgilentDesign026652.db, myresults$genes$ProbeName, "ENTREZID","PROBENAME") ## fit the model fit <- lmFit(myresults, design) fit2 <- contrasts.fit(fit, contrast) fit2 <- eBayes(fit2) tt <- topTable(fit2, 1) Now the topTable 'tt' has a column of Entrez Gene IDs. ind <- ids2indices(Hs.c2, as.character(tt$ENTREZID))

Answer: Gene set Testing with Limma for Agilent IDs probes?
0
2.4 years ago by
amalbouzid.ing0 wrote:

Dear James,

Absolutely, I'm using the Agilent-026652 arrays. I write to you a bit of my Sherlock Holmes for that step!

Considering that  "my probes" is a data farme of my resulted significant probes:

>require(HsAgilentDesign026652.db)
>mykeys = myprobes$ProbeName >mykeys = as.vector(mykeys) >myprobes$ENTREZID <- mapIds(HsAgilentDesign026652.db,  keys= mykeys , column = "ENTREZID", keytype = "PROBEID")
>probes_ZID <- ids2indices(Hs.c2, as.character(myprobes\$ENTREZID)) # results #named list()
##as an alternative
>myENTREZID= myprobes[,12]
>myind <- ids2indices(Hs.c2, myENTREZID)

##

But after that about 23% of my probes don't have an ENTREZID! It isn't a loss of information?

1

Your question is whether or not this is a loss of information, and it isn't clear if it is. Not every sequence has an Entrez Gene ID, and there can be any number of reasons why not. There are lots of cDNA clones, hypothetical transcripts, etc that were put on the array as speculative content, and those things won't have an Entrez Gene ID until there is more evidence that they are real.

So one could argue that restricting to probes with linked Entrez Gene IDs increases the information by getting rid of noise probes that don't measure anything.

Hello James,

So afterwards, it will be interesting to focus only on probes mapped to Enter Gene IDs without seeking explanations           of non-mapping for the others!? Thanks,

Best,

1

Oh, I don't know. It depends on how much work you want to put in. You could register for Agilent's eArray site, or more easily, just go to GEO and download the annotation table, which has the probe sequences. You could then get all the probe sequences that they don't have annotations for, and then blast against something (just human, human + mouse, nt?) to get the most likely alignment for that probe, and then add that back into your annotations.

Or you could take the sequences, convert to FASTA, and then use an aligner like subread, or bowtie, or whatever you prefer to align against GRCh38, then read the bam file into R, and use summarizeOverlaps to get the genes that each probe aligns to.

You could probably do other things as well. It just depends on how interested you are in seeking explanations for the non-mapping. In the end I imagine you won't gain many extra annotations by doing that, but you may gain experience in how one would do such a thing, which may be enough of a gain for you to do it anyway.