Question: Annotating HTA 2.0 genes
0
gravatar for statistuta
4.5 years ago by
statistuta0
United States
statistuta0 wrote:

Hello..I have a list of .CEL files from HTA 2.0 chips. I created an ExpressionSet object by reading them with the R package "oligo". This object has data on 70523 features (genes) and 48 samples. What is best way to annotate these genes so that I can create a "GenomicRanges" object with the following attributes: Chromosome, Transcript_start, Transcript end ?

microarray • 4.4k views
ADD COMMENTlink modified 4.4 years ago • written 4.5 years ago by statistuta0
Answer: Annotating HTA 2.0 genes
0
gravatar for James W. MacDonald
4.5 years ago by
United States
James W. MacDonald49k wrote:
> library(pd.hta.2.0)

> load(paste0(path.package("pd.hta.2.0"), "/extdata/netaffxTranscript.rda"))

> annot <- pData(netaffxTranscript)

> annot <- annot[!annot$seqname %in% c("", NA, "NA"),]

> annot$strand[annot$strand == ""] <- "*"

> pd.hta.gr <- GRanges(annot$seqname, IRanges(annot$start, annot$stop), annot$strand)

EDIT: added names

> namespd.hta.gr) <- annot$transcriptclusterid
> pd.hta.gr
GRanges object with 67758 ranges and 0 metadata columns:
                                 seqnames           ranges strand
                                    <Rle>        <IRanges>  <Rle>
           TC01000001.hg.1           chr1 [ 11869,  14409]      +
           TC01000002.hg.1           chr1 [ 29554,  31109]      +
           TC01000003.hg.1           chr1 [ 69091,  70008]      +
           TC01000004.hg.1           chr1 [160446, 161525]      +
           TC01000005.hg.1           chr1 [317811, 328581]      +
                       ...            ...              ...    ...
  TCUn_gl000212000002.hg.0 chrUn_gl000212 [ 65184,  66597]      +
  TCUn_gl000219000001.hg.0 chrUn_gl000219 [ 45397,  57412]      +
  TCUn_gl000222000004.hg.0 chrUn_gl000222 [122791, 122814]      -
  TCUn_gl000223000002.hg.0 chrUn_gl000223 [   351,    377]      -
  TCUn_gl000229000001.hg.0 chrUn_gl000229 [ 16405,  16424]      +
  -------
  seqinfo: 60 sequences from an unspecified genome; no seqlengths

 

ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by James W. MacDonald49k
Answer: Annotating HTA 2.0 genes
0
gravatar for statistuta
4.4 years ago by
statistuta0
United States
statistuta0 wrote:

Sincere thanks for your reply!…But this gives the annotation for 67758 transcripts, while I have the expression data on 70523 transcripts. How can I get the annotation for the remaining transcripts?

ADD COMMENTlink written 4.4 years ago by statistuta0

The short answer is that you can't. If you re-read the code in my previous answer, you will note this line:

> annot <- annot[!annot$seqname %in% c("", NA, "NA"),]

which removes all the data where the seqname (chromosome) is missing. It turns out there are 2995 probesets for which there is no chromosome in the annotation file. If you don't have a chromosome, how will you add it to the GRanges object? If we look a bit closer at these probesets, they don't have start/end information either:

> annot.noseqname <- annot[annot$seqname == "", 1:6]
> head(annot.noseqname)
           transcriptclusterid probesetid seqname strand start stop
2824546_st          2824546_st 2824546_st                    0    0
2824549_st          2824549_st 2824549_st                    0    0
2824551_st          2824551_st 2824551_st                    0    0
2824554_st          2824554_st 2824554_st                    0    0
2827992_st          2827992_st 2827992_st                    0    0
2827995_st          2827995_st 2827995_st                    0    0
> table(annot.noseqname$start)
   0
2995
> table(annot.noseqname$stop)
   0
2995

And if we go to netaffx to see what these probesets are supposed to measure, we either return no results, or we get redirected to some random page. Try querying for 2824546_st at netaffx, or better yet, one of these PSR probesets:

> tail(annot.noseqname)
transcriptclusterid                probesetid
PSR6_ssto_hap7002446.hg.1 PSR6_ssto_hap7002446.hg.1
PSR6_ssto_hap7002447.hg.1 PSR6_ssto_hap7002447.hg.1
PSR6_ssto_hap7002454.hg.1 PSR6_ssto_hap7002454.hg.1
PSR6_ssto_hap7002455.hg.1 PSR6_ssto_hap7002455.hg.1
PSR6_ssto_hap7002456.hg.1 PSR6_ssto_hap7002456.hg.1
PSR6_ssto_hap7002459.hg.1 PSR6_ssto_hap7002459.hg.1
                          seqname strand start stop
PSR6_ssto_hap7002446.hg.1                    0    0
PSR6_ssto_hap7002447.hg.1                    0    0
PSR6_ssto_hap7002454.hg.1                    0    0
PSR6_ssto_hap7002455.hg.1                    0    0
PSR6_ssto_hap7002456.hg.1                    0    0
PSR6_ssto_hap7002459.hg.1                    0    0

So long story short, you cannot get the position information for all of the probesets, because some of them have no such information.

 

ADD REPLYlink written 4.4 years ago by James W. MacDonald49k

The 2995 probesets in question are various types of controls, and are not considered main content for gene level analysis.

ADD REPLYlink written 4.0 years ago by Jason Hubbard20
Answer: Annotating HTA 2.0 genes
0
gravatar for statistuta
4.4 years ago by
statistuta0
United States
statistuta0 wrote:

Thank you very much!

ADD COMMENTlink written 4.4 years ago by statistuta0

Hello James,

I have a similar situation. I have read all my 30 CEL file into R and Normalized them using Oligo package. How can I use pd.hta.2.0 package to annotate my data ? I have been searching for codes. From what I have seen some people have tried to build their on library but I am not sure if is that a necessary step to annotate my data?

 

thanks in advance.

Best,

Hoss

ADD REPLYlink written 2.1 years ago by Seymoo0
library(affycoretools)

eset <- annotateEset(eset, pd.hta.2.0)

OR

eset <- annotateEset(eset, hta20transcriptcluster.db)

If you want to use the annotation db package. See ?annotateEset for more information.

ADD REPLYlink written 2.1 years ago by James W. MacDonald49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 259 users visited in the last hour