Question

Looking for annotation file for Affymetrix SNP 6.0

1

Entering edit mode

KB ▴ 50

@k-8495

Last seen 2.5 years ago

United States

Hello,

I have copy number data from Affymetrix SNP 6.0. I am looking to annotate the probe ids with chromosome number and location of probe.

Example of output I am looking for:

chr location CN_473981 1 52784 CN_473982 1 52801 CN_497980 1 62568 CN_497981 1 62640 CN_513369 1 72034 CN_513370 1 72179

From what I can understand, pd.genomewidesnp.6 is the package that has annotation information for this chip.

But there is no example provided as to how to use this package. Could someone point me in the right direction ? Is there another package that can do this ?

If I don't find a solution for this, I will have to download this annotation file from the Affymetrix web site, and import that into R and use (I know it is not an optimal solution, but I will have no other choice)

Thank you,

copynumber snp affymetrix • 4.8k views

ADD COMMENT • link 9.6 years ago KB ▴ 50

0

Entering edit mode

KB ▴ 50

@k-8495

Last seen 2.5 years ago

United States

I have a follow up question regarding this. When I used the query mentioned here, I see that the reference genome is hg19.

Is it possible to get the probe information from Affymetrix SNP 6.0 from this pd.genomewidesnp.6 package for hg18 reference genome ?

Thank you

ADD COMMENT • link 9.6 years ago KB ▴ 50

0

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 6 days ago

United States

You should probably coerce to a GRanges first, then use liftOver(). There is a bit of extra coercion you have to do first.

> library(pd.genomewidesnp.6)
> library(GenomicRanges)
> library(rtracklayer)
> con <- db(pd.genomewidesnp.6)
> snp <- dbGetQuery(con, "select man_fsetid, dbsnp_rs_id, chrom, physical_pos, strand from featureSet;")
## make chromosomes the same as a chain file
> snp$chrom <- paste0("chr", snp$chrom)
## subset out SNPs with missing location
> snp <- snp[!(is.na(snp$chrom) | is.na(snp$physical_pos)),]
## convert strand info to +,-,*
> snp$strand[is.na(snp$strand)] <- 2
> snp$strand <- snp$strand + 1
> snp$strand <- c("+", "-", "*")[snp$strand]
> head(snp)
     man_fsetid dbsnp_rs_id chrom physical_pos strand
1 SNP_A-2131660   rs2887286  chr1      1156131      +
2 SNP_A-1967418   rs1496555  chr1      2234251      +
3 SNP_A-1969580  rs41477744  chr1      2329564      +
4 SNP_A-4263484   rs3890745  chr1      2553624      +
5 SNP_A-1978185  rs10492936  chr1      2936870      -
6 SNP_A-4264431  rs10489588  chr1      2951834      -
## convert to GRanges
> snpgr <- GRanges(snp[,3], IRanges(snp[,4],snp[,4]), snp$strand, ID = snp[,1], dbsnp = snp[,2])
## now use liftOver from rtracklayer, using the hg19ToHg18.over.chain from UCSC
> download.file("http://hgdownload.cse.ucsc.edu/gbdb/hg19/liftOver/hg19ToHg18.over.chain.gz", "hg19ToHg18.over.chain.gz")
trying URL 'http://hgdownload.cse.ucsc.edu/gbdb/hg19/liftOver/hg19ToHg18.over.chain.gz'
Content type 'application/x-gzip' length 225973 bytes (220 KB)
==================================================
downloaded 220 KB

## have to decompress
> system("gzip -d hg19ToHg18.over.chain.gz")
> chain <- import.chain("hg19ToHg18.over.chain")
> snpgr.18 <- unlist(liftOver(snpgr, chain))
Discarding unchained sequences: chrMT
> snpgr
GRanges object with 905422 ranges and 2 metadata columns:
           seqnames             ranges strand   |            ID       dbsnp
              <Rle>          <IRanges>  <Rle>   |   <character> <character>
       [1]     chr1 [1156131, 1156131]      +   | SNP_A-2131660   rs2887286
       [2]     chr1 [2234251, 2234251]      +   | SNP_A-1967418   rs1496555
       [3]     chr1 [2329564, 2329564]      +   | SNP_A-1969580  rs41477744
       [4]     chr1 [2553624, 2553624]      +   | SNP_A-4263484   rs3890745
       [5]     chr1 [2936870, 2936870]      -   | SNP_A-1978185  rs10492936
       ...      ...                ...    ... ...           ...         ...
  [905418]     chrX [2462586, 2462586]      -   | SNP_A-8573804   rs5982546
  [905419]     chrX [2633624, 2633624]      -   | SNP_A-8573845   rs6567640
  [905420]     chrX [2589261, 2589261]      -   | SNP_A-8573915   rs7892606
  [905421]     chrX [2692710, 2692710]      +   | SNP_A-8573964   rs2259750
  [905422]     chrX [1812233, 1812233]      +   | SNP_A-8574011   rs7058531
  -------
  seqinfo: 25 sequences from an unspecified genome; no seqlengths
> snpgr.18
GRanges object with 905198 ranges and 2 metadata columns:
           seqnames             ranges strand   |            ID       dbsnp
              <Rle>          <IRanges>  <Rle>   |   <character> <character>
       [1]     chr1 [1145994, 1145994]      +   | SNP_A-2131660   rs2887286
       [2]     chr1 [2224111, 2224111]      +   | SNP_A-1967418   rs1496555
       [3]     chr1 [2319424, 2319424]      +   | SNP_A-1969580  rs41477744
       [4]     chr1 [2543484, 2543484]      +   | SNP_A-4263484   rs3890745
       [5]     chr1 [2926730, 2926730]      -   | SNP_A-1978185  rs10492936
       ...      ...                ...    ... ...           ...         ...
  [905194]     chrX [2472586, 2472586]      -   | SNP_A-8573804   rs5982546
  [905195]     chrX [2643624, 2643624]      -   | SNP_A-8573845   rs6567640
  [905196]     chrX [2599261, 2599261]      -   | SNP_A-8573915   rs7892606
  [905197]     chrX [2702710, 2702710]      +   | SNP_A-8573964   rs2259750
  [905198]     chrX [1772233, 1772233]      +   | SNP_A-8574011   rs7058531
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
>

Et voila!

ADD COMMENT • link 9.6 years ago James W. MacDonald 68k

0

Entering edit mode

KB ▴ 50

@k-8495

Last seen 2.5 years ago

United States

Thank you !

I assume the same can be done for copy number probes as well ? (I need both the copy number and SNP probes in hg18 genome format)

ADD COMMENT • link 9.6 years ago KB ▴ 50

0

Entering edit mode

Yes, but the copy number probes have a start and an end, so when you generate the GRanges object, the call to IRanges() should include the start and end, rather than what I did (the start and end position are the same for a SNP, under the counting rules being used in Bioconductor, so I used the same position for the start and end).

ADD REPLY • link 9.6 years ago James W. MacDonald 68k

0

Entering edit mode

Got it. I will post code on this soon

> cnv <- dbGetQuery(con, "select man_fsetid, chrom, chrom_start, chrom_stop, strand from featureSetCNV;")
> head(cnv)
  man_fsetid chrom chrom_start chrom_stop strand
1  CN_477984     1       62140      62165      0
2  CN_473963     1       61723      61748      0
3  CN_473964     1       61796      61821      0
4  CN_473965     1       61811      61836      0
5  CN_497981     1       72764      72789      0

## make chromosomes the same as a chain file

## subset out SNPs with missing location  # not required for copy number probes

## convert strand info to +,-,*

## convert to GRanges

## now use liftOver from rtracklayer, using the hg19ToHg18.over.chain from UCSC

ADD REPLY • link 9.6 years ago KB ▴ 50

score 2 · Accepted Answer · 2015-11-25

2

Entering edit mode

James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 6 days ago

United States

There are two different kinds of probes on that array, the SNP and CNV probes. The ones you show are CNV, but maybe you need both?

Anyway, you can get those data by querying the underlying SQLite database that is part of the pd.genomewidesnp.6 package.

> con <- db(pd.genomewidesnp.6)
> cnv <- dbGetQuery(con, "select man_fsetid, chrom, chrom_start, chrom_stop, strand from featureSetCNV;")
> head(cnv)
  man_fsetid chrom chrom_start chrom_stop strand
1  CN_477984     1       62140      62165      0
2  CN_473963     1       61723      61748      0
3  CN_473964     1       61796      61821      0
4  CN_473965     1       61811      61836      0
5  CN_497981     1       72764      72789      0
6  CN_473981     1       62908      62933      0
> snp <- dbGetQuery(con, "select dbsnp_rs_id, chrom, physical_pos, strand from featureSet;")
> head(snp)
  dbsnp_rs_id chrom physical_pos strand
1   rs2887286     1      1156131      0
2   rs1496555     1      2234251      0
3  rs41477744     1      2329564      0
4   rs3890745     1      2553624      0
5  rs10492936     1      2936870      1
6  rs10489588     1      2951834      1

ADD COMMENT • link 9.6 years ago James W. MacDonald 68k

2

Entering edit mode

Or maybe more useful

> snp <- dbGetQuery(con, "select man_fsetid, dbsnp_rs_id, chrom, physical_pos, strand from featureSet;")
> head(snp)
     man_fsetid dbsnp_rs_id chrom physical_pos strand
1 SNP_A-2131660   rs2887286     1      1156131      0
2 SNP_A-1967418   rs1496555     1      2234251      0
3 SNP_A-1969580  rs41477744     1      2329564      0
4 SNP_A-4263484   rs3890745     1      2553624      0
5 SNP_A-1978185  rs10492936     1      2936870      1
6 SNP_A-4264431  rs10489588     1      2951834      1
>

ADD REPLY • link 9.6 years ago James W. MacDonald 68k

0

Entering edit mode

Excellent ! Thank you. Both answers would work for me.

I wish this example was there as a vignette for this package. Thanks again !

ADD REPLY • link 9.6 years ago KB ▴ 50

0

Entering edit mode

Excellent ! Thank you. Both answers would work for me.

I wish this example was there as a vignette for this package. Thanks again !

ADD REPLY • link 9.6 years ago KB ▴ 50