Error in importing signals with createAffyIntensityFile (GWASTools)
@vinicius-henrique-da-silva-6713
Last seen 19 months ago
Brazil

Hello, I am using the GWASTools package and I get an error when importing my signal file. I tried to mimic my real data set in the following example:

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
 AX-100676796          1   501997 AX-100676796
 AX-100120875          1   503822 AX-100120875
 AX-100067350          1   504790 AX-100067350'

snp.anno <- read.table(text=snp.anno, header=T)

signals <- 'probeset_id    sample1.CEL  sample1.CEL   sample1.CEL
AX-100676796-A   2126.7557   1184.8638  1134.2687
AX-100676796-B   427.1864    2013.8512  1495.0654
AX-100120875-A   1775.5816   2013.8512  651.1691
AX-100120875-B   335.9226    2013.8512  1094.7429
AX-100067350-A   2365.7755   2695.0053  2758.1739
AX-100067350-B   2515.4818   2518.2818  28181.289'
p1summ <- read.table(text=signals, header=T)
write.table(p1summ, "del.txt", sep="\t", col.names=T, row.names=F, quote=F)
p1summ <- createAffyIntensityFile("del.txt", snp.annotation=snp.anno)

Error: all(snp.annotation$snpID == sort(snp.annotation$snpID)) is not TRUE
In addition: Warning messages:
1: In .checkSnpAnnotation(snp.annotation) : coerced snpID to type integer
2: In .checkSnpAnnotation(snp.annotation) :
  coerced chromosome to type integer

I also tried using the probe names with the '-A' and '-B' suffixes as snpID; the error was the same:

snp.annoab <- 'snpID chromosome position      snpName
AX-100676796-A          1   501997 AX-100676796-A
AX-100676796-B          1   501997 AX-100676796-B
AX-100120875-A          1   503822 AX-100120875-A
AX-100120875-B          1   503822 AX-100120875-B
AX-100067350-A          1   504790 AX-100067350-A
AX-100067350-B          1   504790 AX-100067350-B'

snp.annoab <- read.table(text=snp.annoab, header=T)

p1summ <- createAffyIntensityFile("del.txt", snp.annotation=snp.annoab)

Error: all(snp.annotation$snpID == sort(snp.annotation$snpID)) is not TRUE
In addition: Warning messages:
1: In .checkSnpAnnotation(snp.annotation) : coerced snpID to type integer
2: In .checkSnpAnnotation(snp.annotation) :
  coerced chromosome to type integer

With my real data set the error is slightly different, but it still does not work:

Error: length(snp.annotation$snpID) == length(unique(snp.annotation$snpID)) is not TRUE
In addition: Warning messages:
1: In .checkSnpAnnotation(snp.annotation) : NAs introduced by coercion
2: In .checkSnpAnnotation(snp.annotation) : coerced snpID to type integer
3: In .checkSnpAnnotation(snp.annotation) : NAs introduced by coercion
4: In .checkSnpAnnotation(snp.annotation) :
  coerced chromosome to type integer

And the strange thing is that:

> length(snp.annotation$snpID) == length(unique(snp.annotation$snpID))
[1] TRUE

So the error does not seem to agree with the result of running the same check myself (whether the two lengths are equal). Am I missing some important detail in the format of my inputs? I would be grateful for any help. Thank you!

@stephanie-m-gogarten-5121
Last seen 6 months ago
University of Washington

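GWASTools requires snpID to be an integer vector, so you need to define an integer snpID instead of using the probe name. The warning tells you that createAffyIntensityFile coerced your snpID column to integer. Probe names cannot be converted to numbers, so the coerced values are NA (or arbitrary factor level codes, if read.table stored the column as a factor). The checks are applied to the coerced values, which would explain why the same check passes when you run it yourself on the original column. Here is a modification of your example that works: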

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
  AX-100676796          1   501997 AX-100676796
  AX-100120875          1   503822 AX-100120875
  AX-100067350          1   504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)
signals <-  'probeset_id    sample1.CEL  sample1.CEL   sample1.CEL
  AX-100676796-A   2126.7557   1184.8638  1134.2687
  AX-100676796-B   427.1864  2013.8512   1495.0654
  AX-100120875-A   1775.5816 2013.8512  651.1691
  AX-100120875-B    335.9226  2013.8512  1094.7429
  AX-100067350-A   2365.7755  2695.0053  2758.1739
  AX-100067350-B    2515.4818   2518.2818  28181.289 '
p1summ <- read.table(text=signals, header=T)
write.table(p1summ, "del.txt", sep="\t", col.names=T, row.names=F, quote=F)

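# Map each scan (sample) to the file that holds its intensities
scan.anno <- data.frame(scanID=1L, scanName="sample1", file="del.txt")
# snpID must be a unique, sorted integer vector; the probe names stay in snpName
snp.anno$snpID <- 1:nrow(snp.anno)
p1summ <- createAffyIntensityFile(path=".", filename="tmp.gds", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)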

And the resulting file:

> (gds <- GdsIntensityReader("tmp.gds"))
File: /projects/users/stephanie/Code/Bioconductor/tmp.gds (1.6 KB)
+    [  ]
|--+ sample.id   { Int32 1 ZIP(300.00%), 12 bytes }
|--+ snp.id   { Int32 3 ZIP(141.67%), 17 bytes }
|--+ snp.chromosome   { UInt8 3 ZIP(366.67%), 11 bytes }
|--+ snp.position   { Int32 3 ZIP(166.67%), 20 bytes }
|--+ snp.rs.id   { Int32,factor 3 ZIP(141.67%), 17 bytes } *
|--+ X   { Float32 3x1 ZIP(166.67%), 20 bytes }
|--+ Y   { Float32 3x1 ZIP(166.67%), 20 bytes }
> getX(gds)
[1] 2126.756 1775.582 2365.775
> getY(gds)
[1]  427.1864  335.9226 2515.4817
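
For anyone puzzled by why the original annotation failed checks that pass when run by hand, the following base-R sketch reproduces the coercion outside GWASTools. It only assumes that the package converts snpID with something equivalent to as.integer, as the warning text suggests. The variable names are illustrative and not part of GWASTools.

```r
# Probe-style IDs, in the same order as the example annotation
ids_chr <- c("AX-100676796", "AX-100120875", "AX-100067350")

# Character IDs cannot be parsed as numbers, so every value becomes NA
ids_from_chr <- suppressWarnings(as.integer(ids_chr))

# Factor IDs are replaced by their level codes (alphabetical order of the names)
ids_from_fct <- as.integer(factor(ids_chr))

# The original names are unique, but the all-NA coerced IDs are not
names_unique  <- length(ids_chr) == length(unique(ids_chr))            # TRUE
coerced_unique <- length(ids_from_chr) == length(unique(ids_from_chr)) # FALSE

# The level codes come out as 3, 2, 1, so the sortedness check fails
codes_sorted <- all(ids_from_fct == sort(ids_from_fct))                # FALSE

# Sequential integers are both unique and sorted
new_ids <- seq_along(ids_chr)
```

The factor case matches the "is not TRUE" sort error in the toy example, and the character case is one plausible source of the uniqueness error and the "NAs introduced by coercion" warnings seen with the real data.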

 

Note that GWASTools 1.14.1 assumes that the extension after the sample name in the probe file is lowercase ".cel" rather than uppercase ".CEL". This has been fixed in version 1.14.2, which should be available for download sometime tomorrow, once the update propagates through the build machine.
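
If upgrading is not an option, one possible workaround is to lowercase the extension in the column header before writing the intensity file. This is an assumption based on the note above and has not been confirmed against the affected version. The header below is hypothetical, and the code uses only base R.

```r
# Hypothetical header of an intensity file (not taken from a real data set)
header <- c("probeset_id", "sample1.CEL", "sample2.CEL")

# Lowercase only a trailing ".CEL" extension; other column names are untouched
header_fixed <- sub("\\.CEL$", ".cel", header)
```

The same substitution could be applied to names(p1summ) before the write.table call, so that the file on disk carries the lowercase extension.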
