Importing Affymetrix genotype calls with GWASTools
@vinicius-henrique-da-silva-6713
Brazil

I am trying to import Affymetrix genotype call data (coded -1, 0, 1 and 2) using createDataFile from the GWASTools package. Here is my code and the error I am getting:

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
  AX-100676796          1   501997 AX-100676796
  AX-100120875          1   503822 AX-100120875
  AX-100067350          1   504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)
signals <-  'probeset_id    sample1.cel  sample2.cel   sample3.cel
  AX-100676796-A   2126.7557   1184.8638  1134.2687
  AX-100676796-B   427.1864  2013.8512   1495.0654
  AX-100120875-A   1775.5816 2013.8512  651.1691
  AX-100120875-B    335.9226  2013.8512  1094.7429
  AX-100067350-A   2365.7755  2695.0053  2758.1739
  AX-100067350-B    2515.4818   2518.2818  28181.289 '
p1summ <- read.table(text=signals, header=T)
write.table(p1summ, "del.txt", sep="\t", col.names=T, row.names=F, quote=F)

### Make Scan
mdf <- p1summ
names <- as.data.frame(names(mdf))
names <- as.data.frame(names[-1,])
colnames(names) <- "scanName"
names$scanID <- 1:nrow(names)
names$file <- "del.txt"
scan.anno <- subset(names, select = c(scanID, scanName, file))
scan.anno$scanName <- gsub(".cel", "", scan.anno$scanName)

#scan.anno <- data.frame(scanID=1L, scanName="sample1", file="del.txt")
snp.anno$snpID <- 1:nrow(snp.anno)

p1summ <- createAffyIntensityFile(path=".", filename="tmp.gds", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)
p1summ

(gds <- GdsIntensityReader("tmp.gds"))

getX(gds)

### Creating genotype files

geno <-  'probeset_id    sample1.cel  sample2.cel   sample3.cel
  AX-100676796   1   0  1
  AX-100120875   2 1  0
  AX-100067350   0  1  0'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno.txt", sep="\t", col.names=T, row.names=F, quote=F)

  col.nums <- 'snp sample
      1  2'
col.nums <- read.table(text=col.nums, header=T)

path <- system.file("geno.txt", package="GWASdata")

diag.geno <- createDataFile(path=path, filename="tmp.gen", col.nums=col.nums, col.total=4, sep.type="\t", variables = "genotype", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)

Error in .checkVars(variables, col.nums, col.total, intensity.vars) :
  snp id missing in col.nums

I have probably misunderstood what 'col.nums' stands for, but I am really stuck here. I would be grateful for any pointers.
Thank you very much.

@stephanie-m-gogarten-5121
University of Washington

As it says in the man page for createDataFile, col.nums is not a data.frame but a named integer vector.

Some other issues with your code:

createDataFile (and createAffyIntensityFile) assume that you have one file per sample, where the file name is given in the "file" column of the scan.annotation data frame. "path" is the directory in which all these files are found, something like "/Users/my_name/my_project/raw_data".

When reading genotype data, the function expects alleles, either A/B (allele.coding="AB") or A/C/G/T (allele.coding="nucleotide"), although the genotypes are stored as 0/1/2 after import (the count of A alleles, so AA becomes 2, as in the result below).

I encourage you to go through the examples in the "Data Cleaning" and "Preparing Affymetrix Data" vignettes, as they may help you understand how to use these functions.
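To make the first point concrete, here is a base-R sketch (no GWASTools calls) contrasting the one-row data.frame built in the question with a named integer vector. The name "geno" for the genotype column comes from the list of allowed names in the createDataFile documentation; which other entries your files need depends on their layout.

```r
# What the question built: a one-row data.frame, which createDataFile does not accept
col.nums.df <- read.table(text = "snp sample\n1 2", header = TRUE)
is.data.frame(col.nums.df)   # TRUE

# What the man page asks for: a named integer vector of 1-based column positions.
# Column 2 of the question's genotype file holds calls, so it is labelled "geno".
col.nums <- c(snp = 1L, geno = 2L)
is.integer(col.nums)         # TRUE
names(col.nums)              # "snp" "geno"

# If you already have a one-row data.frame, it can be flattened like this
converted <- setNames(as.integer(unlist(col.nums.df[1, ])), names(col.nums.df))
```

Note that flattening keeps the original names, so "sample" would still have to be renamed to "geno" for this file layout.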

Here is a working version of your code:

library(GWASTools)
snp.anno <-   'snpID chromosome position      snpName
AX-100676796          1   501997 AX-100676796
AX-100120875          1   503822 AX-100120875
AX-100067350          1   504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)
snp.anno$snpID <- 1:nrow(snp.anno)

scan.anno <- data.frame(scanID=1:2, scanName=paste0("sample", 1:2), file=paste0("geno", 1:2, ".txt"))

geno <-  'probeset_id    sample1.cel
  AX-100676796   AB
  AX-100120875   AA
  AX-100067350   BB'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno1.txt", sep="\t", col.names=T, row.names=F, quote=F)

geno <-  'probeset_id   sample2.cel
  AX-100676796   BB
  AX-100120875   AB
  AX-100067350   AB'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno2.txt", sep="\t", col.names=T, row.names=F, quote=F)

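# col.nums is a named integer vector: column 1 holds the SNP name, column 2 the A/B genotype call
col.nums <- c(snp = 1L, geno = 2L)

# col.total=2 is the number of columns in each file; skip.num=1 skips the header line
diag.geno <- createDataFile(path=".", filename="tmp.gds", col.nums=col.nums, col.total=2, sep.type="\t", variables = "genotype", snp.annotation=snp.anno, scan.annotation=scan.anno, skip.num=1, scan.name.in.file=0, verbose=FALSE)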

And the result:

> (gds <- GdsGenotypeReader("tmp.gds"))
File: /projects/users/stephanie/Code/Bioconductor/tmp.gds (1.3 KB)
+    [  ]
|--+ sample.id   { Int32 2 ZIP(175.00%), 14 bytes }
|--+ snp.id   { Int32 3 ZIP(141.67%), 17 bytes }
|--+ snp.chromosome   { UInt8 3 ZIP(366.67%), 11 bytes }
|--+ snp.position   { Int32 3 ZIP(166.67%), 20 bytes }
|--+ snp.rs.id   { Int32,factor 3 ZIP(141.67%), 17 bytes } *
|--+ genotype   { Bit2 3x2, 2 bytes } *
> getGenotype(gds)
     [,1] [,2]
[1,]    1    0
[2,]    2    1
[3,]    0    1
> close(gds)
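The calls in the original question were numeric (-1/0/1/2) in a single probeset-by-sample table. A minimal sketch of reshaping them into the layout used above: recode each call as an A/B string and write one file per sample, then build scan.annotation from those file names. It assumes the usual Affymetrix Power Tools coding (0 = AA, 1 = AB, 2 = BB, -1 = no call); check this against your own calls file. It also assumes that a no-call written out as NA is read back as a missing genotype, which is worth confirming in the createDataFile man page. The file names (geno_<sample>.txt) are hypothetical, and tempdir() stands in for your raw-data directory.

```r
# Numeric calls from the question: one row per probeset, one column per sample
calls <- read.table(text = "probeset_id sample1.cel sample2.cel sample3.cel
AX-100676796 1 0 1
AX-100120875 2 1 0
AX-100067350 0 1 0", header = TRUE, stringsAsFactors = FALSE)

# Assumed Affymetrix (APT) coding: 0 = AA, 1 = AB, 2 = BB, -1 = no call.
# match() returns NA for -1 or any unexpected code, so those become NA.
recode_ab <- function(x) c("AA", "AB", "BB")[match(x, c(0, 1, 2))]

out.dir <- tempdir()  # stand-in for your raw-data directory (the 'path' argument)
sample.cols <- setdiff(names(calls), "probeset_id")
scan.names <- sub("\\.cel$", "", sample.cols)
files <- paste0("geno_", scan.names, ".txt")  # hypothetical file names

# Write one two-column file per sample, in the same layout as geno1.txt/geno2.txt
for (i in seq_along(sample.cols)) {
  one <- data.frame(probeset_id = calls$probeset_id,
                    geno = recode_ab(calls[[sample.cols[i]]]),
                    stringsAsFactors = FALSE)
  write.table(one, file.path(out.dir, files[i]), sep = "\t",
              col.names = TRUE, row.names = FALSE, quote = FALSE)
}

# scan.annotation: the 'file' column tells createDataFile which file holds each sample
scan.anno <- data.frame(scanID = seq_along(scan.names), scanName = scan.names,
                        file = files, stringsAsFactors = FALSE)
scan.anno
```

The col.nums, col.total and skip.num values from the working example then apply unchanged, with path=out.dir. Recoding with match() rather than nested ifelse() calls leaves any unexpected code as NA instead of silently mapping it to a genotype.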

col.nums <- as.integer(c(0,1,2,3,4,5,6,7,8)); names(col.nums) <- c("Name","Chr","Position","1.B Allele Freq","1.Log R Ratio", "1.GType","2.B Allele Freq","2.Log R Ratio", "2.GType")
variables = c("Name","Chr","Position","1.B Allele Freq","1.Log R Ratio", "1.GType","2.B Allele Freq","2.Log R Ratio", "2.GType")
createDataFile(path = "C:/Users/pdharia/Desktop/GWASTools/baftest1", "baftest1gds.gds", file.type = "gds",col.nums = col.nums, col.total = 7,variables = variables, sep.type="\t", snp.annotation= NULL, scan.annotation=NULL, skip.num=1, scan.name.in.file=0, verbose=FALSE)
(gds<-GdsGenotypeReader("baftest1gds.gds"))

I am getting an error: Error: all(variables %in% c("genotype", intensity.vars)) is not TRUE

Any help is appreciated. Thank you.


Please read the documentation for createDataFile. The description of the "variables" argument is:

variables: A character vector containing the names of the variables to
          create (must be one or more of ‘c("genotype", "quality", "X",
          "Y", "rawX", "rawY", "R", "Theta", "BAlleleFreq",
          "LogRRatio")’)

You use the col.nums argument to map the columns in your file to a standard set of variables in the output GDS file. Also note that the first column is 1, not 0, and that you must name col.nums according to the documentation:

‘names(col.nums)’ must be
          a subset of c("snp", "sample", "geno", "a1", "a2", "quality",
          "X", "Y", "rawX", "rawY", "R", "Theta", "BAlleleFreq",
          "LogRRatio")

Also, scan.annotation and snp.annotation cannot be NULL - you must supply valid data frames for these arguments. See the "Data Cleaning" vignette for an example of how to prepare these data.frames.
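These points can be combined into a sketch for the file in the follow-up question. Because createDataFile expects one file per sample, the combined export is split first; col.nums then uses documented names with 1-based positions, and the SNP and scan annotation data frames are built rather than passed as NULL. Assumptions: the sample prefixes "1." and "2.", tab separation, the output file names, writing genotypes and BAF/LRR through two separate createDataFile calls, and the two-row input table, whose values are placeholders rather than real data. The createDataFile calls are left as comments so the sketch runs in base R.

```r
# Placeholder combined export: two samples side by side (values are made up)
combined <- read.table(text = paste(
  "Name\tChr\tPosition\t1.B Allele Freq\t1.Log R Ratio\t1.GType\t2.B Allele Freq\t2.Log R Ratio\t2.GType",
  "snpA\t1\t1000\t0.02\t0.10\tAA\t0.51\t-0.05\tAB",
  "snpB\t1\t2000\t0.98\t0.01\tBB\t0.49\t0.12\tAB",
  sep = "\n"), header = TRUE, sep = "\t", check.names = FALSE, stringsAsFactors = FALSE)

out.dir <- tempdir()                           # stand-in for the raw-data directory
prefixes <- c("1", "2")                        # assumed sample prefixes
fields <- c("GType", "B Allele Freq", "Log R Ratio")
files <- paste0("sample", prefixes, ".txt")    # hypothetical per-sample file names

# Split into one file per sample: Name, GType, B Allele Freq, Log R Ratio
for (i in seq_along(prefixes)) {
  per.sample <- combined[, c("Name", paste0(prefixes[i], ".", fields))]
  names(per.sample) <- c("Name", fields)
  write.table(per.sample, file.path(out.dir, files[i]), sep = "\t",
              col.names = TRUE, row.names = FALSE, quote = FALSE)
}

# Chr and Position go into snp.annotation, not col.nums.
# as.integer() assumes numeric chromosome labels; sex chromosomes would need recoding.
snp.anno <- data.frame(snpID = seq_len(nrow(combined)), chromosome = as.integer(combined$Chr),
                       position = as.integer(combined$Position), snpName = combined$Name,
                       stringsAsFactors = FALSE)
scan.anno <- data.frame(scanID = seq_along(prefixes), scanName = paste0("sample", prefixes),
                        file = files, stringsAsFactors = FALSE)

# Named integer vectors with 1-based positions in the 4-column per-sample files
geno.cols <- c(snp = 1L, geno = 2L)
int.cols  <- c(snp = 1L, BAlleleFreq = 3L, LogRRatio = 4L)

# Check against the allowed sets quoted from the createDataFile man page
allowed.names <- c("snp", "sample", "geno", "a1", "a2", "quality", "X", "Y",
                   "rawX", "rawY", "R", "Theta", "BAlleleFreq", "LogRRatio")
allowed.vars  <- c("genotype", "quality", "X", "Y", "rawX", "rawY", "R",
                   "Theta", "BAlleleFreq", "LogRRatio")
stopifnot(all(names(geno.cols) %in% allowed.names),
          all(names(int.cols) %in% allowed.names),
          all(c("genotype", "BAlleleFreq", "LogRRatio") %in% allowed.vars))

# Not run here (requires GWASTools):
# createDataFile(path = out.dir, filename = "geno.gds", file.type = "gds",
#                variables = "genotype", col.nums = geno.cols, col.total = 4,
#                sep.type = "\t", skip.num = 1, scan.name.in.file = 0,
#                snp.annotation = snp.anno, scan.annotation = scan.anno)
# createDataFile(path = out.dir, filename = "bl.gds", file.type = "gds",
#                variables = c("BAlleleFreq", "LogRRatio"), col.nums = int.cols,
#                col.total = 4, sep.type = "\t", skip.num = 1, scan.name.in.file = 0,
#                snp.annotation = snp.anno, scan.annotation = scan.anno)
```

Here col.total is 4 because each per-sample file has four columns, mirroring col.total=2 for the two-column files in the earlier working example.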
