Question

"set genotype" function for gds?

0

Entering edit mode

Lna • 0

@lna-10651

Last seen 6.8 years ago

I'm using GWASTools for my GWAS analysis. It turned out that there are some genotyping problems with a few hundred SNPs on the X-chromosome. I already have all my data stored in gds files for use in GWASTools and I want to change the genotypes of the erroneous SNPs directly in the gds file. I was hoping there is kind of a "setGenotype" function with works exactly like the "getGenotype" function that I use to obtain specific SNP genotypes, but couldn't find anything like that.

Did I miss anything? How can genotype changes of specific SNPs be done in gds files without going back to raw data file and creating a new gds file based on them?

Thank you for your help!

gwastools gds • 1.8k views

ADD COMMENT • link 7.2 years ago Lna • 0

score 1 · Answer 1 · 2017-02-01

1

Entering edit mode

Stephanie M. Gogarten ▴ 870

@stephanie-m-gogarten-5121

Last seen 21 days ago

University of Washington

You need to use the gdsfmt package to modify GDS files directly.

library(gdsfmt)

file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gdsfile <- tempfile()
file.copy(file, gdsfile)

# open file in write mode
gdsobj <- openfn.gds(gdsfile, readonly=FALSE)

# get node of the genotype variable
node <- index.gdsn(gdsobj, "genotype")

# uncompress node so we can write to it
compression.gdsn(node, "")

# change some values for 100 SNPs
nsnp <- 100
nsamp <- 77
newsnps <- matrix(sample(c(0,1,2), size=nsnp*nsamp, replace=TRUE), nrow=nsnp, ncol=nsamp)

# write new values for SNPs 101:200
write.gdsn(node, newsnps, start=c(101,1), count=c(100,77))

# close
closefn.gds(gdsobj)
cleanup.gds(gdsfile)

file.remove(gdsfile)

ADD COMMENT • link 7.2 years ago Stephanie M. Gogarten ▴ 870

0

Entering edit mode

There's one thing... When I read an entry for one SNP which includes "NA"s

(like: NA NA NA NA NA NA NA NA 2 NA NA NA NA NA NA NA 1 NA NA NA 2 NA 2 2 NA NA NA NA)

from a gds file and try to write it with write.gdsn to another, the "NA"s are exchanged by "0"s when I read the same data from the changed gds file
(0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 1 0 0 0 2 0 2 2 0 0 0 0)

I checked the help entry for write.gdsn, because I hoped I can choose what happens to missing entries and it says:

"GDS format does not support missing characters NA, and any NA will be converted to a blank string "". "

I don't really get it. In general there can be missing values in a gds file (that is where my data came from), but how can I set them with write.gdsn?

ADD REPLY • link 7.2 years ago Lna • 0

0

Entering edit mode

The standard missing code for genotype gds files is 3. The GWASTools getGenotype function takes care of converting a value of 3 to NA, but the gdsfmt package does not. When writing the new genotype values, set the missing genotypes to 3 before calling write.gdsn. For example:

geno[is.na(geno)] <- 3

write.gdsn(node, geno, ...)

ADD REPLY • link 7.2 years ago Adrienne Stilp ▴ 30

0

Entering edit mode

Thank you! Now everything seems to work. Maybe it would make sense to include this information in the help text for write.gdsn...

ADD REPLY • link 7.2 years ago Lna • 0

score 0 · Answer 2 · 2017-02-02

0

Entering edit mode

Lna • 0

@lna-10651

Last seen 6.8 years ago

Thank you! That worked fine!

ADD COMMENT • link 7.2 years ago Lna • 0