Entering edit mode
I would like to export the genotypes (GT field) from a VCF file to allele dosage (0, 1 or 2 copies of a given allele for bi-allelic variants). As the VCF file can be large, I need to do it by batch:
gdose.con <- file("gdose.txt", open="a")
amap.con <- file("amap.txt", open="a")
cat("snp.names\tallele.1\tallele.2\tignore\n", file=amap.con, append=TRUE)
tabix.file <- TabixFile(file="variants.vcf.gz", yieldSize=10^3)
open(tabix.file)
while(nrow(vcf <- readVcf(file=tabix.file, genome="test"))){
res <- genotypeToSnpMatrix(vcf)
write.SnpMatrix(x=res$genotypes, file=gdose.con, append=TRUE,
quote=FALSE, sep="\t", row.names=TRUE,
col.names=FALSE)
write.table(x=res$map, file=amap.con, append=TRUE,
quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)
}
close(tabix.file)
close(gdose.con)
close(amap.con)
The output "gdose" file is fine. However, the output "amap" file contains <S4 object of class "DNAStringSet"> in its 2nd column, even if I use as.data.frame. What should I do to have instead the alternate alleles?
