I am trying to import Affymetrix genotype call data (-1, 0, 1 and 2) using createDataFile from the GWASTools package. Below are my code and the error that I am getting:
library(GWASTools)

snp.anno <- 'snpID chromosome position snpName
AX-100676796 1 501997 AX-100676796
AX-100120875 1 503822 AX-100120875
AX-100067350 1 504790 AX-100067350'
snp.anno <- read.table(text=snp.anno, header=T)

signals <- 'probeset_id sample1.cel sample2.cel sample3.cel
AX-100676796-A 2126.7557 1184.8638 1134.2687
AX-100676796-B 427.1864 2013.8512 1495.0654
AX-100120875-A 1775.5816 2013.8512 651.1691
AX-100120875-B 335.9226 2013.8512 1094.7429
AX-100067350-A 2365.7755 2695.0053 2758.1739
AX-100067350-B 2515.4818 2518.2818 28181.289
'
p1summ <- read.table(text=signals, header=T)
write.table(p1summ, "del.txt", sep="\t", col.names=T, row.names=F, quote=F)

### Make Scan
mdf <- p1summ
names <- as.data.frame(names(mdf))
names <- as.data.frame(names[-1,])
colnames(names) <- "scanName"
names$scanID <- 1:nrow(names)
names$file <- "del.txt"
scan.anno <- subset(names, select = c(scanID, scanName, file))
scan.anno$scanName <- gsub(".cel", "", scan.anno$scanName)
#scan.anno <- data.frame(scanID=1L, scanName="sample1", file="del.txt")
snp.anno$snpID <- 1:nrow(snp.anno)
p1summ <- createAffyIntensityFile(path=".", filename="tmp.gds", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)
p1summ
(gds <- GdsIntensityReader("tmp.gds"))
getX(gds)

### Creating genotype files
geno <- 'probeset_id sample1.cel sample2.cel sample3.cel
AX-100676796 1 0 1
AX-100120875 2 1 0
AX-100067350 0 1 0'
geno <- read.table(text=geno, header=T)
write.table(geno, "geno.txt", sep="\t", col.names=T, row.names=F, quote=F)
col.nums <- 'snp sample
1 2'
col.nums <- read.table(text=col.nums, header=T)
path <- system.file("geno.txt", package="GWASdata")
diag.geno <- createDataFile(path=path, filename="tmp.gen", col.nums=col.nums, col.total=4, sep.type="\t", variables = "genotype", snp.annotation=snp.anno, scan.annotation=scan.anno, verbose=FALSE)
Error in .checkVars(variables, col.nums, col.total, intensity.vars) :
snp id missing in col.nums
I have probably misunderstood what 'col.nums' stands for, but I am really stuck here. I would be grateful for any pointers.
Thank you very much.
col.nums <- as.integer(c(0,1,2,3,4,5,6,7,8)); names(col.nums) <- c("Name","Chr","Position","1.B Allele Freq","1.Log R Ratio", "1.GType","2.B Allele Freq","2.Log R Ratio", "2.GType")
variables = c("Name","Chr","Position","1.B Allele Freq","1.Log R Ratio", "1.GType","2.B Allele Freq","2.Log R Ratio", "2.GType")
createDataFile(path = "C:/Users/pdharia/Desktop/GWASTools/baftest1", "baftest1gds.gds", file.type = "gds",col.nums = col.nums, col.total = 7,variables = variables, sep.type="\t", snp.annotation= NULL, scan.annotation=NULL, skip.num=1, scan.name.in.file=0, verbose=FALSE)
(gds<-GdsGenotypeReader("baftest1gds.gds"))
I am getting this error: Error: all(variables %in% c("genotype", intensity.vars)) is not TRUE.
Any help is appreciated. Thank you
Please read the documentation for createDataFile. The description of the "variables" argument lists the values it accepts; as your error message shows, each entry must be "genotype" or one of the intensity variables, not a column header from your file. You use the col.nums argument to map the columns in your file to a standard set of variables in the output GDS file. Also note that the first column is 1, not 0, and that you must name col.nums according to the documentation.

Also, scan.annotation and snp.annotation cannot be NULL - you must supply valid data frames for these arguments. See the "Data Cleaning" vignette for an example of how to prepare these data.frames.
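
To make the points above concrete, here is a minimal base-R sketch of how the arguments might be prepared. The accepted names in allowed.vars and allowed.cols are written from memory of the createDataFile help page, so treat them as an assumption and check ?createDataFile. The five-column, one-sample-per-file layout and the file names sample1.txt and sample2.txt are hypothetical. The annotation column names (snpID, snpName, chromosome, position, scanID, scanName, file) follow the code posted in the first question.

```r
# Names accepted by 'variables' and by names(col.nums).
# ASSUMPTION: recalled from ?createDataFile; verify against your installed version.
allowed.vars <- c("genotype", "quality", "X", "Y", "rawX", "rawY",
                  "R", "Theta", "BAlleleFreq", "LogRRatio")
allowed.cols <- c("snp", "sample", "geno", "a1", "a2", "quality", "X", "Y",
                  "rawX", "rawY", "R", "Theta", "BAlleleFreq", "LogRRatio")

# HYPOTHETICAL file layout, one sample per file:
# column 1 = SNP name, 2 = sample ID, 3 = genotype, 4 = BAF, 5 = LRR.
# col.nums is a named integer vector and columns are counted from 1, not 0.
col.nums  <- c(snp = 1L, sample = 2L, geno = 3L,
               BAlleleFreq = 4L, LogRRatio = 5L)
col.total <- 5L

# 'variables' names what to store in the GDS file, not the raw column headers.
variables <- c("genotype", "BAlleleFreq", "LogRRatio")

# Cheap checks that catch both errors reported in this thread.
stopifnot(all(names(col.nums) %in% allowed.cols),
          "snp" %in% names(col.nums),
          all(variables %in% allowed.vars),
          min(col.nums) >= 1L,
          max(col.nums) <= col.total)

# Annotation data frames (must not be NULL); snpID and scanID are integers.
snp.anno <- data.frame(
  snpID      = 1:3,
  snpName    = c("AX-100676796", "AX-100120875", "AX-100067350"),
  chromosome = c(1L, 1L, 1L),
  position   = c(501997L, 503822L, 504790L),
  stringsAsFactors = FALSE
)
scan.anno <- data.frame(
  scanID   = 1:2,
  scanName = c("sample1", "sample2"),
  file     = c("sample1.txt", "sample2.txt"),  # hypothetical file names
  stringsAsFactors = FALSE
)

# With GWASTools loaded and the input files in place, the call would look
# roughly like this (not run here):
# createDataFile(path = ".", filename = "tmp.gds", file.type = "gds",
#                variables = variables, snp.annotation = snp.anno,
#                scan.annotation = scan.anno, sep.type = "\t", skip.num = 1,
#                col.total = col.total, col.nums = col.nums,
#                scan.name.in.file = 1, verbose = FALSE)
```

The stopifnot() checks run before any file is read, so a missing "snp" entry or a raw column header passed as a variable shows up immediately rather than as an error from inside createDataFile.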