Creating an Intensity GDS file for the R package GWASTools
catbriggsm • 0
@catbriggsm-13604

I am attempting to reproduce the mis-annotated sex check from the GWASTools Data Cleaning vignette with my own data (https://bioconductor.org/packages/devel/bioc/vignettes/GWASTools/inst/doc/DataCleaning.pdf, pp. 46-48). I am having trouble creating the Intensity GDS file in R from the files I have available, which are .IDAT files and PLINK ped/map files.

 

Below is my code to create the Genotype GDS file from the PLINK files:

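library(gdsfmt)     # openfn.gds
library(SNPRelate)  # snpgdsPED2GDS, snpgdsSummary
library(GWASTools)  # GdsGenotypeReader and the annotation classes

ped.fn <- "F:/requested.idatfiles/kids.ped"
map.fn <- "F:/requested.idatfiles/kids.map"

# Convert the PLINK ped/map pair to a GDS file
snpgdsPED2GDS(ped.fn, map.fn, "test.gds")

genofile <- openfn.gds("test.gds")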
genofile
File: F:\requested.idatfiles\test.gds (5.3M)
+    [  ] *
|--+ sample.id   { Str8 796 ZIP_ra(21.4%), 1.6K }
|--+ snp.id   { Int32 26528 ZIP_ra(34.6%), 35.9K }
|--+ snp.rs.id   { Str8 26528 ZIP_ra(36.1%), 112.2K }
|--+ snp.position   { Int32 26528 ZIP_ra(86.7%), 89.8K }
|--+ snp.chromosome   { Int32 26528 ZIP_ra(0.13%), 149B } *
|--+ snp.allele   { Str8 26528 ZIP_ra(15.5%), 16.1K }
|--+ genotype   { Bit2 796x26528, 5.0M } *
\--+ sample.annot   [ data.frame ] *
   |--+ family   { Str8 796 ZIP_ra(45.3%), 1.5K }
   |--+ father   { Str8 796 ZIP_ra(2.01%), 39B }
   |--+ mother   { Str8 796 ZIP_ra(2.01%), 39B }
   |--+ sex   { Str8 796 ZIP_ra(13.7%), 225B }
   \--+ phenotype   { Str8 796 ZIP_ra(1.59%), 45B }
snpgdsSummary("test.gds")

(gds <- GdsGenotypeReader(genofile))

# Scan (sample) annotation, read from the sample.annot folder of the GDS file
scanID <- getScanID(gds)
family <- getVariable(gds, "sample.annot/family")
father <- getVariable(gds, "sample.annot/father")
mother <- getVariable(gds, "sample.annot/mother")
sex <- getVariable(gds, "sample.annot/sex")
sex[sex == ""] <- NA # sex must be coded as M/F/NA
phenotype <- getVariable(gds, "sample.annot/phenotype")
scanAnnot <- ScanAnnotationDataFrame(data.frame(scanID, father, mother,
                                                sex, phenotype,
                                                stringsAsFactors=FALSE))

# SNP annotation
snpID <- getSnpID(gds)
chromosome <- getChromosome(gds)
position <- getPosition(gds)
alleleA <- getAlleleA(gds)
alleleB <- getAlleleB(gds)
rsID <- getVariable(gds, "snp.rs.id")
# 23 and 25 are the GWASTools defaults for X and Y; PLINK files normally code
# Y as 24 and XY as 25, so check these against the chromosome codes in the data
snpAnnot <- SnpAnnotationDataFrame(data.frame(snpID, chromosome, position,
                                              rsID, alleleA, alleleB,
                                              stringsAsFactors=FALSE),
                                   YchromCode=25L, XchromCode=23L)
genoData <- GenotypeData(gds, scanAnnot=scanAnnot, snpAnnot=snpAnnot)

 

I am having trouble creating the Intensity GDS file from my IDAT files. I read them into R with the crlmm package as follows, but I can't find documentation on how to convert the result to an Intensity GDS file. Should I be using a different package or function? Is there documentation online that I can follow?

library(crlmm)

# path.all is assumed to hold the directory containing the .idat files
idats <- readIdatFiles(sampleSheet=NULL, arrayNames=NULL, ids=NULL, path=path.all,
                       arrayInfoColNames=list(barcode="SentrixBarcode_A",
                                              position="SentrixPosition_A"),
                       highDensity=FALSE, sep="_",
                       fileExt=list(green="Grn.idat", red="Red.idat"),
                       saveDate=FALSE, verbose=TRUE)

 

gwastools idat gds format • 1.8k views
@james-w-macdonald-5106
United States

The IDAT files contain only the raw green and red signals, whereas the GDS file is expecting to get the SNP calls. You could use the crlmm package to generate the SNP calls and then make a GDS file, but if you already have the SNP calls from some other source, that seems like extra work that might not be very useful.

@stephanie-m-gogarten-5121
University of Washington

From a quick look at the documentation for crlmm, I think the "R" and "G" intensities output by readIdatFiles are analogous to the raw X and Y intensities output by Illumina's GenomeStudio (which is what we based the GWASTools input on). You will want to normalize the intensities before using them for the sex check. To get the data into GDS, you can either write a text file in the format expected by createDataFile, or create the GDS file yourself using commands from the gdsfmt package.
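As a rough illustration of the second route (building the file with gdsfmt), here is a minimal sketch on made-up data. It assumes the layout that GdsIntensityReader expects: the variables sample.id, snp.id, snp.chromosome and snp.position, plus intensity matrices named X and Y with SNPs in rows and scans in columns. The toy values, the object names and the output path are all hypothetical. Real input would be the normalized intensities derived from the IDAT files.

```r
library(gdsfmt)     # low-level GDS creation
library(GWASTools)  # GdsIntensityReader

# Hypothetical toy data: 6 SNPs (rows) x 3 scans (columns)
snp_id     <- 1:6
chromosome <- c(1L, 1L, 23L, 23L, 25L, 25L)  # GWASTools defaults: 23 = X, 25 = Y
position   <- c(100L, 200L, 100L, 200L, 100L, 200L)
scan_id    <- 1:3
X <- matrix(seq(0.5, 9, by = 0.5), nrow = 6, ncol = 3)
Y <- matrix(seq(9, 0.5, by = -0.5), nrow = 6, ncol = 3)

# Write the variables into a new GDS file (hypothetical file name)
gds_path <- file.path(tempdir(), "intensity_sketch.gds")
gfile <- createfn.gds(gds_path)
add.gdsn(gfile, "sample.id", scan_id)
add.gdsn(gfile, "snp.id", snp_id)
add.gdsn(gfile, "snp.chromosome", chromosome)
add.gdsn(gfile, "snp.position", position)
add.gdsn(gfile, "X", X, storage = "float32")
add.gdsn(gfile, "Y", Y, storage = "float32")
closefn.gds(gfile)

# Check that GWASTools can read the file back
intens <- GdsIntensityReader(gds_path)
nsnp(intens)
nscan(intens)
dim(getX(intens))
close(intens)
```

The integer IDs used here must match the snpID and scanID columns of whatever SNP and scan annotation you later combine with the reader in IntensityData.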

Similarly, I think you can use crlmm's calculateRBaf function to generate LRR and BAF from your IDAT files (those are used later in the GWASTools vignette).
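For orientation only, the quantities behind LRR and BAF can be sketched in base R using the definitions commonly given for Illumina arrays: R = X + Y, theta = (2/pi) * atan(Y/X), and LRR = log2(R observed / R expected). This does not call crlmm, and all numbers below are made up. In particular, the expected R values stand in for genotype cluster positions, which in practice come from a cluster file or from the calling algorithm. BAF is obtained by interpolating theta between those cluster centers, which is not shown here.

```r
# Hypothetical normalized intensities for four SNP/sample pairs
x <- c(1.0, 0.5, 0.0, 0.8)
y <- c(0.0, 0.5, 1.0, 0.1)

r     <- x + y                   # total intensity
theta <- (2 / pi) * atan2(y, x)  # 0 = only A-allele signal, 1 = only B-allele signal

# Hypothetical expected total intensity for each genotype cluster
r_expected <- c(1.0, 1.0, 1.0, 1.2)

# Log R ratio: observed versus expected total intensity
lrr <- log2(r / r_expected)

data.frame(x, y, r, theta, lrr)
```

An LRR near zero means the observed intensity matches the expectation for that cluster. Negative values indicate less signal than expected.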
