Dear list,
When using the ld() function from the snpStats package of Bioconductor to compute linkage disequilibrium statistics, NA values can appear throughout many rows or columns of the resulting LD matrix, whichever statistic is requested (e.g. R², D'). This happens even if there is no NA value in the original snpMatrix object.
I have reproduced this with the example given in the manual. The subset of the input snpMatrix used here contains essentially no missing calls (missing genotypes are coded as 00; the only one is for the second individual at SNP 173842), yet the problem still occurs (see below).
Does anyone know why the ld() function generates these NA values?
library(snpStats)
library(lattice)
data(testdata)
ld2 <- ld(Autosomes[1:4, 1:20], Autosomes[1:4, 1:20], stats="R.squared")
# Graphical representation of the LD matrix
lattice.options(default.theme=standard.theme(color=TRUE))
print(levelplot(ld2, scales=list(x=list(cex=0), y=list(cex=0)), col.regions=topo.colors))
In the graphical representation of the LD matrix, the rows and columns containing NA values are easy to spot.
Original input snpMatrix (genotype at each locus, in columns, for each individual, in rows):

173760 173761 173762 173767 173769 173770 173772 173774 173775 173776
    01     01     01     03     03     03     03     02     01     01
    01     03     03     03     03     03     03     03     01     01
    01     02     02     02     03     03     03     02     01     01
    01     02     02     01     03     03     03     03     01     01

173778 173781 173809 173811 173824 173825 173829 173841 173842 173844
    01     02     01     01     03     03     01     03     01     02
    01     03     01     01     03     03     01     03     00     01
    01     03     01     01     03     02     01     03     01     01
    01     03     01     01     03     03     01     03     01     01
Example of a subset from the LD matrix:
          173760    173761    173762     173767 173769 173770 173772    173774 173775
173760        NA        NA        NA         NA     NA     NA     NA        NA     NA
173761        NA 1.0000000 1.0000000 0.00000000     NA     NA     NA 0.3333333     NA
173762        NA 1.0000000 1.0000000 0.00000000     NA     NA     NA 0.3333333     NA
173767        NA 0.0000000 0.0000000 1.00000000     NA     NA     NA 0.2000000     NA
173769        NA        NA        NA         NA     NA     NA     NA        NA     NA
Many thanks again for your insights,
Sincerely,
Rémi
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets
[9] methods   base

other attached packages:
 [1] snpStats_1.16.0         Matrix_1.1-4            survival_2.37-7
 [4] lattice_0.20-29         GenomicFeatures_1.18.3  AnnotationDbi_1.28.1
 [7] Biobase_2.26.0          VariantAnnotation_1.12.8 Rsamtools_1.18.2
[10] Biostrings_2.34.1       XVector_0.6.0           GenomicRanges_1.18.4
[13] GenomeInfoDb_1.2.4      IRanges_2.0.1           S4Vectors_0.4.0
[16] BiocGenerics_0.12.1     BiocInstaller_1.16.1    abc_2.0
[19] locfit_1.5-9.1          MASS_7.3-35             quantreg_5.11
[22] SparseM_1.6             nnet_7.3-8