Automatic NA values generated by the ld() function of the snpStats package
@remitournebize-7099

Dear list,

When using the ld() function (which computes linkage disequilibrium statistics) from the snpStats package of Bioconductor, one can get NA values filling many rows or columns of the LD matrix, whatever the statistic used (e.g. R², D'). This happens even if there is no NA value in the original SnpMatrix object.

I have tried the example given in the manual. The input SnpMatrix contains essentially no NA values (missing calls are coded as 00 in the genotype; the subset shown below has a single 00, at locus 173842), yet the problem still occurs (see below).

Does anyone know why the ld() function generates NA values?

library(snpStats)  # provides ld() and the testdata example set
library(lattice)   # provides levelplot()

data(testdata)
ld2 <- ld(Autosomes[1:4, 1:20], Autosomes[1:4, 1:20], stats="R.squared")

# Graphical representation of the LD matrix
lattice.options(default.theme=standard.theme(color=TRUE))
print(levelplot(ld2, scales=list(x=list(cex=0), y=list(cex=0)), col.regions=topo.colors))

In the graphical representation of the LD matrix, the NA-containing rows and columns are easy to spot (image not reproduced here).

Original input SnpMatrix (genotype at each locus, in columns, for each individual, in rows):

173760 173761 173762 173767 173769 173770 173772 173774 173775 173776 173778 173781 173809 173811 173824 173825 173829 173841 173842 173844
    01     01     01     03     03     03     03     02     01     01     01     02     01     01     03     03     01     03     01     02
    01     03     03     03     03     03     03     03     01     01     01     03     01     01     03     03     01     03     00     01
    01     02     02     02     03     03     03     02     01     01     01     03     01     01     03     02     01     03     01     01
    01     02     02     01     03     03     03     03     01     01     01     03     01     01     03     03     01     03     01     01

Example of a subset from the LD matrix:

       173760    173761    173762     173767 173769 173770 173772    173774 173775
173760     NA        NA        NA         NA     NA     NA     NA        NA     NA
173761     NA 1.0000000 1.0000000 0.00000000     NA     NA     NA 0.3333333     NA
173762     NA 1.0000000 1.0000000 0.00000000     NA     NA     NA 0.3333333     NA
173767     NA 0.0000000 0.0000000 1.00000000     NA     NA     NA 0.2000000     NA
173769     NA        NA        NA         NA     NA     NA     NA        NA     NA

Many thanks in advance for your insights,

Sincerely,

Rémi


sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
[3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
 [1] splines   stats4    parallel  stats     graphics  grDevices utils     datasets
 [9] methods   base     

other attached packages:
 [1] snpStats_1.16.0          Matrix_1.1-4             survival_2.37-7         
 [4] lattice_0.20-29          GenomicFeatures_1.18.3   AnnotationDbi_1.28.1    
 [7] Biobase_2.26.0           VariantAnnotation_1.12.8 Rsamtools_1.18.2        
[10] Biostrings_2.34.1        XVector_0.6.0            GenomicRanges_1.18.4    
[13] GenomeInfoDb_1.2.4       IRanges_2.0.1            S4Vectors_0.4.0         
[16] BiocGenerics_0.12.1      BiocInstaller_1.16.1     abc_2.0                 
[19] locfit_1.5-9.1           MASS_7.3-35              quantreg_5.11           
[22] SparseM_1.6              nnet_7.3-8           
@vincent-j-carey-jr-4

Your example is problematic.

> col.summary(Autosomes[1:4, 1:20])
       Calls Call.rate Certain.calls   RAF   MAF P.AA P.AB P.BB      z.HWE
173760     4      1.00             1 0.000 0.000 1.00 0.00 0.00         NA
173761     4      1.00             1 0.500 0.500 0.25 0.50 0.25  0.0000000
173762     4      1.00             1 0.500 0.500 0.25 0.50 0.25  0.0000000
173767     4      1.00             1 0.625 0.375 0.25 0.25 0.50 -0.9333333
173769     4      1.00             1 1.000 0.000 0.00 0.00 1.00         NA
173770     4      1.00             1 1.000 0.000 0.00 0.00 1.00         NA
173772     4      1.00             1 1.000 0.000 0.00 0.00 1.00         NA
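
The MAF column of this summary can be screened directly. The sketch below is base R only: it re-types the RAF and MAF values from the output above into a data frame (with snpStats loaded, the same columns would come from col.summary()). The variable names are illustrative, not part of any package.

```r
# RAF and MAF values copied from the col.summary() output above.
cs <- data.frame(
  RAF = c(0.000, 0.500, 0.500, 0.625, 1.000, 1.000, 1.000),
  MAF = c(0.000, 0.500, 0.500, 0.375, 0.000, 0.000, 0.000),
  row.names = c("173760", "173761", "173762", "173767",
                "173769", "173770", "173772")
)

# A locus is monomorphic in this sample when its minor allele frequency is 0.
monomorphic <- rownames(cs)[cs$MAF == 0]
polymorphic <- rownames(cs)[cs$MAF > 0]

print(monomorphic)  # loci with no variation among the four individuals
print(polymorphic)  # loci that vary among the four individuals
```

The loci flagged as monomorphic here are the ones whose rows and columns are entirely NA in the LD subset posted in the question. With snpStats loaded, the same logical test on col.summary(x)$MAF could presumably be used to subset the columns of x before calling ld(); that step is not run here.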

By using so few samples, you have sharply limited genotypic diversity. It appears that LD statistics involving loci with extreme allele frequencies (here, the monomorphic loci with MAF = 0) are reported as NA.
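
To illustrate why a monomorphic locus leads to NA, here is a minimal base-R sketch. It assumes the raw codes 01/02/03 in the question stand for 0/1/2 copies of the B allele, and it uses the squared correlation of allele counts as a simple stand-in for r². As far as I know, ld() estimates haplotype frequencies instead, so values for polymorphic pairs need not match the posted matrix. The helper name r2_counts is made up for this example.

```r
# Allele counts (copies of the B allele) transcribed from the question's table,
# assuming raw codes 01/02/03 mean AA/AB/BB.
geno <- cbind(
  "173760" = c(0, 0, 0, 0),  # monomorphic: all AA
  "173761" = c(0, 2, 1, 1),
  "173762" = c(0, 2, 1, 1),
  "173767" = c(2, 2, 1, 0),
  "173769" = c(2, 2, 2, 2)   # monomorphic: all BB
)

# Hypothetical helper: squared correlation of allele counts.
# The denominator var(x) * var(y) is zero when either locus is monomorphic,
# so the ratio is undefined and NA is returned.
r2_counts <- function(x, y) {
  denom <- var(x) * var(y)
  if (denom == 0) return(NA_real_)
  cov(x, y)^2 / denom
}

# Pairwise matrix; sapply() over the SNP names keeps row and column names.
snps <- colnames(geno)
r2 <- sapply(snps, function(b) {
  sapply(snps, function(a) r2_counts(geno[, a], geno[, b]))
})

print(round(r2, 3))
```

Every row and column that involves 173760 or 173769 comes out as NA, because a locus with no variation cannot be correlated with anything. Adding more samples, or dropping monomorphic loci beforehand, is what would remove those entries.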

@remitournebize-7099

Dear Vincent,

Many thanks for your insight. It is clear to me now.

Best regards,

Rémi
