Ping-Hsun Hsieh
Dear R/BioC experts,
I am interested in the R package "GenABEL" and would like to test
it on my dataset, but so far without success.
My toy dataset (9 samples x 1000 SNPs) was successfully converted
into the gwaa.data class; however, its coding does not look right to me.
Here is part of the genotype file.
name           chr  pos      strand  MDSNP_02  MDSNP_04  MDSNP_06  MDSNP_07  MDSNP_08  MDSNP_10  MDSNP_11  MDSNP_12  MDSNP_15
SNP_A-2131660  1    1145994  +       CT        CT        TT        TT        TT        TT        TT        TT        CT
SNP_A-1967418  1    2224111  +       GG        GG        GG        GG        AG        AG        GG        GG        GG
SNP_A-1969580  1    2319424  +       GG        GG        GG        GG        GG        GG        GG        GG        GG
SNP_A-4263484  1    2543484  +       TT        CT        CT        CC        CC        TT        TT        TT        CT
SNP_A-1978185  1    2926730  -       CC        CC        CC        CC        CC        CC        CC        CC        CC
SNP_A-4264431  1    2941694  -       CT        CC        CC        CC        CC        CC        CT        CT        CC
SNP_A-1980898  1    3084986  -       GG        GG        GG        GG        GG        GG        CG        CG        GG
SNP_A-1983139  1    3155127  +       AC        AA        AA        AA        AA        AA        AA        AA        AA
The coding of the first sample, for the first 9 SNPs only:
> as.character(toydf@gtdata)[1,1:9]
SNP_A-2131660 SNP_A-1967418 SNP_A-1969580 SNP_A-4263484 SNP_A-1978185
        "T/C"         "G/G"         "1/1"         "T/T"         "1/1"
SNP_A-4264431 SNP_A-1980898 SNP_A-1983139 SNP_A-4265735
        "C/T"         "G/G"         "A/C"         "C/T"
The coding of the third SNP, "SNP_A-1969580", for all 9 samples:
> as.character(toydf@gtdata)[1:9,3]
MDSNP_02 MDSNP_04 MDSNP_06 MDSNP_07 MDSNP_08
   "1/1"    "1/1"    "1/1"    "1/1"    "1/1"
MDSNP_10 MDSNP_11 MDSNP_12 MDSNP_15
   "1/1"    "1/1"    "1/1"    "1/1"
As you can see, SNP_A-1969580, for example, is GG in all 9 samples.
Why is this SNP coded as "1/1" rather than "G/G"?
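A quick check that may help narrow this down: the following is a minimal base-R sketch, assuming the genotype excerpt is retyped by hand into a character matrix called `geno` (a hypothetical name, not an object from the session above). It does not use GenABEL at all; it only lists the SNPs that show a single allele across the 9 samples.

```r
# Hypothetical stand-in for the genotype file: the eight SNPs from the
# excerpt, one row per SNP, one column per sample.
samples <- c("MDSNP_02", "MDSNP_04", "MDSNP_06", "MDSNP_07", "MDSNP_08",
             "MDSNP_10", "MDSNP_11", "MDSNP_12", "MDSNP_15")
geno <- rbind(
  "SNP_A-2131660" = c("CT", "CT", "TT", "TT", "TT", "TT", "TT", "TT", "CT"),
  "SNP_A-1967418" = c("GG", "GG", "GG", "GG", "AG", "AG", "GG", "GG", "GG"),
  "SNP_A-1969580" = c("GG", "GG", "GG", "GG", "GG", "GG", "GG", "GG", "GG"),
  "SNP_A-4263484" = c("TT", "CT", "CT", "CC", "CC", "TT", "TT", "TT", "CT"),
  "SNP_A-1978185" = c("CC", "CC", "CC", "CC", "CC", "CC", "CC", "CC", "CC"),
  "SNP_A-4264431" = c("CT", "CC", "CC", "CC", "CC", "CC", "CT", "CT", "CC"),
  "SNP_A-1980898" = c("GG", "GG", "GG", "GG", "GG", "GG", "CG", "CG", "GG"),
  "SNP_A-1983139" = c("AC", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA")
)
colnames(geno) <- samples

# Split every two-letter genotype into single alleles and count how many
# distinct alleles each SNP shows across all samples.
n_alleles <- apply(geno, 1, function(g) {
  length(unique(unlist(strsplit(g, ""))))
})

# SNPs with exactly one observed allele are monomorphic in this sample.
monomorphic <- names(n_alleles)[n_alleles == 1]
print(monomorphic)
```

On this excerpt the check returns SNP_A-1969580 and SNP_A-1978185, which are the same two SNPs shown as "1/1" in the output for the first sample. The "1/1" coding may therefore be linked to these SNPs being monomorphic in the toy dataset; that is only a guess from the excerpt, not confirmed GenABEL behavior.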
My other question, related to the one above, is whether the genotype
file is required to contain the actual DNA bases.
Could I use AA, AB, and BB as the coding scheme in GenABEL instead of
the exact bases? Would that cause problems or errors?
Thanks for your answer/response in advance!
Best Regards,
Mike