Question

Different order of samples in output of snpgdsSlidingWindow to calculate Fst and aggregation method

serpalma.v ▴ 60

Hello!

I am using SNPRelate to calculate Fst for sliding windows. There are two things that I cannot find information about.

(1) Suppose I pass a set of samples in a specific order, together with their corresponding populations, for example:

> samps
 [1] "H07750-L1" "H07754-L1" "H07760-L1" "H07775"    "H07762-L1" "H07782-L1"
 [7] "H07758-L1" "H07792-L1" "H07793-L1" "H07742-L1" "H07751-L1" "H07784"
[13] "H07746-L1" "H07767-L1" "H07781-L1" "H07741-L1" "H07779-L1" "H07748-L1"
[19] "H07778"    "H07773-L1"

> pops
 [1] pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2
Levels: pop1 pop2

After running the command:

res <- snpgdsSlidingWindow(genofile, winsize = 500000, shift = 250000, FUN = "snpgdsFst", sample.id = samps, population = pops, method = "W&C84")

The order of the samples is changed (sorted) in the output:

> res$sample.id
 [1] "H07741-L1" "H07742-L1" "H07746-L1" "H07748-L1" "H07750-L1" "H07751-L1"
 [7] "H07754-L1" "H07758-L1" "H07760-L1" "H07762-L1" "H07767-L1" "H07773-L1"
[13] "H07775"    "H07778"    "H07779-L1" "H07781-L1" "H07782-L1" "H07784"
[19] "H07792-L1" "H07793-L1"

I'm not sure what this means. I can see two possibilities:

(a) This is the order in which the samples are matched to the argument population, so the population labels end up on the wrong samples (not what I want).
(b) res$sample.id just lists the samples that were used, and each sample was still assigned to its population as originally intended.

(2) Finally, how is the Fst score of a window calculated? Is it the arithmetic mean of all the Fst scores within the window?

Thanks in advance

SNPRelate • written 5.0 years ago by serpalma.v • updated 4.9 years ago by zhengx

score 0 · Answer 1 · 2019-05-16

SNPRelate re-orders "population" internally to follow the order of the sample IDs. res$sample.id shows the samples in the order in which they are stored in the GDS file.

If you are not sure whether the populations are matched correctly, you could sort your input sample IDs into the order used in the GDS file and supply the population labels in that same order.
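
A minimal sketch of that reordering in base R. The helper align_to_gds_order() is a hypothetical name and not part of SNPRelate, and the IDs below are made up; with a real file the GDS sample order would be read from the file itself, as noted in the comments.

```r
# Hypothetical helper (not part of SNPRelate): put sample IDs and their
# population labels into the sample order of the GDS file.
align_to_gds_order <- function(samps, pops, gds_ids) {
  stopifnot(length(samps) == length(pops), !anyDuplicated(samps))
  missing_ids <- setdiff(samps, gds_ids)
  if (length(missing_ids) > 0) {
    stop("Samples not found in GDS file: ",
         paste(missing_ids, collapse = ", "))
  }
  keep <- gds_ids[gds_ids %in% samps]  # requested samples, in GDS order
  idx <- match(keep, samps)            # where each one sits in the input
  list(sample.id = samps[idx], population = droplevels(pops[idx]))
}

# With a real file the IDs would come from the GDS file, e.g.
#   gds_ids <- read.gdsn(index.gdsn(genofile, "sample.id"))
# A small made-up vector stands in for them here.
gds_ids <- c("S1", "S2", "S3", "S4", "S5")
samps   <- c("S4", "S1", "S3", "S2")
pops    <- factor(c("pop1", "pop1", "pop2", "pop2"))

aligned <- align_to_gds_order(samps, pops, gds_ids)
print(data.frame(aligned))

# The aligned vectors can then be passed on unchanged, e.g.
#   snpgdsSlidingWindow(genofile, winsize = 500000, shift = 250000,
#                       FUN = "snpgdsFst", sample.id = aligned$sample.id,
#                       population = aligned$population, method = "W&C84")
```

Because both vectors are indexed with the same idx, every sample keeps its own population label, and res$sample.id should then come back in the same order as the input.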

See the function "snpgdsFst": it reports two Fst values, a weighted Fst and a mean Fst. snpgdsSlidingWindow() returns the weighted Fst for each window (which is what "W&C84" suggests), not the arithmetic mean of the per-SNP values.
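
To illustrate why the two values differ, here is a small sketch with made-up numbers. It assumes that the weighted Fst is a ratio of sums (per-SNP numerators and denominators are summed over the window before dividing), while the mean Fst averages the per-SNP ratios. The variance components num and den are invented for illustration and are not SNPRelate output.

```r
# Made-up per-SNP variance components for one window (illustrative only).
num <- c(0.02, 0.00, 0.10)  # between-population component per SNP
den <- c(0.20, 0.05, 0.40)  # total variance per SNP

fst_per_snp  <- num / den            # one Fst value per SNP
weighted_fst <- sum(num) / sum(den)  # ratio of sums over the window
mean_fst     <- mean(fst_per_snp)    # arithmetic mean of per-SNP values

print(c(weighted = weighted_fst, mean = mean_fst))
```

Under this assumption, SNPs with a larger denominator contribute more to the weighted value, so it will generally not equal the plain average of the per-SNP scores.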