Different order of samples in output of snpgdsSlidingWindow to calculate Fst and aggregation method
serpalma.v ▴ 60
Last seen 2.5 years ago


I am using SNPRelate to calculate Fst for sliding windows. There are two things that I cannot find information about.

(1) Suppose I pass a set of samples in a specific order, together with their corresponding populations, for example:

> samps
 [1] "H07750-L1" "H07754-L1" "H07760-L1" "H07775"    "H07762-L1" "H07782-L1"
 [7] "H07758-L1" "H07792-L1" "H07793-L1" "H07742-L1" "H07751-L1" "H07784"
[13] "H07746-L1" "H07767-L1" "H07781-L1" "H07741-L1" "H07779-L1" "H07748-L1"
[19] "H07778"    "H07773-L1"

> pops
 [1] pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop1 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2 pop2
Levels: pop1 pop2

After running the command:

res <- snpgdsSlidingWindow(genofile, winsize = 500000, shift = 250000, FUN = "snpgdsFst", sample.id = samps, population = pops, method = "W&C84")

The order of the samples is changed (sorted) in the output:

> res$sample.id
 [1] "H07741-L1" "H07742-L1" "H07746-L1" "H07748-L1" "H07750-L1" "H07751-L1"
 [7] "H07754-L1" "H07758-L1" "H07760-L1" "H07762-L1" "H07767-L1" "H07773-L1"
[13] "H07775"    "H07778"    "H07779-L1" "H07781-L1" "H07782-L1" "H07784"
[19] "H07792-L1" "H07793-L1"

I'm not sure which of the following this means:

  • Is this the order in which samples are matched to the population argument? That would not be desired.
  • Or does res$sample.id just list the samples that were used, while each sample was still assigned to its population as originally intended?

(2) Finally, how is the Fst score of a window calculated? Is it the arithmetic mean of all per-SNP Fst scores within the window?

Thanks in advance

SNPRelate • 918 views
zhengx ▴ 30
Last seen 4.9 years ago
United States

SNPRelate internally re-orders "population" to follow the order of the sample IDs, so each sample keeps the population you gave it. res$sample.id is the sample order in the GDS file.

If you are not sure whether the populations are ordered correctly, you can put your input sample IDs in the same order as in the GDS file and supply the population labels in that matching order.
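A minimal sketch of that reordering in base R, using the sample IDs from the question. The vector gds.ids is only a stand-in (here simply the sorted IDs, matching the res$sample.id output shown above); with a real file the GDS order would normally be read from the "sample.id" node, and the names samps.gds, pops.gds and pop.of are made up for illustration.

```r
# Sample IDs and populations in the order they were originally passed
samps <- c("H07750-L1", "H07754-L1", "H07760-L1", "H07775",    "H07762-L1",
           "H07782-L1", "H07758-L1", "H07792-L1", "H07793-L1", "H07742-L1",
           "H07751-L1", "H07784",    "H07746-L1", "H07767-L1", "H07781-L1",
           "H07741-L1", "H07779-L1", "H07748-L1", "H07778",    "H07773-L1")
pops <- factor(rep(c("pop1", "pop2"), each = 10))

# ASSUMPTION: stand-in for the sample order stored in the GDS file.
# With an open file this would typically be:
#   gds.ids <- read.gdsn(index.gdsn(genofile, "sample.id"))
gds.ids <- sort(samps)

# Keep only the requested samples, in GDS order
samps.gds <- gds.ids[gds.ids %in% samps]

# Look up each sample's population by ID, so labels travel with their samples
pop.of   <- setNames(as.character(pops), samps)
pops.gds <- factor(pop.of[samps.gds], levels = levels(pops))

# Inspect the pairing before running the analysis
print(data.frame(sample.id = samps.gds, population = pops.gds, row.names = NULL))
```

Passing samps.gds as sample.id and pops.gds as population then gives input whose order already agrees with the GDS file, so the pairing can be checked by eye against res$sample.id.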

See the documentation of the function "snpgdsFst": it provides two Fst estimates, a weighted Fst and a mean Fst. snpgdsSlidingWindow() returns the weighted Fst for each window, as the "W&C84" method suggests.
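To illustrate why the two estimates differ, here is a toy sketch with made-up numbers. It assumes that the weighted estimate is a ratio of sums (the between-population variance components summed over SNPs, divided by the summed total components) and that the mean estimate is the arithmetic mean of per-SNP ratios. The values in num and den are hypothetical and are not SNPRelate output.

```r
# Hypothetical per-SNP variance components for one window (NOT real output):
# num = between-population component, den = total variance
num <- c(0.02, 0.10, 0.01)
den <- c(0.40, 0.50, 0.10)

# Per-SNP Fst: one ratio per SNP
fst.snp <- num / den

# Mean Fst: arithmetic mean of the per-SNP ratios
mean.fst <- mean(fst.snp)

# Weighted Fst (assumed form): ratio of the summed components,
# so SNPs with larger total variance contribute more
weighted.fst <- sum(num) / sum(den)

print(c(mean = mean.fst, weighted = weighted.fst))
```

Under these assumptions the two numbers do not coincide in general, so a window's weighted Fst should not be read as the arithmetic mean of its per-SNP values.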

