Hi all,
I have a feeling this is an easy one, but browsing the forum and the BiocParallel vignette, I can't find it.
Is there a BiocParallel equivalent of parRapply?
I have a nice MulticoreParam object to handle parallel computing, and if I could I would like to avoid making a separate cluster using the parallel package.
For context, I would like to apply a function to each row of the geno(vcf)[["GT"]] matrix to calculate the allele frequency of each variant (AF = 1*[0/1] + 2*[1/1]). I have ~80k thousand variants that I would like to reduce by filtering those > 1% MAF.
...except if I am missing an existing function that does part or all that I want? The reason why I am calculating it myself is that there are some funky genotypes in there (1/0, ./1, 0/., and more) which makes me wonder how the AF field of the VCF is calculated. At this stage, I simply prefer calculating the AF checking genotypes myself.
Cheers!
==== EDIT to add example ===
Thanks for both your comments.
Here is a made-up example:
# Define accepted genotypes
refG = c("0/0")
altHet = c("0/1")
altHom = c("1/1")
# A sample of genotypes observed in the VCF file
allG = c(
"0/0", "0", "1/1",
"1/0", "1", "2",
".", "./.", "./0", "./1", "0/.", "1/.")
# Make up a matrix like the one obtained using VariantAnnotation
genos <- matrix(
data = sample(
x = allG, size = 1E6, replace = TRUE,
prob = c(rep(0.25, 3), rep(0.25/9, 9))),
nrow = 1E3,
ncol = 1E3)
dim(genos)
genos[1:4,1:4]
# My current solution to calculating MAF considering "accepted" genotypes
minorAlleleFreq <- function(genos, refGenos, altHetGenos, altHomGenos){
tableCounts <- table(factor(
x = genos,
levels = c(refGenos, altHetGenos, altHomGenos)
))
alleleCount <- tableCounts[altHetGenos] + 2 * tableCounts[altHomGenos]
alleleNumber <- 2 * sum(tableCounts)
return(alleleCount / alleleNumber)
}
mafs <- apply(
X = genos,
MARGIN = 1,
FUN = minorAlleleFreq,
refGenos = refG,
altHetGenos = altHet,
altHomGenos = altHom)
# For your eyes only :)
hist(mafs)
I am just wondering if there was a more efficient way to calculate mafs above.
Cheers!

There isn't a parRapply equivalent. But can you provide more detail, maybe with an example using one of the vcf files that comes with VariantAnnotation or a simple matrix constructed 'by hand' of what you want to calculate? I'd guess that there is a vectorized version of the calculation that will be very fast, so no need for parallel evaluation.
Martin is right, there has to be a faster way than looping, and it doesn't make sense to use
rapplyover a genotype matrix anyway, since there is only a single level of nesting. To avoid the nesting entirely, callexpand(vcf)and then use conventional matrix operations.