It is possible to manipulate edgeR into ignoring the samples for which there are no distinguishing SNPs. This strategy uses a paired-sample design whereby the maternal and paternal counts for each individual are treated as a pair. A paired design should be more appropriate than your one-way layout, as the former accounts for any individual-specific effects on gene expression. Assuming that counts from the same individual are in adjacent columns of the count matrix, we can do:
individual <- rep(1:10, rep(2, 10))
design <- model.matrix(~factor(parent) + factor(individual))
And then, as Steve suggested:
counts[is.na(counts)] <- 0
The analysis can proceed with the GLM-based methods in edgeR. Specification of the
group is not necessary as the experimental design is described in
d <- DGEList(counts=counts, genes=row.names(counts), remove.zeros=TRUE)
d <- estimateGLMCommonDisp(d, design)
The idea here is that both of the counts for an individual will be
NA if no SNPs are available to distinguish the maternal and paternal alleles. When these missing values are converted into zeros, the coefficient corresponding to that individual will approach negative infinity in the fitted GLM. This will not affect nor be affected by the estimated (finite) value of the coefficient corresponding to the paternal/maternal log-fold change for all other individuals. As such, the missing individual is effectively taken out of consideration during inference of allele-specific DE.
In practice, dispersion estimation is slightly biased by this strategy, due to inaccuracies with calculation of the Cox-Reid adjusted profile likelihood when a fitted value is arbitrarily close to zero. Nonetheless, it's a good place to start.