Your two-step suggestion is the classical approach to testing for main effects. However, the use of empirical Bayes shrinkage in edgeR poses a major problem to fitting the second GLM. Any genes with a significant interaction effect in (1) will have an inflated dispersion in (2). This would subsequently drag up the dispersion estimates for all other genes - even those that do not have any interactions - and decrease the estimated prior degrees of freedom, both of which can greatly reduce power for all genes.
You might suggest simply getting rid of all genes with a significant interaction before fitting (2). This may have some unpredictable effects depending on the chosen p-value threshold. My guess is that any choice of threshold that sufficiently protects you from genuinely non-zero interaction terms will also remove some genes that have large estimates of the zero interaction terms by chance (i.e., the genes don't exhibit any real interaction but have a non-zero estimate nonetheless). This would effectively trim the expected distribution of dispersion estimates at the right tail, causing the common/trended dispersion to be underestimated and inflating the false positive rate for all non-DE genes.
Another problem is that (2) could also lead to some misleading conclusions if the interaction term is not balanced across your main effects. You might be led to think that there is a main effect where there is none, or a main effect in an opposite direction than the truth. This would then require you to refer back to the interaction results in order to interpret your main effect results, which defeats the purpose of thinking about nice simple "main effects". These problems motivate the prevailing advice of "don't interpret your main effect estimates when your interaction effect is significant". The argument presented on the Analysis Factor page is limited - sure, if the main effects and interaction are in the same direction, then everything's easy to interpret, but that's not always going to be true for all genes. In fact, I would say that many of the more scientifically interesting cases occur when the main and interaction effects disagree.
If you must say something about "main effects", I would suggest comparing the grand means of the groups with different levels for the factor of interest. See Section 3.2.4 of the edgeR user's guide for an example - this also seems to be what you are referring to in Gordon's answer in the linked post above. Comparing the grand means does pretty much the same thing as looking at the main effects, with the benefit of avoiding any mis-estimation of the dispersions. However, you still need to mentally censor out genes with significant interactions when you look at the results, lest you make an incorrect general conclusion about the effect of the factor. This is problematic as such censoring implicitly involves making inferences on genes for which you accepted the null hypothesis, and it's hard to control the error rate for that. (And yes, we do care about the error rate, otherwise there's not much point to this whole process.)
You should think about what scientific question you actually want to answer - there's at least a few ways to peel this orange. Personally, I find it easiest to do an ANODEV involving pairwise comparisons for my factor of interest (e.g., A1 - A2, B1 - B2 if my comparison of interest is 1 vs 2) and take set of significant DE genes that are in the same direction in all contrasts. For these genes, the factor exerts a significant effect in the same direction, which is pretty simple to interpret. If I want to make an even stronger statement, I perform pairwise comparisons for each factor of interest separately and intersect the results. (Specifically, I perform an intersection-union test to compute p-values for each gene.) This requires that genes be DE in each pairwise comparison, which is a pretty rigorous test.