A PCA plot simply shows you the largest differences between samples, so 'not aligning well' can mean more than one thing. For example, it may be that there is lots of technical variability that is obscuring the biological differences between your samples. But this is a matter of degree!
If you have really large changes between samples for a lot of genes, but even larger technical variability due to batches or whatever, then the technical variability can obscure the biological variability (which in that situation usually shows up in the later principal components rather than the first one or two). In this case, using something like RUVSeq or svaseq from the sva package can help control for the unwanted technical variability.
However, if you have consistent and real differences between samples in just a few genes, then the 'normal' variability that one might expect is often predominant in a PCA plot. This (IMO) doesn't necessarily mean you have to do something to 'fix' the data. With any adjustment to the data you always run the risk of capturing some of your real biological variability in a surrogate variable, and thereby reducing your ability to see the real changes that exist.
My point is that there is no free lunch here. Any adjustment you make to fix perceived faults in your data may well erase real signal. So I usually try to figure out if I really do have a problem, and if I can identify the source of the problem first.
As to correcting for SE and PE data: if they were run in separate batches, then you would simply fit a batch effect in your model. (You seem to imply that these data were all run together, although I am inferring that from you saying 'the same stage, library preparation, and species', which may not mean what I think.) But it is pretty uncommon in my experience for samples to be run using the same library preparation, but sequenced differently.
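To make 'fit a batch effect in your model' concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the numbers are a made-up, noise-free toy for a single gene on the log scale, and `solve_linear_system` and `fit_ols` are hypothetical helpers, not functions from any package. In practice the batch term would simply be added to the design of whatever differential expression package you are already using.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(design, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy, noise-free log-expression values for ONE gene across six samples.
# All numbers are made up for illustration.
condition = [0, 0, 0, 1, 1, 1]          # 0 = control, 1 = treated
batch = [0, 0, 1, 0, 1, 1]              # 0 = single-end run, 1 = paired-end run
y = [5.0, 5.0, 8.0, 7.0, 10.0, 10.0]    # built as 5 + 2*condition + 3*batch

# Naive estimate: difference in group means, ignoring batch.
treated = [v for v, c in zip(y, condition) if c == 1]
control = [v for v, c in zip(y, condition) if c == 0]
naive_effect = sum(treated) / len(treated) - sum(control) / len(control)

# Model with a batch term: y ~ intercept + condition + batch.
design = [[1, c, b] for c, b in zip(condition, batch)]
intercept, condition_effect, batch_effect = fit_ols(design, y)

print("naive condition effect:   ", round(naive_effect, 6))
print("batch-adjusted effect:    ", round(condition_effect, 6))
print("estimated batch effect:   ", round(batch_effect, 6))
```

Because the treated samples in this toy are mostly in the paired-end batch, the naive difference in means absorbs part of the batch shift, while the model with a batch term recovers the condition effect the data were built with.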
Perhaps this is just a compilation of a bunch of different samples from different labs? If that is the case, you really shouldn't just be piling them all into one analysis. You would be better off doing separate analyses and then using something like the GeneMeta package to do a meta-analysis.
Thank you for your reply. I will try using negative control genes, but the vignette does not show how to use specific genes, only how to use spike-ins. I have a list of potential control genes but no spike-ins. Do you know how to use a list of genes, given by gene name, as negative control genes?
I'm not sure I understand your question. The same way you specify the names of the spike-ins, you can specify the names of the endogenous genes that you want to use as negative controls. Section 2.4 of the vignette uses endogenous genes as negative controls.
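For intuition about what a list of negative control genes buys you, here is a rough sketch in plain Python. It is not RUVSeq's code or API. The gene names, the numbers, and the helpers `simulate`, `center`, and `remove_factor` are all hypothetical. I have also swapped in a deliberately simple estimator: the unwanted factor is just the average centered profile of the control genes, whereas RUVSeq, as far as I know, estimates its factors with a factor analysis of the control genes. The toy is also built so that the unwanted factor is balanced across conditions; when it is not, this kind of adjustment can remove some real signal as well.

```python
# Hypothetical gene names; the controls are picked by name, as in the question.
negative_controls = ["ctrl1", "ctrl2", "ctrl3", "ctrl4"]

# Eight samples: an unwanted (technical) factor and a biological condition.
# The design is balanced, so the two are uncorrelated in this toy.
unwanted = [1, 1, 1, 1, -1, -1, -1, -1]
condition = [0, 1, 0, 1, 0, 1, 0, 1]


def simulate(baseline, unwanted_weight, condition_effect):
    """Noise-free toy log-expression for one gene across all samples."""
    return [baseline + unwanted_weight * u + condition_effect * c
            for u, c in zip(unwanted, condition)]


log_expr = {
    "ctrl1": simulate(4.0, 1.0, 0.0),   # controls: no condition effect
    "ctrl2": simulate(6.0, 0.5, 0.0),
    "ctrl3": simulate(5.0, 1.5, 0.0),
    "ctrl4": simulate(7.0, 1.0, 0.0),
    "geneA": simulate(6.0, 2.0, 1.5),   # truly changes with condition
    "geneB": simulate(3.0, 1.0, 0.0),   # does not change with condition
}


def center(values):
    """Subtract the mean across samples."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Estimate one unwanted factor from the control genes only (simple average,
# an assumption of this sketch rather than what RUVSeq does).
centered_controls = [center(log_expr[name]) for name in negative_controls]
n_samples = len(unwanted)
factor = [sum(g[i] for g in centered_controls) / len(centered_controls)
          for i in range(n_samples)]


def remove_factor(values, factor):
    """Regress the centered values on the factor and subtract the fitted part."""
    centered = center(values)
    slope = (sum(f * v for f, v in zip(factor, centered))
             / sum(f * f for f in factor))
    return [v - slope * f for v, f in zip(values, factor)]


adjusted = {name: remove_factor(vals, factor) for name, vals in log_expr.items()}

print("geneA before:", log_expr["geneA"])
print("geneA after: ", adjusted["geneA"])
```

The only place the gene names matter is the lookup `log_expr[name] for name in negative_controls`. The same idea carries over to endogenous control genes in general: they are used exactly like spike-ins, just identified by their own names.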
Thank you, I completely missed Section 2.4. It does exactly what I need.