Hi everybody,
I am trying to compare gene expression using RNA Seq data from 50 and 50 human tissue samples with two phenotypically similar diseases that may exhibit distinct pathophysiological differences on the transcriptomic level (= my hypothesis).
Several, but not all subjects in the study provided more than one sample, in one case even one from each disease type. Thus, I would want to correct for the subject as a random effect.
Additional (fixed) covariates include gender (disease B has more men that A) and a measure of disease severity (continuous integer values).
In order to be able to incorporate both my fixed and random factors (as a blocking factor), I used limma/voom
design <- model.matrix(~1+ targets$disease + targets$gender + targets$severity)
y <- voom(x,design,plot=F, normalize="quantile")
corfit <- duplicateCorrelation(y, design, block=targets$Patient)
y <- voom(x,design,plot=FALSE, block=targets$Patient, correlation=corfit$consensus, normalize="quantile")
fit=lmFit(y, design, block=targets$Patient, correlation=corfit$consensus)
fit <- eBayes(fit)
I have also tried TMM normalization, or voomWithQualityWeights.
My problem is that I get a number of DE genes for the disease B vs. A with large FCs (up to 10fold) that are mostly Y-linked, and/or ribosomal proteins. These results do not make sense biologically.
As a comparison, I have used DESeq2 to analyze the same data. Correcting for severity and patient as fixed factors (yes, I know, patient should be random...), I find very different genes to be differentially expressed, with lower FCs. Those make more sense biologically.
I know that DESeq2 is probably not appropriate for my data, and I would like to use limma here. But what am I doing wrong here? Is it a normalization problem?
Many thanks in advance for your thoughts and comments!

You don't show what
topTablecommand you used to get your DE list. Large log-fold changes for Y-linked genes; are you sure you're not testing for the sex effect? Also, everything butseverityhad better be a factor type.Many thanks for your comment.
You are right, I should have posted that (see below).
> topTable(fit, coef=2) ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol ENSG00000129824 ENSG00000129824 ENST00000250784 6192 RPS4Y1 ENSG00000131002 ENSG00000131002 ENST00000407724 246126 TXLNGY ENSG00000067048 ENSG00000067048 ENST00000360160 8653 DDX3Y ENSG00000012817 ENSG00000012817 ENST00000469599 8284 KDM5D ENSG00000114374 ENSG00000114374 ENST00000338981 8287 USP9Y ENSG00000183878 ENSG00000183878 ENST00000331397 7404 UTY ENSG00000067646 ENSG00000067646 ENST00000383052 7544 ZFY ENSG00000099725 ENSG00000099725 ENST00000528056 5616 PRKY ENSG00000198692 ENSG00000198692 ENST00000361365 9086 EIF1AY ENSG00000235109 ENSG00000235109 ENST00000481934 64288 ZSCAN31 description ENSG00000129824 ribosomal protein S4, Y-linked 1 [Source:HGNC Symbol;Acc:HGNC:10425] ENSG00000131002 taxilin gamma pseudogene, Y-linked [Source:HGNC Symbol;Acc:HGNC:18473] ENSG00000067048 DEAD-box helicase 3, Y-linked [Source:HGNC Symbol;Acc:HGNC:2699] ENSG00000012817 lysine demethylase 5D [Source:HGNC Symbol;Acc:HGNC:11115] ENSG00000114374 ubiquitin specific peptidase 9, Y-linked [Source:HGNC Symbol;Acc:HGNC:12633] ENSG00000183878 ubiquitously transcribed tetratricopeptide repeat containing, Y-linked [Source:HGNC Symbol;Acc:HGNC:12638] ENSG00000067646 zinc finger protein, Y-linked [Source:HGNC Symbol;Acc:HGNC:12870] ENSG00000099725 protein kinase, Y-linked, pseudogene [Source:HGNC Symbol;Acc:HGNC:9444] ENSG00000198692 eukaryotic translation initiation factor 1A, Y-linked [Source:HGNC Symbol;Acc:HGNC:3252] ENSG00000235109 zinc finger and SCAN domain containing 31 [Source:HGNC Symbol;Acc:HGNC:14097] logFC AveExpr t P.Value adj.P.Val ENSG00000129824 4.7201875 1.0601605 6.609595 1.924885e-09 3.699244e-05 ENSG00000131002 4.2981889 1.1965928 6.241831 1.062628e-08 5.891736e-05 ENSG00000067048 4.0875311 1.3380538 6.161889 1.532869e-08 5.891736e-05 ENSG00000012817 4.1857966 1.1529671 6.184375 1.383026e-08 5.891736e-05 ENSG00000114374 4.1592220 1.0849112 6.191685 1.337496e-08 5.891736e-05 ENSG00000183878 3.7892623 1.3358802 5.865323 5.866505e-08 1.879042e-04 ENSG00000067646 3.5861337 0.1536729 5.476375 3.260742e-07 8.952134e-04 ENSG00000099725 2.7008122 0.6064057 5.056551 1.946346e-06 4.675611e-03 ENSG00000198692 3.1083149 -0.5579705 4.857377 4.425510e-06 9.449939e-03 ENSG00000235109 -0.4464802 4.7901975 -4.256346 4.706265e-05 7.537083e-02 B ENSG00000129824 9.944145 ENSG00000131002 8.572458 ENSG00000067048 8.335578 ENSG00000012817 8.285957 ENSG00000114374 8.261634 ENSG00000183878 7.209747 ENSG00000067646 5.140808 ENSG00000099725 3.866204 ENSG00000198692 2.705380 ENSG00000235109 1.858661 > is.factor(targets$disease) [1] TRUE > is.factor(targets$gender) [1] TRUE > is.integer(targets$severity) [1] TRUEI also looked into the gender effects - very similar genes appear, with better p-values.
> topTable(fit, coef=3) ensembl_gene_id ensembl_transcript_id entrezgene hgnc_symbol ENSG00000099725 ENSG00000099725 ENST00000528056 5616 PRKY ENSG00000067646 ENSG00000067646 ENST00000383052 7544 ZFY ENSG00000129824 ENSG00000129824 ENST00000250784 6192 RPS4Y1 ENSG00000005889 ENSG00000005889 ENST00000497813 7543 ZFX ENSG00000233864 ENSG00000233864 ENST00000457658 64595 TTTY15 ENSG00000198692 ENSG00000198692 ENST00000361365 9086 EIF1AY ENSG00000114374 ENSG00000114374 ENST00000338981 8287 USP9Y ENSG00000012817 ENSG00000012817 ENST00000469599 8284 KDM5D ENSG00000131002 ENSG00000131002 ENST00000407724 246126 TXLNGY ENSG00000067048 ENSG00000067048 ENST00000360160 8653 DDX3Y description ENSG00000099725 protein kinase, Y-linked, pseudogene [Source:HGNC Symbol;Acc:HGNC:9444] ENSG00000067646 zinc finger protein, Y-linked [Source:HGNC Symbol;Acc:HGNC:12870] ENSG00000129824 ribosomal protein S4, Y-linked 1 [Source:HGNC Symbol;Acc:HGNC:10425] ENSG00000005889 zinc finger protein, X-linked [Source:HGNC Symbol;Acc:HGNC:12869] ENSG00000233864 testis-specific transcript, Y-linked 15 (non-protein coding) [Source:HGNC Symbol;Acc:HGNC:18567] ENSG00000198692 eukaryotic translation initiation factor 1A, Y-linked [Source:HGNC Symbol;Acc:HGNC:3252] ENSG00000114374 ubiquitin specific peptidase 9, Y-linked [Source:HGNC Symbol;Acc:HGNC:12633] ENSG00000012817 lysine demethylase 5D [Source:HGNC Symbol;Acc:HGNC:11115] ENSG00000131002 taxilin gamma pseudogene, Y-linked [Source:HGNC Symbol;Acc:HGNC:18473] ENSG00000067048 DEAD-box helicase 3, Y-linked [Source:HGNC Symbol;Acc:HGNC:2699] logFC AveExpr t P.Value adj.P.Val ENSG00000099725 4.5733716 0.6064057 6.815547 7.283925e-10 4.046800e-06 ENSG00000067646 5.6232992 0.1536729 6.784889 8.422936e-10 4.046800e-06 ENSG00000129824 6.6336546 1.0601605 6.586468 2.145456e-09 8.246275e-06 ENSG00000005889 -0.3415792 6.4751894 -6.056646 2.475845e-08 3.660061e-05 ENSG00000233864 4.6152907 -0.7869401 6.852263 6.118939e-10 4.046800e-06 ENSG00000198692 5.0844618 -0.5579705 6.786910 8.342729e-10 4.046800e-06 ENSG00000114374 5.8648622 1.0849112 6.408775 4.915018e-09 1.349383e-05 ENSG00000012817 5.9082179 1.1529671 6.358619 6.201300e-09 1.390369e-05 ENSG00000131002 6.1177434 1.1965928 6.339047 6.788850e-09 1.390369e-05 ENSG00000067048 5.9136206 1.3380538 6.304632 7.958193e-09 1.390369e-05 B ENSG00000099725 9.635516 ENSG00000067646 9.292060 ENSG00000129824 9.117774 ENSG00000005889 8.866058 ENSG00000233864 8.849122 ENSG00000198692 8.827255 ENSG00000114374 8.511719 ENSG00000012817 8.368047 ENSG00000131002 8.302359 ENSG00000067048 8.232692