Further clarification of unbalanced group sizes DESeq2
hmillike • 0
@2b9bf55e
Last seen 6 weeks ago
United States

I have samples for two groups: males with disease (n=114) and females with disease (n=65), with male or female set as the condition. I wanted to assess differential gene expression. My results show that a male-specific gene (SRY) has a negative log2 fold change when females are set as the reference, meaning (if I am understanding correctly) that SRY was downregulated in males compared to females. The SRY gene should only be found in males. I checked my gene count matrix: all the females have zero counts for SRY, but so do many of the males (possibly indicative of the disease state). I have read the vignette and many posts here and understand that a difference in sample size affects the power of the analysis. However, could the difference in group size alone produce so large a discrepancy? The script I used is below. I am relatively new to R and have even less experience with DESeq2. I appreciate any thoughts!


```r
library(DESeq2)

# Gene-level count matrix: genes in rows, samples in columns
gene_counts <- read.csv("gene_count_matrix.csv", header = TRUE, row.names = 1)

# Sample metadata: samples in rows, including the 'condition' column
study_countsLabels <- read.csv("col_data.csv", header = TRUE, row.names = 1)

# Check that the count columns and metadata rows name the same samples, in the same order
all(colnames(gene_counts) %in% rownames(study_countsLabels))
all(colnames(gene_counts) == rownames(study_countsLabels))

dds_study_gene <- DESeqDataSetFromMatrix(countData = gene_counts, colData = study_countsLabels, design = ~condition)

# Set "female" as the reference level of condition
dds_study_gene$condition <- relevel(dds_study_gene$condition, ref = "female")

dds_gene <- DESeq(dds_study_gene)

results_DESeq2_female_vs_male_gene <- results(dds_gene, contrast = c("condition", "female", "male"))

# Add the gene IDs as a column and move that column to the front
results_DESeq2_female_vs_male_gene$gene_id <- rownames(results_DESeq2_female_vs_male_gene)
results_DESeq2_female_vs_male_gene <- results_DESeq2_female_vs_male_gene[, c("gene_id", colnames(results_DESeq2_female_vs_male_gene)[-ncol(results_DESeq2_female_vs_male_gene)])]
```

DESeq2 • 319 views
ATpoint ★ 4.0k
@atpoint-13662
Last seen 2 days ago
Germany

You are using the contrast c("condition", "female", "male"), which reads as 'female vs male', so a negative fold change means expression is higher in males. It's all fine.
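
To make the sign convention concrete, here is a minimal base R sketch with made-up counts for a single male-specific gene. The count values and the pseudocount of 1 are assumptions for illustration only. DESeq2 estimates fold changes from a negative binomial GLM on normalized counts, not from this simple ratio, but the direction of the sign works the same way.

```r
# Hypothetical counts for one male-specific gene (not taken from the thread)
female <- c(0, 0, 0, 0)
male   <- c(0, 12, 30, 18)

# Pseudocount to avoid log2(0); an assumption of this sketch, not what DESeq2 does
pseudo <- 1

# "female vs male": female in the numerator, male in the denominator
lfc_female_vs_male <- log2((mean(female) + pseudo) / (mean(male) + pseudo))

# "male vs female": the same comparison with the roles swapped
lfc_male_vs_female <- log2((mean(male) + pseudo) / (mean(female) + pseudo))

lfc_female_vs_male  # -4: lower in females, i.e. higher in males
lfc_male_vs_female  #  4: same magnitude, opposite sign
```

A negative value for 'female vs male' and a positive value for 'male vs female' describe the same biology: the gene is more highly expressed in males.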


Thank you for your response! Does the fact that I set "female" as the reference in the script change how the contrast should be read? I thought that a negative fold change would mean that "SRY gene is downregulated in males relative to females", but you mean that it's actually "SRY gene is downregulated in females relative to males"?


A contrast is an explicit definition of the comparison. The reference level has no influence here.
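
One way to check this yourself is the rough sketch below. It uses the simulated data set from DESeq2's `makeExampleDESeqDataSet()`, whose `condition` factor has levels "A" and "B", fits the model once with each reference level, and requests the same explicit contrast from both fits. The seed, the data set size, and the count filter are arbitrary choices for this illustration. The filter only keeps well-expressed genes so that both fits are numerically well behaved.

```r
library(DESeq2)

set.seed(1)
# Simulated counts; 'condition' has levels "A" and "B"
dds <- makeExampleDESeqDataSet(n = 1000, m = 8)

# Keep genes with at least 10 counts in every sample (arbitrary filter for this sketch)
dds <- dds[rowSums(counts(dds) >= 10) == ncol(dds), ]

# Fit with "A" as the reference level
dds_refA <- dds
dds_refA$condition <- relevel(dds_refA$condition, ref = "A")
dds_refA <- DESeq(dds_refA, quiet = TRUE)

# Fit with "B" as the reference level
dds_refB <- dds
dds_refB$condition <- relevel(dds_refB$condition, ref = "B")
dds_refB <- DESeq(dds_refB, quiet = TRUE)

# Same explicit contrast for both fits: "B" is the numerator, "A" the denominator
res_refA <- results(dds_refA, contrast = c("condition", "B", "A"))
res_refB <- results(dds_refB, contrast = c("condition", "B", "A"))

# Compare the fold changes; tiny numerical differences between fits are possible
same_lfc <- all.equal(res_refA$log2FoldChange, res_refB$log2FoldChange, tolerance = 1e-3)
same_lfc
```

If the contrast is what defines the comparison, the two sets of log2 fold changes should agree apart from small numerical differences, whichever level was set as the reference.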


Thank you-much appreciated!
