Question

Allele-specific expression with different gene lenghts

0

Entering edit mode

gtechbio ▴ 10

@gtechbio-13996

Last seen 10 months ago

Spain

Hi,
I want to use DESeq2 for assessing allele-specific expression. I will compare gene-level allelic counts of a yeast hybrid (basically comparing counts of orthologous genes of parentals of this hybrid).
I'm aware of this tutorial http://rstudio-pubs-static.s3.amazonaws.com/275642_e9d578fe1f7a404aad0553f52236c0a4.html.

My question is that parentals might have orthologous genes with different lengths, so for example ortholog in parent A is 1000bp and in parent B is 1020 bp. Is it possible at some step of DESeq2 to account for this length difference?

Thanks

deseq2 allele-specific-expression rnaseq • 1.2k views

ADD COMMENT • link updated 6.0 years ago by Michael Love 41k • written 6.0 years ago by gtechbio ▴ 10

score 1 · Answer 1 · 2018-04-17

1

Entering edit mode

Michael Love 41k

@mikelove

Last seen 2 hours ago

United States

If you provide a matrix at the beginning of the analysis called “avgTxLength”, the analysis will control for gene length changes across sample. This should happen before running DESeq().

assays(dds)[["avgTxLength"]] <- lengths

ADD COMMENT • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,

Thanks for reply.

Just to make sure I do everything in order:

My `count_table` looks like

                        SC1    SC2    SC3    SU1    SU2    SU3
YBR177C_Sbay_4.429      2711   3397   3765   1246   2156   3287
YIL140W_Sbay_9.48         178    111    204    109    271    250
YLR268W_Sbay_10.366    1208   1170   1704    631    825    865

The code is

groups<-factor(x=c(rep("SC",3), rep("SU",3)), levels=c("SC","SU"))

colData <- DataFrame(condition=groups)

dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~condition))
assays(dds)[["avgTxLength"]] <- as.matrix(length_data)   ### correcting for gene length

`length_data` looks like

                       SC1   SC2   SC3   SU1   SU2   SU3
YBR177C_Sbay_4.429     1353  1353  1353  1359  1359  1359
YIL140W_Sbay_9.48      2469  2469  2469  2457  2457  2457
YLR268W_Sbay_10.366     642   642   642   642   642   642

sizeFactors(dds) <- c(as.numeric(rep("1",6)))   ### this is allelic expression, so everything comes from the same library
dds <- DESeq(dds)
res <- results(dds)

Is everything correct?
Cheers and sorry if formatting is a bit messed

EDIT: When I omit sizeFactors(dds) <- c(as.numeric(rep("1",6))), I get the message `using 'avgTxLength' from assays(dds), correcting for library size`, but if I do both preset library size to 1 and feed the code with gene length, I do not receive the message above. Does it mean that with predefined size factors its not possible to account for gene length?

ADD REPLY • link 6.0 years ago gtechbio ▴ 10

0

Entering edit mode

There's two issues here, first is that you should correct for library size unless you are comparing within individual (e.g. alt to total or alt to ref, etc.). Above you don't have the same kind of count table as in my example, where there were two columns for every sample.

The other issue is that you're correct, if you want to specify the size factors as 1, the avgTxLength method isn't the way to go. Instead you should use normalizationFactors instead:

normalizationFactors(dds) <- lengths / exp(rowMeans(log(lengths)))

ADD REPLY • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Dear Michael,

Regarding normalization, so thanks a lot, I will try running the code like that.

Regarding the count table, so for example SC1 and SU1 are from the same library (as other samples with the same numbers), and I thought my design will compare alt to ref, like you have mentioned, won't it? Is there a conceptual difference for Deseq2 calculations if I also add "sample" column?

Once again thank you for your time!

ADD REPLY • link 6.0 years ago gtechbio ▴ 10

0

Entering edit mode

I'd say take a look at how I went about it in the workflow you posted. Yes, there's a critical difference in adding the sample information and identifying which column is which allele or not.

ADD REPLY • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Hi Michael,
I am sorry for ignorance, but still I don't understand why my example wouldn't work properly: I specify in design formula that there are two conditions (basically parents), and I control for library size and gene length. So basically it will compare condition 1 vs condition 2, and will show in which condition there is higher or lower expression (which in theory corresponds to ASE). What does this logic miss?
Thanks again for your time

ADD REPLY • link 6.0 years ago gtechbio ▴ 10

0

Entering edit mode

One of the strengths of the ASE approach is that you compare counts within individual, controlling for many potential biases. In the workflow I created, this is the approach. Your approach doesn’t have individual in it.

ADD REPLY • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Thanks for elaborating!Now it makes sense for me. And I guess for human data (i.e. patients or any "case-control" study) this "individualized" approach would matter compared to mine. In my case though I analyze yeasts (they are quite homogenous within the culture), so I can in principle assume that between-colony variations will not bias overall analysis.
Anyway, for my analysis I will do as you suggest with design like `~parent+sample`.
Also, just to clarify, with `normalizationFactors` I don't to set sizefactors to 1, correct?

ADD REPLY • link 6.0 years ago gtechbio ▴ 10

0

Entering edit mode

That design is confounded. Again, I’d recommend to take a look at that workflow.

The only difference is that you will provide norm factors before DESeq()

ADD REPLY • link 6.0 years ago Michael Love 41k

0

Entering edit mode

Dear Michael,

I was wondering if in DESeq2 it is possible to simultaneously set size factors and control for gene length (like you have suggested in your post)?
Thanks in advance

ADD REPLY • link 6.0 years ago gtechbio ▴ 10