Question: Allele-specific expression with different gene lenghts
gravatar for gtechbio
8 months ago by
gtechbio0 wrote:

I want to use DESeq2 for assessing allele-specific expression. I will compare gene-level allelic counts of a yeast hybrid (basically comparing counts of orthologous genes of parentals of this hybrid).
I'm aware of this tutorial

My question is that parentals might have orthologous genes with different lengths, so for example ortholog in parent A is 1000bp and in parent B is 1020 bp. Is it possible at some step of DESeq2 to account for this length difference?


ADD COMMENTlink modified 8 months ago by Michael Love20k • written 8 months ago by gtechbio0
gravatar for Michael Love
8 months ago by
Michael Love20k
United States
Michael Love20k wrote:

If you provide a matrix at the beginning of the analysis called “avgTxLength”, the analysis will control for gene length changes across sample. This should happen before running DESeq().

assays(dds)[["avgTxLength"]] <- lengths
ADD COMMENTlink written 8 months ago by Michael Love20k

Hi Michael,

Thanks for reply.

Just to make sure I do everything in order:

My `count_table` looks like

                        SC1    SC2    SC3    SU1    SU2    SU3
YBR177C_Sbay_4.429      2711   3397   3765   1246   2156   3287
YIL140W_Sbay_9.48         178    111    204    109    271    250
YLR268W_Sbay_10.366    1208   1170   1704    631    825    865

The code is

groups<-factor(x=c(rep("SC",3), rep("SU",3)), levels=c("SC","SU"))

colData <- DataFrame(condition=groups)

dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~condition))
assays(dds)[["avgTxLength"]] <- as.matrix(length_data)   ### correcting for gene length

`length_data` looks like

                       SC1   SC2   SC3   SU1   SU2   SU3
YBR177C_Sbay_4.429     1353  1353  1353  1359  1359  1359
YIL140W_Sbay_9.48      2469  2469  2469  2457  2457  2457
YLR268W_Sbay_10.366     642   642   642   642   642   642
sizeFactors(dds) <- c(as.numeric(rep("1",6)))   ### this is allelic expression, so everything comes from the same library
dds <- DESeq(dds)
res <- results(dds)

Is everything correct?
Cheers and sorry if formatting is a bit messed

EDIT: When I omit sizeFactors(dds) <- c(as.numeric(rep("1",6))), I get the message `using 'avgTxLength' from assays(dds), correcting for library size`, but if I do both preset library size to 1 and feed the code with gene length, I do not receive the message above. Does it mean that with predefined size factors its not possible to account for gene length?

ADD REPLYlink modified 7 months ago • written 7 months ago by gtechbio0

There's two issues here, first is that you should correct for library size unless you are comparing within individual (e.g. alt to total or alt to ref, etc.). Above you don't have the same kind of count table as in my example, where there were two columns for every sample.

The other issue is that you're correct, if you want to specify the size factors as 1, the avgTxLength method isn't the way to go. Instead you should use normalizationFactors instead:

normalizationFactors(dds) <- lengths / exp(rowMeans(log(lengths)))
ADD REPLYlink written 7 months ago by Michael Love20k

Dear Michael,

Regarding normalization, so thanks a lot, I will try running the code like that.

Regarding the count table, so for example SC1 and SU1 are from the same library (as other samples with the same numbers), and I thought my design will compare alt to ref, like you have mentioned, won't it? Is there a conceptual difference for Deseq2 calculations if I also add "sample" column?

Once again thank you for your time!

ADD REPLYlink written 7 months ago by gtechbio0

I'd say take a look at how I went about it in the workflow you posted. Yes, there's a critical difference in adding the sample information and identifying which column is which allele or not.

ADD REPLYlink written 7 months ago by Michael Love20k

Hi Michael,
I am sorry for ignorance, but still I don't understand why my example wouldn't work properly: I specify in design formula that there are two conditions (basically parents), and I control for library size and gene length. So basically it will compare condition 1 vs condition 2, and will show in which condition there is higher or lower expression (which in theory corresponds to ASE). What does this logic miss?
Thanks again for your time 

ADD REPLYlink written 7 months ago by gtechbio0

One of the strengths of the ASE approach is that you compare counts within individual, controlling for many potential biases. In the workflow I created, this is the approach. Your approach doesn’t have individual in it.


ADD REPLYlink written 7 months ago by Michael Love20k

Thanks for elaborating!Now it makes sense for me. And I guess for human data (i.e. patients or any "case-control" study) this "individualized" approach would matter compared to mine. In my case though I analyze yeasts (they are quite homogenous within the culture), so I can in principle assume that between-colony variations will not bias overall analysis.
Anyway, for my analysis I will do as you suggest with design like `~parent+sample`.
Also, just to clarify, with `normalizationFactors` I don't to set sizefactors to 1, correct?

ADD REPLYlink written 7 months ago by gtechbio0

That design is confounded. Again, I’d recommend to take a look at that workflow.

The only difference is that you will provide norm factors before DESeq()

ADD REPLYlink written 7 months ago by Michael Love20k

Dear Michael,

I was wondering if in DESeq2 it is possible to simultaneously set size factors and control for gene length (like you have suggested in your post)?
Thanks in advance

ADD REPLYlink written 7 months ago by gtechbio0
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 159 users visited in the last hour