Question

GOseq Probability Weight Function bad fit

0

Entering edit mode

francesca • 0

@6d92f3cd

Last seen 13 days ago

Germany

I am trying to perform a GO analysis of differentially expressed genes with GOseq, but I get a very poor fit of pwf by the nullp function.

A bit of context:

I used bowtie to map the reads onto the transcriptome. Then salmon in alignment mode, tximport to import the results into R, and DESeq2 for the DGE analysis (FDR p< 0.001)
I use all the genes obtained from salmon as assayed genes (which are actually all the genes in my transcriptome since I used the same data to assemble it)
since my organism is not supported by GOseq, I created a vector with gene lengths from tximport count matrix with lengthData <- rowMeans(count.matrix$length) which gives the mean across sample of transcript length in nucleotides

pwf <- nullp(DEgenes = go.seq.v,bias.data = lengthData) gives a weird fit pwf

I tried to round the length values to the integer

lengthData.round <- as.integer(round(lengthData,-0))

pwf.r <- nullp(DEgenes = go.seq.v,bias.data = lengthData.round)

the fit is better, but the function shape is not what I expected at all.

pwf_round

Hence my questions:

Is it correct to use float or integer for the length values? Why does it change the plot so much? Should I change something here
Could my DE genes not suffer from length bias? Maybe because DESeq2 is already accounting for it.
What shall I change to get a better fit? Can I change something in the model? Or should I work on the data?
Given that for whatever reason the pwf is not fitting the data well, would it make more sense not to account for gene length at all? Can I do this in GOseq or should I use another software like ClusterProfiler?

Thanks in advance for any input on the matter

goseq • 88 views

ADD COMMENT • link written 20 days ago by francesca • 0