goseq pwf length bias plot: help interpreting plot
Entering edit mode
Last seen 5.3 years ago
University of Illinois, Urbana-Champaign

I am using goseq on some bovine data and I am using Ensembl IDs for bovine genome UMD3.1, which is not supported by goseq, so I had to input lenght and GO information manually. I tried obtaining length information 2 ways: 1) from featCounts files results; 2) from biomaRt.

Both gave me weirdly shaped plots.
Not sure if I'm doing something wrong. The graphs are in totally opposite direction then I expected them to be ( instead of going up and stabilizing as I expected, they are going down).

Below are the codes of how I obtained length info and plots, as well as the plots

1) Using featureCounts results:

featCountsData <- read.table("charolais_145A_TGACCA.featCounts.txt", header=TRUE, comment.char = "#", sep="\t")

featCountsLengthData <- as.numeric(featCountsData[,"Length"])
names(featCountsLengthData) <- featCountsData$Geneid

LenData <- names(geneLengthData) %in% names(genes)
LenData <- geneLengthData[LenData]

pwf <- nullp(DEgenes = genes, bias.data = LenData, plot.fit = TRUE)

2) Using Biomart:

txsData <- makeTxDbFromBiomart(biomart ="ensembl", dataset = "btaurus_gene_ensembl")
txsByGene <- transcriptsBy(txsData,"gene")
geneLengthData <- median(width(txsByGene))

BiomartLenData <- names(geneLengthData) %in% names(genes)
BiomartLenData <- geneLengthData[testLenData]

pwf <- nullp(DEgenes = genes, bias.data = testLenData, plot.fit = TRUE)

PLOT 1: Length info obtained from featureCounts:


PLOT 2: Length info obtained from biomaRt


Any help is appreciated!


goseq • 1.7k views
Entering edit mode
Last seen 3.3 years ago


I think your gene lengths and what you've done with goseq should be fine. One thing you could check, is if you get the same plots when you order the genes in "LenData" the same as in "genes". Otherwise, to work out why your plot is inverted, I think you'll need to go back to the DE analysis. Can you think of any reason why short genes would be more likely to be DE from the biology of the experiment? Perhaps a class of gene? Some details about the DE analysis might also help us to work out the source of the bias. For example what program did you use? What sort of cut-offs did you apply? How many genes do you get as DE?




Entering edit mode

Hi Nadia
Thank you so much for your response and e-mail.

Regarding my analysis previously to go seq, I analyzed my RNA-seq data for differential expression using sva (due to the presence of unknown batch effects) and edgeR. I obtained 507 differentially expressed genes at a 0.01 FDR.

And all genes inputed into goseq were genes significant at 0.25 FDR for an anova analysis (cut-off which I previously used for module detection in WGCNA).
I am not sure if that could have anything to do with the analysis.

Any thoughts?

Thank you!!!

Entering edit mode


What happens if you input all genes and not just those from the anova analysis? Do you get a regular looking plot. Your analysis sounds a bit different from the one in the vignette, so it seems possible to me that the biases could be different.

Cheers, Nadia.

Entering edit mode

Hi Nadia!
I tried including all genes in the analysis (without any previous filtering) and now I get a regular plot, like in the tutorial.

It seems like the problem has been solved. Thank you so much for your help!


Login before adding your answer.

Traffic: 449 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6