goseq pwf length bias plot: help interpreting plot
1
0
Entering edit mode
@mrodriguesfernanda-9433
Last seen 7.0 years ago
University of Illinois, Urbana-Champaign

Hi!
I am using goseq on some bovine data and I am using Ensembl IDs for bovine genome UMD3.1, which is not supported by goseq, so I had to input lenght and GO information manually. I tried obtaining length information 2 ways: 1) from featCounts files results; 2) from biomaRt.

Both gave me weirdly shaped plots.
Not sure if I'm doing something wrong. The graphs are in totally opposite direction then I expected them to be ( instead of going up and stabilizing as I expected, they are going down).

Below are the codes of how I obtained length info and plots, as well as the plots

1) Using featureCounts results:

featCountsData <- read.table("charolais_145A_TGACCA.featCounts.txt", header=TRUE, comment.char = "#", sep="\t")

featCountsLengthData <- as.numeric(featCountsData[,"Length"])
names(featCountsLengthData) <- featCountsData$Geneid

LenData <- names(geneLengthData) %in% names(genes)
LenData <- geneLengthData[LenData]

pwf <- nullp(DEgenes = genes, bias.data = LenData, plot.fit = TRUE)

2) Using Biomart:

txsData <- makeTxDbFromBiomart(biomart ="ensembl", dataset = "btaurus_gene_ensembl")
txsByGene <- transcriptsBy(txsData,"gene")
geneLengthData <- median(width(txsByGene))

BiomartLenData <- names(geneLengthData) %in% names(genes)
BiomartLenData <- geneLengthData[testLenData]

pwf <- nullp(DEgenes = genes, bias.data = testLenData, plot.fit = TRUE)

PLOT 1: Length info obtained from featureCounts:

https://drive.google.com/file/d/0B6Ho35U9KAepTGxuR0FxdXc5UUU/view?usp=sharing

PLOT 2: Length info obtained from biomaRt

https://drive.google.com/file/d/0B6Ho35U9KAepTnk2UGdvQ0VxeUU/view?usp=sharing

Any help is appreciated!

 

goseq • 2.3k views
ADD COMMENT
1
Entering edit mode
@nadia-davidson-5739
Last seen 5.0 years ago
Australia

Hi,

I think your gene lengths and what you've done with goseq should be fine. One thing you could check, is if you get the same plots when you order the genes in "LenData" the same as in "genes". Otherwise, to work out why your plot is inverted, I think you'll need to go back to the DE analysis. Can you think of any reason why short genes would be more likely to be DE from the biology of the experiment? Perhaps a class of gene? Some details about the DE analysis might also help us to work out the source of the bias. For example what program did you use? What sort of cut-offs did you apply? How many genes do you get as DE?

 

Cheers,

Nadia.

ADD COMMENT
0
Entering edit mode

Hi Nadia
Thank you so much for your response and e-mail.

Regarding my analysis previously to go seq, I analyzed my RNA-seq data for differential expression using sva (due to the presence of unknown batch effects) and edgeR. I obtained 507 differentially expressed genes at a 0.01 FDR.

And all genes inputed into goseq were genes significant at 0.25 FDR for an anova analysis (cut-off which I previously used for module detection in WGCNA).
I am not sure if that could have anything to do with the analysis.

Any thoughts?

Thank you!!!

ADD REPLY
1
Entering edit mode

Hi,

What happens if you input all genes and not just those from the anova analysis? Do you get a regular looking plot. Your analysis sounds a bit different from the one in the vignette, so it seems possible to me that the biases could be different.

Cheers, Nadia.

ADD REPLY
0
Entering edit mode

Hi Nadia!
I tried including all genes in the analysis (without any previous filtering) and now I get a regular plot, like in the tutorial.

It seems like the problem has been solved. Thank you so much for your help!

ADD REPLY

Login before adding your answer.

Traffic: 418 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6