Ballgown stattest: Why do p-values change when changing filtration?
@katrinegraversen-24419

Hello,

I am not very experienced in data analysis. I am analysing data from a small RNA-seq experiment (two conditions, five samples each), and I found great help in following the guidelines by Pertea M et al. (Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 2016 Sep;11(9):1650-67.)

I have reached the final step in the analysis and am puzzled by the output I get from Ballgown's stattest, since my p-values change when I change my filtering settings. I am completely aware that q-values should change, but I thought that p-values should be unaffected. Can anyone explain this or find the mistake in my procedure?

Thank you very much in advance! Kind regards Katrine

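#Load packages (rowVars() used below comes from genefilter, as in the Pertea et al. protocol)
library(ballgown)
library(genefilter)

#Read in samples overview
pheno_data = read.csv("samples_overview.csv")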

#Read in the expression data from StringTie
bg_data = ballgown(dataDir = "ballgown", samplePattern = "sample_", pData=pheno_data)

Different filtering options - only one applied at a time :)

#1 Remove all transcripts with a variance across samples less than one.
bg_data_filt = subset(bg_data,"rowVars(texpr(bg_data)) >1",genomesubset=TRUE)

#2 Remove all transcripts whose expression summed across all samples is below 10 (note: texpr() returns FPKM by default, not read counts)
bg_data_filt = subset(bg_data,"rowSums(texpr(bg_data)) >= 10",genomesubset=TRUE)

#3 Do not filter
bg_data_filt = bg_data

Each filtering option followed by

#Identify transcripts that show statistically significant differences between groups
group_transcripts = stattest(bg_data_filt, feature="transcript", covariate="group", meas="FPKM", getFC = TRUE)

And then let's just look at Il10 as an example:

group_transcripts[5591,"pval"]

Depending on the filtering option, this command returns: option 1: 0.9136172; option 2: 0.874399; option 3: 0.1992573.

@lcolladotor

Hi @katrinegraversen,

My understanding from looking at the source code is that you are getting different p-values because libadjust = NULL is the default in stattest(), and in that case a library-size adjustment based on each sample's 75th percentile of expression is applied automatically. Since the set of transcripts passed in is different in each of your 3 cases, the 75th percentile is also different, so the adjustment changes, the values that go into the model change, and you get different p-values.

Try running stattest() with libadjust = FALSE to test this.
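For anyone curious why filtering would move that adjustment, here is a toy sketch in base R (no ballgown needed). It assumes the default adjustment is, per sample, the sum of the log2(x + 1) expression values that fall below that sample's 75th percentile, which is how I read the stattest documentation. The simulated matrix, the lib_adjust() helper and the threshold are made up for illustration; this is not ballgown's own code.

```r
# Hypothetical toy data: 1000 transcripts x 10 samples of FPKM-like values,
# with a different mean expression level per transcript.
set.seed(1)
n_tx <- 1000
n_samples <- 10
mean_expr <- rexp(n_tx, rate = 0.2)
fpkm <- matrix(rexp(n_tx * n_samples, rate = 1 / mean_expr), nrow = n_tx)

# Assumed form of the default adjustment: for each sample (column), sum the
# log2(x + 1) values that lie below that sample's 75th percentile.
lib_adjust <- function(expr) {
  apply(log2(expr + 1), 2, function(x) sum(x[x < quantile(x, 0.75)]))
}

# Adjustment computed on all transcripts
adj_all <- lib_adjust(fpkm)

# Adjustment computed after a filter in the spirit of option #2
keep <- rowSums(fpkm) >= 10
adj_filt <- lib_adjust(fpkm[keep, , drop = FALSE])

# Same samples, different per-sample adjustment values
print(rbind(unfiltered = adj_all, filtered = adj_filt))
```

If that per-sample value enters the model for every transcript, then changing it changes the fit, and so the p-value, even for transcripts that survive every filter.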


Best, Leo

PS This was an example for my team on learning how to help others.
