Thanks Dr Leek, Salzberg and their collegues and for sharing the fascinating HISAT-StringTie-Ballgown pipeline, presented in nature protocol and NatureBiotech.
Now we are reprocessing the RNA-seq data using the pipeline. However, we have encounter one bottleneck in using Ballgown, maybe because of the insufficient replicates of SRA data. The phenotype data as following, but we obtained 0 differentially expressed genes and <10 differentially expressed transcripts. Furthermore, we can't figure out what do the differentially expressed transcripts mean? That is to say, do they present the difference between the control and treat?
ids time treat
Cd0h-1 0h control0
Cd0h-2 0h control0
Cd1h-1 1h treat1
Cd1h-2 1h treat1
CdCK-1 1h control1
CdCK-2 1h control1
Cdck24 24h control24 (without any replicate)
CdCd24 24h treat24 (without any replicate)
In fact, we want to obtain the differentially expressed transcripts between treat VS. control both at 1 hour, and the DE transcripts between treat VS. control both at 24 hour (without any replicate).
All right, we did some test by analyzing treat VS. control (4 samples "Cd1h-1,Cd1h-2,CdCK-1,CdCK-2"), both at 1 hour with 2 replicates. However, we obtained 0 differentially expressed genes or transcripts.
After change the code, R tell us as following:
Wrapping up the results
> bg_filt = subset(bg,"rowVars(texpr(bg)) >1",genomesubset=TRUE)
> results_transcripts = stattest(bg_filt,feature="transcript",group=rep(c(1,0), each=2), getFC=TRUE, meas="FPKM")
Error in stattest(bg_filt, feature = "transcript", group = rep(c(1, 0), :
unused argument (group = rep(c(1, 0), each = 2))
Then we try to tell R the replicate information, also returned error.
> bg = ballgown(dataDir="ballgown", samples=rep(c(1,0), each=2), meas='all')
Error in strsplit(x, split = "/") : non-character argument
Another try also get the error warning,
bg = ballgown(dataDir="ballgown", samples="Cd1h-1,Cd1h-2,CdCK-1,CdCK-2", meas='all')
Sun Oct 9 20:43:21 2016
Error in ballgown(dataDir = "ballgown", samples = "Cd1h-1,Cd1h-2,CdCK-1,CdCK-2", :
Something is wrong: are you missing .ctab files? Do extra files/folders (other than tablemaker output folders) match your samples/dataDir/samplePattern argument(s)?
Thus we are confused about how to process RNA-seq data without sufficient replicates, such as Cd1h VS. CK1h (2 replicates each), Cd24 VS.ck24 (one sample each).
Thanks for your concern.
Is it possible to extract the TPM data from StringTie gtf files with new version of ballgown?
Hi, I also confused about 'how to hook out the data of TPM'? Have you found a solution for this? Thanks for the share.