Question

Tissue Specificity R-package

BioinfGuru ▴ 70

@yagalbi-11519

Last seen 15 months ago

Ireland

Hello everyone,

I have written a pipeline that takes count data and calculates tissue specificity. I am planning to turn it into an R package and submit it for publication. Before I begin packaging and writing drafts, I am doing my literature review and checking whether someone has already created a package with this function. Is anyone aware of another package or program that calculates tissue specificity? I certainly couldn't find one, but maybe someone else has. Any pointers or advice would be appreciated.

I will be posting this on Biostars and Stack Overflow also.

Thanks in advance,

Kenneth

tissue specificity r package package development • 3.0k views

Updated 7.8 years ago by jaro.slamecka • written 7.8 years ago by BioinfGuru

score 1 · Answer 1 · 2018-02-25

jaro.slamecka ▴ 140

@jaroslamecka-7419

Last seen 10 months ago

Mitchell Cancer Institute, Mobile AL, U…

Take a look at CellNet from Dr. Patrick Cahan's lab. It uses gene regulatory networks to classify samples into around 14-16 tissue types. It works with human and mouse bulk and single-cell RNA-seq data, and it also includes tools for training new tissue types. CellNet classifies the samples, scores their similarity to real tissues, and returns a list of transcription factors that researchers might try modulating to make their samples score better. It uses salmon to quantify transcripts first, so it's pretty fast too.

https://github.com/pcahan1/CellNet

https://www.nature.com/articles/nprot.2017.022 (latest RNA-seq protocol)

https://www.sciencedirect.com/science/article/pii/S0092867414009349?via%3Dihub (original microarray version)

That being said, the fact that there's already a tool out there doesn't at all mean yours won't offer something the other does not. It's always good to have more than one. So please go for it, I'll definitely be curious.

Answer by jaro.slamecka, 7.8 years ago

Thank you for that, Jaro. I'll be looking into it.

Reply by BioinfGuru, 7.8 years ago

As you clearly have some experience with CellNet, does it have clear advantages or disadvantages?

Reply by BioinfGuru, 7.8 years ago

I'd say the main advantage is that the authors curated lots of datasets derived by expression profiling of real tissues. So as a biologist, if you're developing a new protocol to engineer cells and tissues (e.g. by differentiation of pluripotent stem cells), it can help you check how well you've done without having to get the tissues yourself, extract RNA and use it as a control in your expression profiling. The training data for each tissue deliberately comes from multiple samples and sources to account for perturbations, something an individual lab would have to invest considerable resources to match. Another advantage is its ability to calculate candidate genes for the biologist to target in order to bring the engineered cells or tissues closer to normal tissues. The authors demonstrated that in another publication.

One disadvantage used to be the lack of support for single-cell RNA-seq data, but the newest version has added that (I haven't had a chance to test it). The only remaining disadvantage I can think of is that it only works with single-end data, so if you have paired-end data, only the reads from one mate of each pair are kept. It also trims the reads down to 40 bases before running salmon. So if you have 100PE data (2 x 100 bases per fragment), you could argue that only 40 of 200 bases, i.e. 20% of that information, is included in the profiling. The authors say that this is for consistency across a greater number of RNA-seq datasets.

I don't know how much of the training data is adapted from the original microarray version, so that's maybe another thing to consider. Another bioinformatic assay, PluriTest, which scores pluripotency, was also originally built around microarray data. If I'm not mistaken, they adapted it to RNA-seq by intersecting the microarray probes with the corresponding RNA-seq reads, which would mean that part of the RNA-seq data is discarded before running the test. So if your approach can make full use of paired-end RNA-seq data, it could have an edge. Also, starting from the count matrix as you propose would make it easier to quickly analyze data from other labs, provided there are no major differences between the various pipelines used to get the counts.

Either way, it would be great to directly compare your approach and CellNet. Hope this helps!

Reply by jaro.slamecka, 7.8 years ago

Oh wow, thank you Jaro. I'll go through this tomorrow, but here are some thoughts on what I'm considering doing:

1) The basic function of the package would be to take count data and simply calculate specificity from that. For anyone who can load a package in R, it's a fast algorithm that takes about 30 sec once you have the normalised counts from each tissue. Bear in mind also that the algorithm works on counts, so it has already been applied (by us here) to counts from other types of data, i.e. ChIP-seq histone marks.

2) However, lots of labs will only have their own tissue data, will not know where to obtain more, and will not know how to use R. For this reason, I'm also considering including count data from >20 tissues I have processed myself.

3) This raises the issue of a lab processing data with their own pipeline and then comparing it with data from my pipeline. I'm considering packaging my pipeline into a Docker container so that, if they have access to a bioinformatician, the new data can be added with confidence.

Reply by BioinfGuru, 7.8 years ago
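
For readers wondering what "calculating specificity from normalised counts" can look like in practice, here is a minimal sketch. The thread does not say which measure the pipeline uses, so this assumes the Tau index, one commonly used score (0 = evenly expressed across tissues, 1 = expressed in a single tissue). It is written in Python purely for illustration; the function name, gene names, tissue names and count values are all hypothetical.

```python
def tau_index(expression):
    """Tau specificity score for one gene.

    `expression` holds one normalised, non-negative value per tissue.
    Tau = sum(1 - x_i / max(x)) / (n - 1), where n is the number of tissues.
    """
    if len(expression) < 2:
        raise ValueError("need values from at least two tissues")
    if any(value < 0 for value in expression):
        raise ValueError("expression values must be non-negative")
    peak = max(expression)
    if peak == 0:
        # Gene not detected anywhere: scored as non-specific here.
        # This is a convention chosen for the sketch, not part of the thread.
        return 0.0
    return sum(1 - value / peak for value in expression) / (len(expression) - 1)


# Hypothetical normalised counts, one column per tissue.
tissues = ["liver", "heart", "brain", "kidney"]
counts = {
    "GENE_A": [100.0, 100.0, 100.0, 100.0],  # same level everywhere
    "GENE_B": [0.0, 0.0, 80.0, 0.0],         # detected in one tissue only
    "GENE_C": [10.0, 20.0, 40.0, 10.0],      # intermediate pattern
}

scores = {gene: tau_index(values) for gene, values in counts.items()}

for gene, score in scores.items():
    top_tissue = tissues[counts[gene].index(max(counts[gene]))]
    print(f"{gene}: tau={score:.3f}, highest in {top_tissue}")
```

Because the score only needs a gene-by-tissue matrix of non-negative values, the same arithmetic applies to counts from other assays, which fits the point made in 1) about ChIP-seq data. Results will still depend on how the counts were normalised upstream.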