scOntoMatch is an R package that unifies the ontology annotations of scRNA-seq datasets to make them comparable across studies.

Author & Maintainer: Yuyao Song

Large-scale single-cell atlases often carry curated annotations based on a standard ontology system. This package matches ontology labels between datasets so that their annotations can be compared across studies.


## from source



First download the ontology .obo file from the OBO Foundry. Common ontologies include:

Refer to the vignette for detailed usage.

Get input ready

    adatas = getAdatas(metadata = metadata, sep = "\t")
    ont = ontologyIndex::get_OBO(oboFile, propagate_relationships = c('is_a', 'part_of'))

Trim the ontology tree to remove redundant terms

    adatas_minimal = ontoMultiMinimal(adatas, ont = ont, anno_col = "cell_ontology_type", onto_id_col = "cell_ontology_id")

Match ontology terms across datasets by direct mapping and by mapping descendant terms to ancestor terms.

    adatas_matched = ontoMultiMatch(adatas_minimal, ont = ont, anno_col = "cell_ontology_base")
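
To make the two matching strategies concrete, here is a minimal base-R sketch on a toy ontology. It illustrates the general idea only and is not the implementation behind ontoMultiMatch: the term names, the `parents` lookup, and the helpers `ancestors_of` and `match_labels` are all hypothetical, and the "nearest ancestor" tie-breaking rule is an assumption.

```r
# Toy ontology as a child -> parent lookup (hypothetical terms, not real CL IDs).
parents <- list(
  cell = character(0),
  leukocyte = "cell",
  lymphocyte = "leukocyte",
  "T cell" = "lymphocyte",
  "B cell" = "lymphocyte",
  neuron = "cell"
)

# Collect all ancestors of a term by walking up the parent links breadth-first,
# so closer ancestors come before more distant ones.
ancestors_of <- function(term, parents) {
  out <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!current %in% out) {
      out <- c(out, current)
      queue <- c(queue, parents[[current]])
    }
  }
  out
}

# Map each query label onto the label set used by a reference dataset:
# 1. direct mapping: keep the label if the reference uses the same term;
# 2. descendant-to-ancestor mapping: otherwise use the closest ancestor
#    that the reference uses, or NA if there is none.
match_labels <- function(query, reference, parents) {
  vapply(query, function(term) {
    if (term %in% reference) return(term)
    hits <- intersect(ancestors_of(term, parents), reference)
    if (length(hits) > 0) hits[1] else NA_character_
  }, character(1), USE.NAMES = FALSE)
}

dataset_a <- c("T cell", "B cell", "neuron")   # fine-grained annotation
dataset_b <- c("lymphocyte", "neuron")         # coarser annotation

matched_a <- match_labels(dataset_a, dataset_b, parents)
print(matched_a)
```

In this toy case "neuron" is matched directly, while "T cell" and "B cell" are both lifted to "lymphocyte", the closest ancestor that the second dataset uses, so the two label sets become comparable.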

This package also provides convenient plotting functions to help comprehend the hierarchy of cell types in any single-cell dataset.

Plot an ontology tree per dataset

    plotOntoTree(ont, onts, plot_ancestors=TRUE, ont_query, fontsize = 20)

Plot a matched ontology tree for all datasets

    # use 'animal cells' as root
    plts = plotMatchedOntoTree(ont = ont, adatas = adatas, anno_col = "cell_ontology_mapped", roots = 'CL:0000548', fontsize=25)

The Single Cell Expression Atlas hosted at EBI provides uniformly analysed and annotated scRNA-seq data across multiple species. Datasets with curated ontology labels make good inputs to this package. scRNA-seq data stored as h5ad files can be downloaded from the FTP site. Files with the extension .project.h5ad can be passed to ontoMatch with anno_col = 'authors_cell_type_-_ontology_labels'.

This package imports functions from ontologyIndex, ontologyPlot and anndata.