Technical Documentation
Phenotype Library Inclusion Criteria
  • Define a disease (e.g. hypertension), lifestyle risk factor (e.g. smoking) or biomarker (e.g. blood pressure)
  • Derive information from one or more electronic health record data sources. This can include national and local sources. The definition of EHR includes administrative data such as billing/claims data, and clinical audits.
  • Have one or more peer-reviewed outputs associated with it, e.g. a journal publication, scientific conference paper or policy white paper.
  • Provide evidence of how the phenotyping algorithm was validated.

Phenotyping algorithms are stored in the Phenotype Library using a combination of YAML and CSV files. There are two main components to each algorithm:

  • a) the phenotype definition file (which is defined in a YAML file) and,
  • b) one or more terminology files (also known as codelists) which can be stored inline within the YAML file or in linked CSV files.

The section below provides information on their schema and contents.

Figure: An Electronic Health Records phenotyping algorithm, made up of a phenotype definition file (metadata content) and one or more codelist files.

File Naming

All phenotype definition files associated with a phenotype use a common naming pattern:


for example: axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj.yaml

Codelist files follow a similar pattern:


for example: axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj_ICD10.csv
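
The two example file names can be taken apart programmatically. The sketch below is illustrative only: the regular expression and the field names (author, name, uuid, terminology) are assumptions inferred from those two examples, not the library's official naming rule, and the 22-character identifier length is simply what both examples show.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern, inferred only from the two example file names.
FILENAME_RE = re.compile(
    r"^(?P<author>[^_]+)_(?P<name>.+)_(?P<uuid>[A-Za-z0-9]{22})"
    r"(?:_(?P<terminology>[^_.]+))?\.(?P<ext>yaml|csv)$"
)


class PhenotypeFilename(NamedTuple):
    author: str
    name: str
    uuid: str
    terminology: Optional[str]  # only present in the codelist example
    extension: str


def parse_phenotype_filename(filename: str) -> Optional[PhenotypeFilename]:
    """Split a file name into its assumed parts, or return None if it does not fit."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    return PhenotypeFilename(
        match["author"],
        match["name"],
        match["uuid"],
        match["terminology"],
        match["ext"],
    )


definition = parse_phenotype_filename(
    "axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj.yaml"
)
codelist = parse_phenotype_filename(
    "axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj_ICD10.csv"
)
```

Under these assumptions, the definition file and its codelist share the same identifier, which is what ties the files of one phenotype together.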

Phenotype definition file

The phenotype definition file is a YAML header file. The YAML file is used to record metadata fields capturing information about the algorithm, the data sources, controlled clinical terminologies and other information.

phenotype_uuid: 5bqxGMaqvZFBhtEYb5mhZJ
title: Anxiety
publish: FALSE
author:
- Matthew J Carr
- Sarah Steeg
- Roger T Webb
- Nav Kapur
- Carolyn A Chew-Graham
- Kathrym M Abel
- Holly Hope
- Matthias Pierce
- Darren M. Ashcroft
type: Disease or Syndrome
valid_event_data_range: 01/01/2020 - 31/10/2020
sex:
- Female
- Male
- 20
phenoflowid: 4
description: |-
  Using electronic health records from 1714 UK general practices registered with the Clinical Practice Research Datalink we examined incidence and event rates of depression and anxiety disorders, self-harm, prescriptions for antidepressants and benzodiazepines and GP referrals to mental health services per 100,000 person-months, before, during and after the peak of the Covid-19 emergency. Analyses were stratified by gender, age group and practice-level Index of Multiple Deprivation quintile.
- Matthew J Carr, Sarah Steeg, Roger T Webb, Nav Kapur, Carolyn A Chew-Graham, Kathrym M Abel, Holly Hope, Matthias Pierce, Darren M. Ashcroft, Primary care contact for mental illness and self-harm before during and after the peak of the Covid-19 pandemic in the UK: cohort study of 13 million individual". 2020.
- 5
- 6
concepts:
- Anxiety - Primary Care:
   - type: csv
   - coding_system: 5
   - filepath: ./path/to/file.csv

The required Phenotype metadata fields are:

  • title (string): Phenotype (long) name
  • publish (bool): Whether to publish the phenotype and concepts immediately after uploading (publishing allows anyone to view your work)
  • type (string): Type of phenotype (a list of valid Phenotype types is available here)
  • author (list of strings): list of phenotype authors
  • sex (list of strings): list of sexes valid for the phenotype
  • concepts (list of concept objects): A list of concept objects
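
The required fields listed above can be checked before a file is uploaded. The following is a minimal sketch, assuming the YAML file has already been loaded into a plain Python dictionary (for example with a YAML parser). The function name `check_required_fields` and the error messages are hypothetical and are not part of the Phenotype Library's own tooling.

```python
from typing import Any, Dict, List

# Required fields and the Python types they are expected to load as.
REQUIRED_FIELDS: Dict[str, type] = {
    "title": str,
    "publish": bool,
    "type": str,
    "author": list,
    "sex": list,
    "concepts": list,
}


def check_required_fields(phenotype: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means every required field is present."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in phenotype:
            problems.append(f"missing required field: {field}")
        elif not isinstance(phenotype[field], expected):
            actual = type(phenotype[field]).__name__
            problems.append(f"{field} should be {expected.__name__}, got {actual}")
    return problems


# Example metadata that omits the "concepts" field.
example = {
    "title": "Anxiety",
    "publish": False,
    "type": "Disease or Syndrome",
    "author": ["Matthew J Carr", "Sarah Steeg"],
    "sex": ["Female", "Male"],
}
problems = check_required_fields(example)
```

This only checks presence and top-level type. Whether a value such as the phenotype type is one of the accepted values would need the lists linked from the field descriptions.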

Additional Phenotype metadata fields are available:

  • phenotype_uuid (string): Universally unique phenotype identifier, generated using the shortuuid Python module.
  • valid_event_data_range (list of strings): DD/MM/YYYY date range for events
  • description (string): Markdown text field providing a description of the Phenotype
  • implementation (string): Markdown text field providing details on the implementation of the phenotype
  • primary_publication (string): Citation of the primary publication
  • primary_publication_link (string): Link to the primary publication
  • primary_publication_doi (string): DOI linking to the primary publication
  • publications (list of strings): list of publications
  • tags (list of ints): List of tag ids that are associated with this Phenotype (a list of valid tag ids is available here)
  • collections (list of ints): List of collection ids that are associated with this Phenotype (a list of valid collection ids is available here)
  • data_sources (list of ints): List of data source ids that the Phenotype sources information from (a list of valid data source ids is available here). The data sources available in the Phenotype Library have been sourced from HDR Gateway.
  • validation (list of strings): types of validation that support the robustness of the phenotype. Valid values:
    • prognostic: the algorithm replicates known prognostic associations
    • aetiologic: the algorithm replicates known associations with risk factors
    • genetic: the algorithm replicates associations with known regions or variants
    • cross-source: the algorithm has been evaluated in a similar external data source
    • casenote review: the algorithm has been validated through manual review of clinical notes (this usually yields PPV and NPV values)
    • cross-country: the algorithm has been evaluated in a similar external healthcare system
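
Two of the optional fields lend themselves to a quick check. The sketch below makes two assumptions: that valid_event_data_range is written as a single "DD/MM/YYYY - DD/MM/YYYY" string, as in the Anxiety example, and that validation values are spelled exactly as in the list above. Both helper names are hypothetical.

```python
from datetime import date, datetime
from typing import List, Tuple

# Spellings copied from the list of valid values; assumed to be exact.
VALIDATION_VALUES = {
    "prognostic",
    "aetiologic",
    "genetic",
    "cross-source",
    "casenote review",
    "cross-country",
}


def parse_event_date_range(value: str) -> Tuple[date, date]:
    """Parse 'DD/MM/YYYY - DD/MM/YYYY' into (start, end); raise ValueError if malformed."""
    start_text, end_text = (part.strip() for part in value.split(" - "))
    start = datetime.strptime(start_text, "%d/%m/%Y").date()
    end = datetime.strptime(end_text, "%d/%m/%Y").date()
    if start > end:
        raise ValueError(f"range starts after it ends: {value}")
    return start, end


def unknown_validation_values(values: List[str]) -> List[str]:
    """Return the entries that are not recognised validation types."""
    return [v for v in values if v.strip().lower() not in VALIDATION_VALUES]


start, end = parse_event_date_range("01/01/2020 - 31/10/2020")
unknown = unknown_validation_values(["prognostic", "external"])
```

A malformed range (a missing separator, or a date that is not in day/month/year order) raises ValueError rather than being silently accepted.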
Defining a Concept

With a CSV file:

  • - [concept name] (string): The name of the concept
    •   - type: csv
    •   - coding_system (string): The coding system contained in the concept (a list of valid coding systems is available here)
    •   - filepath (string): Location of the CSV file, e.g. C:/myphenotype/myconcept.csv

By entering the codes as a list:

  • - [concept name] (string): The name of the concept
    •   - type: inline
    •   - coding_system (string): The coding system contained in the concept (a list of valid coding systems is available here)
    •   - codes (list of strings): A list of codes that this concept should contain

By referencing a concept that has been published on the Phenotype Library:

  • - [concept name] (string): The name of the concept
    •   - type: existing_concept
    •   - concept_id (string): The concept ID as displayed on the Phenotype Library, e.g. C123
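
When a concept written in the nested list form above is loaded by a YAML parser, it becomes a one-key mapping from the concept name to a list of one-key mappings. The sketch below shows one way to flatten that structure and check that each concept type carries the keys described above. The helper names and messages are hypothetical, and the check does not verify that a coding system or concept ID actually exists in the library.

```python
from typing import Any, Dict, List

# Keys each concept type needs in addition to "type", per the three forms above.
REQUIRED_KEYS = {
    "csv": {"coding_system", "filepath"},
    "inline": {"coding_system", "codes"},
    "existing_concept": {"concept_id"},
}


def flatten_concept(entry: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Merge the one-key mappings listed under a concept name into a single dict."""
    ((name, attributes),) = entry.items()
    concept: Dict[str, Any] = {"name": name}
    for attribute in attributes:
        concept.update(attribute)
    return concept


def check_concept(entry: Dict[str, List[Dict[str, Any]]]) -> List[str]:
    """Return a list of problems for one concept entry; empty means it looks complete."""
    concept = flatten_concept(entry)
    concept_type = concept.get("type")
    if concept_type not in REQUIRED_KEYS:
        return [f"{concept['name']}: unknown concept type {concept_type!r}"]
    missing = sorted(REQUIRED_KEYS[concept_type] - set(concept))
    return [f"{concept['name']}: missing {key}" for key in missing]


# The concept from the Anxiety example, as a YAML parser would load it.
csv_concept = {
    "Anxiety - Primary Care": [
        {"type": "csv"},
        {"coding_system": 5},
        {"filepath": "./path/to/file.csv"},
    ]
}
# An inline concept that forgot its list of codes.
inline_concept = {"My concept": [{"type": "inline"}, {"coding_system": 5}]}

csv_problems = check_concept(csv_concept)
inline_problems = check_concept(inline_concept)
```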
How to submit data

You can download a sample template file from the repository:

If you have a phenotyping algorithm that meets the eligibility requirements, we invite you to submit your data in one of the following ways: