Define a disease (e.g. hypertension), lifestyle risk factor (e.g. smoking) or biomarker (e.g. blood pressure)
Derive information from one or more electronic health record data sources. This can include national and local sources. The definition of EHR includes administrative data such as billing/claims data, and clinical audits.
Have one or more peer-reviewed outputs associated with it, e.g. journal publications, scientific conference papers or policy white papers.
Provide evidence of how the phenotyping algorithm was validated.
Specification
Phenotyping algorithms are stored in the Phenotype Library using a combination of YAML and CSV files. There are two main components to each algorithm:
a) the phenotype definition file (a YAML file), and
b) one or more terminology files (also known as codelists), which can be stored inline within the YAML file or in linked CSV files.
The sections below describe their schema and contents.
File Naming
All phenotype definition files associated with a phenotype use a common naming pattern:
AUTHORSURNAME_NAME_UUID.yaml
for example: axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj.yaml
Code list files follow a similar pattern:
AUTHORSURNAME_NAME_UUID_TERMINOLOGY.csv
for example: axson_bronchiestasis_ZckoXfUWNXn8Jn7fdLQuxj_ICD10.csv
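The naming patterns above can be sketched in Python as follows. This is a minimal illustration, not part of the Phenotype Library tooling: the function names are hypothetical, lowercasing the surname and name is an assumption inferred from the examples, and the code list name is derived from the definition file name by appending the terminology, which matches the example above.

```python
import re
from pathlib import PurePath


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumerics with underscores (assumed convention)."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def definition_filename(author_surname: str, name: str, uuid: str) -> str:
    """Build AUTHORSURNAME_NAME_UUID.yaml; the UUID is case-sensitive and kept as given."""
    return f"{slugify(author_surname)}_{slugify(name)}_{uuid}.yaml"


def codelist_filename(definition_file: str, terminology: str) -> str:
    """Derive a code list file name from the definition file name and a terminology label."""
    stem = PurePath(definition_file).stem  # drops the .yaml extension
    return f"{stem}_{terminology}.csv"


definition = definition_filename("Axson", "bronchiestasis", "ZckoXfUWNXn8Jn7fdLQuxj")
codelist = codelist_filename(definition, "ICD10")
```

Deriving the code list name from the definition file name keeps the two files visibly linked by the same UUID.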
Phenotype definition file
The phenotype definition file is a YAML header file. It records metadata about the algorithm, its data sources, the controlled clinical terminologies it uses, and related information. For example:
phenotype_uuid: 5bqxGMaqvZFBhtEYb5mhZJ
title: Anxiety
publish: FALSE
author:
  - Matthew J Carr
  - Sarah Steeg
  - Roger T Webb
  - Nav Kapur
  - Carolyn A Chew-Graham
  - Kathrym M Abel
  - Holly Hope
  - Matthias Pierce
  - Darren M. Ashcroft
type: Disease or Syndrome
valid_event_data_range: 01/01/2020 - 31/10/2020
sex:
  - Female
  - Male
collections:
  - 20
phenoflowid: 4
description: |-
  Using electronic health records from 1714 UK general practices registered with the Clinical Practice Research Datalink we examined incidence and event rates of depression and anxiety disorders, self-harm, prescriptions for antidepressants and benzodiazepines and GP referrals to mental health services per 100,000 person-months, before, during and after the peak of the Covid-19 emergency. Analyses were stratified by gender, age group and practice-level Index of Multiple Deprivation quintile.
publications:
  - 'Matthew J Carr, Sarah Steeg, Roger T Webb, Nav Kapur, Carolyn A Chew-Graham, Kathrym M Abel, Holly Hope, Matthias Pierce, Darren M. Ashcroft, "Primary care contact for mental illness and self-harm before during and after the peak of the Covid-19 pandemic in the UK: cohort study of 13 million individual". 2020.'
data_sources:
  - 5
  - 6
concepts:
  - Anxiety - Primary Care:
      - type: csv
      - coding_system: 5
      - filepath: ./path/to/file.csv
The required Phenotype metadata fields are:
title (string): Phenotype (long) name
publish (bool): Whether to publish the phenotype and concepts immediately after uploading (publishing allows anyone to view your work)
type (string): Type of phenotype (a list of valid Phenotype types is available here)
author (list of strings): list of phenotype authors
sex (list of strings): list of sexes valid for the phenotype
concepts (list of concept objects): The concepts (codelists) that make up the phenotype; see Defining a Concept
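A quick check for the required fields listed above might look like the following sketch. It assumes the YAML file has already been loaded into a Python dictionary by a YAML parser; the names REQUIRED_FIELDS and missing_or_mistyped are hypothetical and are not part of the Phenotype Library.

```python
# Required fields and the Python type each is expected to have once the YAML is loaded.
REQUIRED_FIELDS = {
    "title": str,
    "publish": bool,
    "type": str,
    "author": list,
    "sex": list,
    "concepts": list,
}


def missing_or_mistyped(metadata: dict) -> list:
    """Return one message per required field that is absent or has the wrong type."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in metadata:
            problems.append(f"{field}: missing")
        elif not isinstance(metadata[field], expected):
            found = type(metadata[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {found}")
    return problems


complete = {
    "title": "Anxiety",
    "publish": False,
    "type": "Disease or Syndrome",
    "author": ["Matthew J Carr"],
    "sex": ["Female", "Male"],
    "concepts": [],
}
problems_in_complete = missing_or_mistyped(complete)
```

The check only covers presence and top-level type; it does not verify that a type or sex value is one the Library accepts.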
Additional Phenotype metadata fields are available:
phenotype_uuid (string): Universally unique phenotype identifier, generated using the shortuuid Python module.
valid_event_data_range (string): Date range for events, written as DD/MM/YYYY - DD/MM/YYYY
description (string): Markdown text field providing a description of the Phenotype
implementation (string): Markdown text field providing details on the implementation of the phenotype
primary_publication (string): Citation of the primary publication
primary_publication_link (string): Link to the primary publication
primary_publication_doi (string): DOI linking to the primary publication
publications (list of strings): list of publications
tags (list of ints): List of tag ids that are associated with this Phenotype (a list of valid tag ids is available here)
collections (list of ints): List of collection ids that are associated with this Phenotype (a list of valid collection ids is available here)
data_sources (list of ints): List of data source ids that the Phenotype sources information from (a list of valid data source ids is available here). The data sources available in the Phenotype Library have been sourced from HDR Gateway.
validation (list of strings): The kinds of validation that support the robustness of the phenotype. Valid values:
prognostic: the algorithm replicates known prognostic associations
aetiologic: the algorithm replicates known associations with risk factors
genetic: the algorithm replicates associations with known genetic regions or variants
cross-source: the algorithm has been evaluated in a similar external data source
casenote review: the algorithm has been validated through manual review of clinical notes (this usually yields PPV and NPV values)
cross-country: the algorithm has been evaluated in a similar external healthcare system
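Two of the optional fields above lend themselves to a simple automated check. The sketch below parses valid_event_data_range in the DD/MM/YYYY - DD/MM/YYYY form used in the example definition file, and flags validation values that are not in the list above. The function names are hypothetical, and the exact spelling the Library accepts for each validation value is an assumption taken from the list.

```python
from datetime import datetime

# Validation values as listed in this specification (exact accepted spelling is assumed).
VALIDATION_TYPES = {
    "prognostic",
    "aetiologic",
    "genetic",
    "cross-source",
    "casenote review",
    "cross-country",
}


def parse_event_date_range(value: str):
    """Parse 'DD/MM/YYYY - DD/MM/YYYY' into a (start, end) pair of dates."""
    # Unpacking raises ValueError if the string does not have exactly two parts.
    start_text, end_text = (part.strip() for part in value.split(" - "))
    start = datetime.strptime(start_text, "%d/%m/%Y").date()
    end = datetime.strptime(end_text, "%d/%m/%Y").date()
    if start > end:
        raise ValueError(f"start date {start} is after end date {end}")
    return start, end


def unknown_validation_values(values: list) -> list:
    """Return the entries that do not match any listed validation value."""
    return [v for v in values if v.strip().lower() not in VALIDATION_TYPES]


start, end = parse_event_date_range("01/01/2020 - 31/10/2020")
unknown = unknown_validation_values(["prognostic", "made-up"])
```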
Defining a Concept
A concept can be defined in one of three ways.
With a CSV file:
- [concept name] (string): The name of the concept
- type: csv
- coding_system (string): The coding system contained in the concept (a list of valid coding systems is available here)
- filepath (string): Location of the csv file, e.g. C:/myphenotype/myconcept.csv
By entering the codes as a list:
- [concept name] (string): The name of the concept
- type: inline
- coding_system (string): The coding system contained in the concept (a list of valid coding systems is available here)
- codes (list of strings): A list of codes that this concept should contain
By referencing a concept that has been published on the Phenotype Library:
- [concept name] (string): The name of the concept
- type: existing_concept
- concept_id (string): The concept ID as displayed on the Phenotype Library, e.g. C123
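The three concept forms above differ only in which keys must accompany the type. The following sketch checks a concept object for the keys its type requires. It assumes a concept is loaded from YAML as a single-entry mapping from the concept name to a list of one-key mappings, as in the example definition file; the names REQUIRED_KEYS_BY_TYPE, flatten_concept and check_concept are hypothetical.

```python
# Keys that must accompany each concept type, taken from the three forms described above.
REQUIRED_KEYS_BY_TYPE = {
    "csv": {"coding_system", "filepath"},
    "inline": {"coding_system", "codes"},
    "existing_concept": {"concept_id"},
}


def flatten_concept(concept: dict):
    """Turn {name: [{key: value}, ...]} into (name, {key: value, ...})."""
    ((name, entries),) = concept.items()  # a concept has exactly one name
    settings = {}
    for entry in entries:
        settings.update(entry)
    return name, settings


def check_concept(concept: dict) -> list:
    """Return a list of problems found in one concept object (empty if none)."""
    name, settings = flatten_concept(concept)
    concept_type = settings.get("type")
    if concept_type not in REQUIRED_KEYS_BY_TYPE:
        return [f"{name}: unknown type {concept_type!r}"]
    missing = REQUIRED_KEYS_BY_TYPE[concept_type] - set(settings)
    return [f"{name}: missing {key}" for key in sorted(missing)]


csv_concept = {
    "Anxiety - Primary Care": [
        {"type": "csv"},
        {"coding_system": 5},
        {"filepath": "./path/to/file.csv"},
    ]
}
csv_problems = check_concept(csv_concept)
```

A concept of type inline without a codes list, for instance, would be reported as missing codes.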