LLM Safety Mechanisms Explorer

This project supports holistic analysis of Large Language Model (LLM) safety mechanisms, using data from my LLM Safety Mechanisms GitHub repository. Please raise any issues or suggestions via GitHub.

Why do we need it?

Understanding which safety mechanisms are implemented across large language models currently requires piecing together information from scattered documentation, each source using different terminology and varying levels of detail. This work provides a structured, queryable view of safety technique coverage across major frontier models: a coverage profile that helps researchers, practitioners, and policymakers make informed risk assessments.


Provider-Technique Relationships

This graph is designed to support coverage analysis. Use the filter below it to reduce the dataset for improved clarity. A force layout can be applied to selected subsets of nodes.


Dataset Filter

Constrain the collection using the following controls.


Safety Mechanisms by Category

This chart provides a visual overview of the safety mechanisms documented in this project. The categories and individual techniques were defined as a common taxonomy across the set of providers through months of iteration and analysis. The approach has been data-driven, merging members where overlap was high. Lifecycle stages have been removed as higher-order categories; they are now represented intersectionally with techniques in a different section of the dataset.


Summary Statistics


Model Development Lifecycle

Safety techniques mapped across the six phases of model development. Techniques appearing in multiple phases are connected with bridge lines. The governance band spans the full lifecycle to reflect its cross-cutting nature. Use the provider filter to compare coverage profiles.


Standards Alignment

Coverage of safety techniques mapped against external governance and security frameworks including NIST AI RMF, OWASP LLM Top 10, MITRE ATLAS, EU AI Act, ISO 42001, and the Weidinger taxonomy of LM risks.
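To make the mapping concrete, a minimal sketch of how one technique-to-framework alignment record might be shaped is shown below. The schema, field names, and the specific framework items are illustrative assumptions, not the dataset's actual structure.

```python
# Hypothetical shape of one standards-alignment record: a single
# safety technique mapped to items in external frameworks.
# (Field names and framework item IDs are illustrative only.)
alignment = {
    "technique": "input-filtering",
    "frameworks": {
        "NIST AI RMF": ["MANAGE 2.3"],
        "OWASP LLM Top 10": ["LLM01: Prompt Injection"],
        "MITRE ATLAS": ["AML.T0051"],
    },
}

def techniques_covering(records, framework):
    """List techniques that map to at least one item in a framework."""
    return [r["technique"] for r in records
            if r["frameworks"].get(framework)]
```

A query like `techniques_covering(records, "MITRE ATLAS")` then gives the coverage slice for one framework at a time.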


Third-Party Commentary

External analysis and research discussing specific safety techniques — academic papers, independent audits, and expert commentary on technique effectiveness.


Reported Incidents

Documented safety incidents linked to specific models and, where identifiable, to the safety techniques that were insufficient. Incident data sourced from the AI Incident Database (AIID) (CC BY-SA 4.0).


Documentation Map

The following chart shows how documents in the collection relate to providers (via models). It gives a quick overview of which documentation has been brought into the dataset for analysis, and will also assist coverage analysis as information gaps are identified. Click and drag to rearrange nodes; you can export the layout and save your preferred arrangement. Tooltips on the document nodes show the URIs of the original source documents.


Export


Current (& Planned) Activity

This project is under active development. Current priorities include:


Documentation

Data Sources

This notebook fetches live data from the following GitHub repository endpoints:
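As a sketch, fetching one of these endpoints might look like the following. The base URL and dataset name used here are placeholders, not the repository's actual paths.

```python
import json
from urllib.request import urlopen

# Placeholder base URL -- substitute the repository's real raw
# endpoint; the actual paths are listed in this notebook's data
# sources, not here.
RAW_BASE = "https://raw.githubusercontent.com/EXAMPLE_USER/EXAMPLE_REPO/main"

def fetch_dataset(name: str):
    """Fetch and decode one JSON dataset from the raw endpoint."""
    with urlopen(f"{RAW_BASE}/{name}.json") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In practice a call such as `fetch_dataset("techniques")` would return the parsed records for downstream filtering and charting.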

Methodology

Usage Examples

Basic Filtering

  1. Select a provider from the dropdown to focus on specific implementations
  2. Choose a technique type to analyse particular safety approaches
  3. Adjust the minimum rating slider to filter by confidence threshold
  4. Use the search box for free-text filtering across descriptions
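The four filter steps above compose into a single pass over the records. A minimal sketch follows; the record schema (`provider`, `type`, `rating`, `description`) is an assumption for illustration, not the dataset's actual field names.

```python
# Sample records in an assumed schema (not the real dataset shape).
records = [
    {"provider": "A", "type": "input-filtering", "rating": 4,
     "description": "Blocks disallowed prompts before inference"},
    {"provider": "A", "type": "rlhf", "rating": 2,
     "description": "Preference tuning against harmful outputs"},
    {"provider": "B", "type": "input-filtering", "rating": 5,
     "description": "Prompt classifier with escalation"},
]

def apply_filters(rows, provider=None, technique_type=None,
                  min_rating=0, search=""):
    """Apply the dropdown, type, rating-slider, and search-box
    filters in turn, keeping rows that pass all of them."""
    out = []
    for r in rows:
        if provider and r["provider"] != provider:
            continue
        if technique_type and r["type"] != technique_type:
            continue
        if r["rating"] < min_rating:
            continue
        if search and search.lower() not in r["description"].lower():
            continue
        out.append(r)
    return out
```

For example, `apply_filters(records, provider="A", min_rating=3)` keeps only provider A's higher-confidence entries.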

Advanced Analytics

Provider Comparison: Compare safety mechanism adoption across providers
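One way to sketch such a comparison is to group technique types by provider, giving each provider's coverage profile side by side. Again, the row schema here is an assumption for illustration.

```python
from collections import defaultdict

# Sample rows in an assumed schema: each row is one
# (provider, technique) implementation claim.
rows = [
    {"provider": "A", "type": "input-filtering"},
    {"provider": "A", "type": "rlhf"},
    {"provider": "B", "type": "input-filtering"},
]

def coverage_by_provider(rows):
    """Group technique types by provider for side-by-side comparison."""
    cov = defaultdict(set)
    for r in rows:
        cov[r["provider"]].add(r["type"])
    return {p: sorted(ts) for p, ts in cov.items()}
```

The resulting mapping can feed a grouped bar chart or a coverage matrix directly.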

Data Export