arckit-data-model

Name: arckit-data-model
Author: tractorjuice/arc-kit

$npx mdskill add tractorjuice/arc-kit/arckit-data-model

Build compliant data models anchored in explicit requirements.

Generates entity relationships and governance rules from DR and NFR-SEC inputs.
Depends on ARC-*.md artifacts and global policy documents for context.
Validates outputs against mandatory requirements before model generation.
Delivers structured schemas ready for database design and API specs.

SKILL.md

.github/skills/arckit-data-modelView on GitHub ↗

---
name: arckit-data-model
description: "Create comprehensive data model with entity relationships, GDPR compliance, and data governance"
---

You are helping an enterprise architect create a comprehensive data model for a project that will guide database design, API specifications, and compliance requirements.

## User Input

```text
$ARGUMENTS
```

## Instructions

> **Note**: Before generating, scan `projects/` for existing project directories. For each project, list all `ARC-*.md` artifacts, check `external/` for reference documents, and check `000-global/` for cross-project policies. If no external docs exist but they would improve output, ask the user.

1. **Read existing artifacts from the project context:**

   **MANDATORY** (warn if missing):
   - **REQ** (Requirements)
     - Extract: All DR (data requirements), NFR-SEC (security/privacy), INT (integration/data exchange), BR (data-related business requirements)
     - If missing: STOP and warn user to run `$arckit-requirements` first — data model MUST be based on DR-xxx requirements

   **RECOMMENDED** (read if available, note if missing):
   - **STKE** (Stakeholder Analysis)
     - Extract: Data owners from RACI matrix, governance stakeholders, data stewardship responsibilities
   - **PRIN** (Architecture Principles, in 000-global)
     - Extract: Data governance standards, privacy by design principles, data sovereignty requirements

   **OPTIONAL** (read if available, skip silently if missing):
   - **SOBC** (Business Case)
     - Extract: Data-related benefits and costs
   - **RSCH** (Research Findings)
     - Extract: Database technology recommendations, data platform choices

2. **Identify the target project**:
   - Use the **ArcKit Project Context** (above) to find the project matching the user's input (by name or number)
   - If no match, create a new project:
     1. Use Glob to list `projects/*/` directories and find the highest `NNN-*` number (or start at `001` if none exist)
     2. Calculate the next number (zero-padded to 3 digits, e.g., `002`)
     3. Slugify the project name (lowercase, replace non-alphanumeric with hyphens, trim)
     4. Use the Write tool to create `projects/{NNN}-{slug}/README.md` with the project name, ID, and date — the Write tool will create all parent directories automatically
     5. Also create `projects/{NNN}-{slug}/external/README.md` with a note to place external reference documents here
     6. Set `PROJECT_ID` = the 3-digit number, `PROJECT_PATH` = the new directory path

3. **Read external documents and policies**:
   - Read any **external documents** listed in the project context (`external/` files) — extract entity definitions, relationships, data types, constraints, existing schemas, migration requirements
   - Read any **enterprise standards** in `projects/000-global/external/` — extract enterprise data dictionaries, master data management standards, cross-project data architecture patterns
   - If no external docs exist but they would improve the data model, ask: "Do you have any existing database schemas, ERD diagrams, or data dictionaries? I can read PDFs, images, and SQL files directly. Place them in `projects/{project-dir}/external/` and re-run, or skip."
   - **Citation traceability**: When referencing content from external documents, follow the citation instructions in `.arckit/references/citation-instructions.md`. Place inline citation markers (e.g., `[PP-C1]`) next to findings informed by source documents and populate the "External References" section in the template.

4. **Read the template** (with user override support):
   - **First**, check if `.arckit/templates/data-model-template.md` exists in the project root
   - **If found**: Read the user's customized template (user override takes precedence)
   - **If not found**: Read `.arckit/templates/data-model-template.md` (default)

   > **Tip**: Users can customize templates with `$arckit-customize data-model`

5. **Extract data requirements**:
   - Read the project's requirements document (`ARC-*-REQ-*.md`)
   - Extract ALL Data Requirements (DR-xxx)
   - Also look for privacy/GDPR requirements in NFR section
   - Identify integration requirements (INT-xxx) that involve data exchange
   - Note any data-related business requirements (BR-xxx)

6. **Load Mermaid Syntax Reference**:
   - Read `.arckit/skills/mermaid-syntax/references/entityRelationshipDiagram.md` for official Mermaid ER diagram syntax — entity definitions, relationship types, cardinality notation, and attribute syntax.

7. **Generate comprehensive data model**:

   **A. Executive Summary**:
   - Total number of entities identified
   - Data classification summary (Public, Internal, Confidential, Restricted)
   - PII/sensitive data identified (Yes/No)
   - GDPR/DPA 2018 compliance status
   - Key data governance stakeholders

   **B. Visual Entity-Relationship Diagram (ERD)**:
   - Create Mermaid ERD syntax showing:
     - All entities (E-001, E-002, etc.)
     - Relationships (one-to-one, one-to-many, many-to-many)
     - Cardinality notation
   - Organise by logical domain/bounded context if possible
   - Use descriptive entity and relationship names

   **C. Entity Catalog** (E-001, E-002, etc.):
   - For each entity, document:
     - **Entity ID**: E-001, E-002, etc.
     - **Entity Name**: Customer, Transaction, Product, etc.
     - **Description**: What this entity represents
     - **Source Requirement**: Which DR-xxx requirement(s) drive this entity
     - **Business Owner**: From stakeholder RACI matrix
     - **Technical Owner**: Data steward or database team
     - **Data Classification**: Public/Internal/Confidential/Restricted
     - **Estimated Volume**: Initial records + growth rate
     - **Retention Period**: How long data is kept (GDPR requirement)
     - **Attributes Table**:

       ```text
       | Attribute | Type | Required | PII | Description | Validation | Source Req |
       |-----------|------|----------|-----|-------------|------------|------------|
       | customer_id | UUID | Yes | No | Unique identifier | UUID v4 | DR-001 |
       | email | String(255) | Yes | Yes | Contact email | RFC 5322, unique | DR-002 |
       ```

     - **Relationships**: What other entities this connects to
     - **Indexes**: Primary key, foreign keys, performance indexes
     - **Privacy Notes**: GDPR considerations, data subject rights

   **D. Data Governance Matrix**:
   - For each entity, identify:
     - **Data Owner**: Business stakeholder responsible (from RACI matrix)
     - **Data Steward**: Person responsible for quality and compliance
     - **Data Custodian**: Technical team managing storage/backups
     - **Access Control**: Who can view/modify (roles/permissions)
     - **Sensitivity**: Public, Internal, Confidential, Restricted
     - **Compliance**: GDPR, PCI-DSS, HIPAA, etc.
     - **Quality SLA**: Accuracy, completeness, timeliness targets

   **E. CRUD Matrix** (Create, Read, Update, Delete):
   - Map which components/systems can perform which operations on each entity
   - Example:

     ```text
     | Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
     |--------|-------------|--------------|-------------------|-----------------|
     | E-001: Customer | CR-- | CRUD | -R-- | -R-- |
     | E-002: Transaction | CR-- | -R-- | -R-- | ---- |
     ```

   - Helps identify unauthorized access patterns and data flows

   **F. Data Integration Mapping**:
   - **Upstream Systems**: Where data comes from
     - System name, entity mapping, update frequency, data quality SLA
   - **Downstream Systems**: Where data goes to
     - System name, entity mapping, sync method (API, batch, event), latency SLA
   - **Master Data Management**: Which system is "source of truth" for each entity

   **G. Privacy & Compliance**:
   - **GDPR/DPA 2018 Compliance**:
     - List all PII attributes across all entities
     - Document legal basis for processing (consent, contract, legitimate interest, etc.)
     - Data subject rights implementation (access, rectification, erasure, portability)
     - Data retention schedules per entity
     - Cross-border data transfer considerations (UK-EU adequacy)
   - **Data Protection Impact Assessment (DPIA)**:
     - Is DPIA required? (Yes if high-risk processing of PII)
     - Key privacy risks identified
     - Mitigation measures
     - ICO notification requirements
   - **Sector-Specific Compliance**:
     - PCI-DSS: If payment card data (special handling requirements)
     - HIPAA: If healthcare data (US projects)
     - FCA regulations: If financial services (UK)
     - Government Security Classifications: If public sector (OFFICIAL, SECRET)

   **H. Data Quality Framework**:
   - **Quality Dimensions**:
     - **Accuracy**: How correct is the data? (validation rules, reference data)
     - **Completeness**: Required fields populated? (% target)
     - **Consistency**: Same data across systems? (reconciliation rules)
     - **Timeliness**: How current is the data? (update frequency, staleness tolerance)
     - **Uniqueness**: No duplicates? (deduplication rules)
     - **Validity**: Conforms to format? (regex patterns, enums, ranges)
   - **Data Quality Metrics**:
     - Define measurable targets per entity (e.g., "Customer email accuracy >99%")
     - Data quality monitoring approach
     - Data quality issue resolution process

   **I. Requirements Traceability**:
   - Create traceability table:

     ```text
     | Requirement | Entity | Attributes | Rationale |
     |-------------|--------|------------|-----------|
     | DR-001 | E-001: Customer | customer_id, email, name | Store customer identity |
     | DR-002 | E-002: Transaction | transaction_id, amount, status | Track payments |
     | NFR-SEC-003 | E-001: Customer | password_hash (encrypted) | Secure authentication |
     ```

   - Show how every DR-xxx requirement maps to entities/attributes
   - Flag any DR-xxx requirements NOT yet modeled (gaps)

   **J. Implementation Guidance**:
   - **Database Technology Recommendation**:
     - Relational (PostgreSQL, MySQL) for transactional data
     - Document (MongoDB, DynamoDB) for flexible schemas
     - Graph (Neo4j) for highly connected data
     - Time-series (InfluxDB, TimescaleDB) for metrics/events
   - **Schema Migration Strategy**: How to evolve schema (Flyway, Liquibase, Alembic)
   - **Backup and Recovery**: RPO/RTO targets, backup frequency
   - **Data Archival**: When to move data from hot to cold storage
   - **Testing Data**: Anonymization/pseudonymization for test environments

8. **UK Government Compliance** (if applicable):
   - **Government Security Classifications**: OFFICIAL, SECRET, TOP SECRET
   - **Data Standards**: Use GDS Data Standards Catalogue where applicable
   - **Open Standards**: Preference for open data formats (JSON, CSV, OData)
   - **ICO Data Protection**: Reference ICO guidance for public sector
   - **National Cyber Security Centre (NCSC)**: Data security patterns

Before writing the file, read `.arckit/references/quality-checklist.md` and verify all **Common Checks** plus the **DATA** per-type checks pass. Fix any failures before proceeding.

9. **Write the output**:
   - Write to `projects/{project-dir}/ARC-{PROJECT_ID}-DATA-v1.0.md`
   - Use the exact template structure from `data-model-template.md`
   - Include Mermaid ERD at the top for quick visualization
   - Include all sections even if some are TBD
   - Create comprehensive entity catalog with ALL attributes

**IMPORTANT - Auto-Populate Document Information Fields**:

Before completing the document, populate document information fields:

### Auto-populated fields

- `[PROJECT_ID]` → Extract from project path (e.g., "001")
- `[VERSION]` → Start with "1.0" for new documents
- `[DATE]` / `[YYYY-MM-DD]` → Current date in YYYY-MM-DD format
- `[DOCUMENT_TYPE_NAME]` → Document purpose
- `ARC-[PROJECT_ID]-DATA-v[VERSION]` → Generated document ID
- `[STATUS]` → "DRAFT" for new documents
- `[CLASSIFICATION]` → Default to "OFFICIAL" (UK Gov) or "PUBLIC"

### User-provided fields

- `[PROJECT_NAME]` → Full project name
- `[OWNER_NAME_AND_ROLE]` → Document owner

### Revision History

```markdown
| 1.0 | {DATE} | ArcKit AI | Initial creation from `$arckit-data-model` command |
```

### Generation Metadata Footer

```markdown
**Generated by**: ArcKit `$arckit-data-model` command
**Generated on**: {DATE}
**ArcKit Version**: {ARCKIT_VERSION}
**Project**: {PROJECT_NAME} (Project {PROJECT_ID})
**AI Model**: [Actual model name]
```

10. **Summarize what you created**:

- How many entities defined (E-001, E-002, etc.)
- How many total attributes across all entities
- How many entities contain PII (privacy-sensitive)
- Data classification breakdown (Public/Internal/Confidential/Restricted)
- GDPR compliance status (compliant / needs DPIA / gaps identified)
- Key data governance stakeholders identified
- Requirements coverage (% of DR-xxx requirements modeled)
- Suggested next steps (e.g., "Review data model with data protection officer before proceeding to HLD" or "Run `$arckit-hld-review` to validate database technology choices")

## Example Usage

User: `$arckit-data-model Create data model for payment gateway project`

You should:

- Check prerequisites (requirements document exists, stakeholder analysis recommended)
- Find project directory (e.g., `projects/001-payment-gateway-modernization/`)
- Extract DR-xxx requirements from the requirements document
- Generate comprehensive data model:
  - Mermaid ERD showing Customer, Transaction, PaymentMethod, RefundRequest entities
  - Detailed entity catalog with attributes, PII flags, retention periods
  - GDPR compliance: PII identified, legal basis documented, DPIA required
  - Data governance: CFO owns financial data, DPO owns PII, IT owns storage
  - CRUD matrix: Payment API can create transactions, Admin can read all, Reporting read-only
  - PCI-DSS compliance: Payment card data encrypted, tokenized, not stored long-term
  - Requirements traceability: All DR-001 through DR-008 mapped to entities
- **CRITICAL - Token Efficiency**: Use the **Write tool** to create `projects/001-payment-gateway-modernization/ARC-001-DATA-v1.0.md`
  - **DO NOT** output the full document in your response (this exceeds 32K token limit!)
- Show summary only (see Output Instructions below)

## Important Notes

- **Data model drives database schema, API contracts, and data governance policies**
- **GDPR compliance is MANDATORY for any PII - identify and protect it**
- **Every entity MUST trace back to at least one DR-xxx requirement**
- **Data ownership is critical - assign business owners from stakeholder RACI matrix**
- **PII requires special handling**: encryption at rest, encryption in transit, access controls, audit logging, retention limits
- **Use Mermaid ERD syntax** for GitHub-renderable diagrams (not PlantUML or other formats)
- **Data quality metrics should be measurable** (not "high quality", use "99% accuracy")
- **Consider data lifecycle**: creation, updates, archival, deletion (GDPR "right to erasure")
- **Reference architecture principles** from any `ARC-000-PRIN-*.md` file in `projects/000-global/` if they exist
- **Flag any DR-xxx requirements that cannot be modeled** (gaps for requirements clarification)
- **UK Government data projects**: The data model supports [National Data Strategy](https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy) alignment — Data Foundations pillar (metadata, standards, quality) and Availability pillar (data access, sharing). The Data Quality Framework section maps to the [Government Data Quality Framework](https://www.gov.uk/government/publications/the-government-data-quality-framework/the-government-data-quality-framework) 6 dimensions. See `docs/guides/national-data-strategy.md` and `docs/guides/data-quality-framework.md` for full mappings.

- **Markdown escaping**: When writing less-than or greater-than comparisons, always include a space after `<` or `>` (e.g., `< 3 seconds`, `> 99.9% uptime`) to prevent markdown renderers from interpreting them as HTML tags or emoji

## Integration with Other Commands

- **Input**: Requires requirements document (`ARC-*-REQ-*.md`) for DR-xxx requirements
- **Input**: Uses stakeholder analysis (`ARC-*-STKE-*.md`) for data ownership RACI matrix
- **Input**: References SOBC (`ARC-*-SOBC-*.md`) for data-related costs and benefits
- **Output**: Feeds into `$arckit-hld-review` (validates database technology choices)
- **Output**: Feeds into `$arckit-dld-review` (validates schema design, indexes, query patterns)
- **Output**: Feeds into `$arckit-sow` (RFP includes data migration, data governance requirements)
- **Output**: Supports `$arckit-traceability` (DR-xxx → Entity → Attribute → HLD Component)

## Output Instructions

**CRITICAL - Token Efficiency**:

### 1. Generate Data Model

Create the comprehensive data model following the template structure with all sections.

### 2. Write Directly to File

**Use the Write tool** to create `projects/[PROJECT]/ARC-{PROJECT_ID}-DATA-v1.0.md` with the complete data model.

**DO NOT** output the full document in your response. This would exceed token limits.

### 3. Show Summary Only

After writing the file, show ONLY a concise summary:

```markdown
## Data Model Complete ✅

**Project**: [Project Name]
**File Created**: `projects/[PROJECT]/ARC-{PROJECT_ID}-DATA-v1.0.md`

### Data Model Summary

**Entities**: [Number] entities modeled
- Core Entities: [List main entities, e.g., Customer, Order, Payment]
- Supporting Entities: [List supporting entities]
- Lookup/Reference Data: [List reference tables]

**Relationships**: [Number] relationships defined
- One-to-Many: [Number]
- Many-to-Many: [Number]
- One-to-One: [Number]

**Attributes**: [Number] total attributes across all entities
- PII Attributes: [Number] (GDPR-sensitive)
- Encrypted Attributes: [Number]
- Indexed Attributes: [Number] (for performance)

**GDPR Compliance**:
- PII Entities: [List entities containing PII]
- Legal Basis: [e.g., Consent, Contract, Legitimate Interest]
- DPIA Required: [Yes/No]
- Retention Periods: [Range, e.g., 6 months to 7 years]

**Data Governance**:
- Data Owners: [Number] stakeholders assigned as data owners
- CRUD Matrix: [Number] roles/systems defined
- Access Controls: [Summary of who can access what]

**Compliance Requirements**:
- [List: GDPR, PCI-DSS, HIPAA, SOX, etc. as applicable]

**Requirements Traceability**:
- Data Requirements Mapped: [Number] DR-xxx requirements
- Unmapped Requirements: [Number] (need clarification)

### What's in the Document

- Entity Relationship Diagram (Mermaid ERD)
- Detailed Entity Catalog (all attributes, data types, constraints)
- GDPR Compliance Matrix (PII identification and protection)
- Data Governance Framework (ownership, CRUD matrix)
- Data Quality Metrics (accuracy, completeness, timeliness targets)
- Data Retention Policy (by entity)
- Encryption and Security Requirements
- Requirements Traceability Matrix (DR-xxx → Entity mapping)

### Next Steps

- Review `ARC-{PROJECT_ID}-DATA-v1.0.md` for full ERD and entity details
- Validate with data owners and stakeholders
- Run `$arckit-research` to research database technologies
- Run `$arckit-hld-review` after HLD is created
```

**Statistics to Include**:

- Number of entities
- Number of relationships
- Number of PII attributes
- Number of data requirements mapped
- Number of data owners assigned
- DPIA required (yes/no)
- Compliance frameworks applicable

Generate the data model now, write to file using Write tool, and show only the summary above.

## Suggested Next Steps

After completing this command, consider running:

- `$arckit-hld-review` -- Validate database technology choices
- `$arckit-dld-review` -- Validate schema design and query patterns
- `$arckit-sow` -- Include data migration and governance in RFP
- `$arckit-traceability` -- Map DR-xxx to entities and attributes

More from tractorjuice/arc-kit

Skill	Description
architecture-workflow	This skill should be used when the user asks how to start an architecture project, which ArcKit commands to run and in what order, what workflow to follow, getting started, new project setup, guide me through, or what comes next.
arckit-adr	Document architectural decisions with options analysis and traceability
arckit-ai-playbook	Assess UK Government AI Playbook compliance for responsible AI deployment
arckit-analyze	Perform comprehensive governance quality analysis across architecture artifacts (requirements, principles, designs, assessments)
arckit-at-bvergg	[COMMUNITY] Generate Austrian public procurement documentation aligned with Bundesvergabegesetz 2018 — Oberschwellen/Unterschwellen determination, ANKÖ publication, BVergGVS secondary rules, and BVwG review pathway
arckit-at-dsgvo	[COMMUNITY] Assess Austrian DSG / DSGVO obligations — Datenschutzbehörde patterns, §§12–13 DSG special provisions, image processing (§12 DSG), and Austrian enforcement practice
arckit-at-nisg	[COMMUNITY] Assess Austrian NISG obligations (BGBl. I Nr. 94/2025) — AT transposition of NIS2, BKA (GovCERT) / BMI (SPOC) reporting, KSÖ coordination, and Austrian sectoral rules for Essential/Important entities
arckit-atrs	Generate Algorithmic Transparency Recording Standard (ATRS) record for AI/algorithmic tools
arckit-aws-research	Research AWS services and architecture patterns using AWS Knowledge MCP for authoritative guidance
arckit-azure-research	Research Azure services and architecture patterns using Microsoft Learn MCP for authoritative guidance