arckit-data-model
$
npx mdskill add tractorjuice/arc-kit/arckit-data-modelBuild compliant data models anchored in explicit requirements.
- Generates entity relationships and governance rules from DR and NFR-SEC inputs.
- Depends on ARC-*.md artifacts and global policy documents for context.
- Validates outputs against mandatory requirements before model generation.
- Delivers structured schemas ready for database design and API specs.
SKILL.md
.github/skills/arckit-data-modelView on GitHub ↗
---
name: arckit-data-model
description: "Create comprehensive data model with entity relationships, GDPR compliance, and data governance"
---
You are helping an enterprise architect create a comprehensive data model for a project that will guide database design, API specifications, and compliance requirements.
## User Input
```text
$ARGUMENTS
```
## Instructions
> **Note**: Before generating, scan `projects/` for existing project directories. For each project, list all `ARC-*.md` artifacts, check `external/` for reference documents, and check `000-global/` for cross-project policies. If no external docs exist but they would improve output, ask the user.
1. **Read existing artifacts from the project context:**
**MANDATORY** (warn if missing):
- **REQ** (Requirements)
- Extract: All DR (data requirements), NFR-SEC (security/privacy), INT (integration/data exchange), BR (data-related business requirements)
- If missing: STOP and warn user to run `$arckit-requirements` first — data model MUST be based on DR-xxx requirements
**RECOMMENDED** (read if available, note if missing):
- **STKE** (Stakeholder Analysis)
- Extract: Data owners from RACI matrix, governance stakeholders, data stewardship responsibilities
- **PRIN** (Architecture Principles, in 000-global)
- Extract: Data governance standards, privacy by design principles, data sovereignty requirements
**OPTIONAL** (read if available, skip silently if missing):
- **SOBC** (Business Case)
- Extract: Data-related benefits and costs
- **RSCH** (Research Findings)
- Extract: Database technology recommendations, data platform choices
2. **Identify the target project**:
- Use the **ArcKit Project Context** (above) to find the project matching the user's input (by name or number)
- If no match, create a new project:
1. Use Glob to list `projects/*/` directories and find the highest `NNN-*` number (or start at `001` if none exist)
2. Calculate the next number (zero-padded to 3 digits, e.g., `002`)
3. Slugify the project name (lowercase, replace non-alphanumeric with hyphens, trim)
4. Use the Write tool to create `projects/{NNN}-{slug}/README.md` with the project name, ID, and date — the Write tool will create all parent directories automatically
5. Also create `projects/{NNN}-{slug}/external/README.md` with a note to place external reference documents here
6. Set `PROJECT_ID` = the 3-digit number, `PROJECT_PATH` = the new directory path
3. **Read external documents and policies**:
- Read any **external documents** listed in the project context (`external/` files) — extract entity definitions, relationships, data types, constraints, existing schemas, migration requirements
- Read any **enterprise standards** in `projects/000-global/external/` — extract enterprise data dictionaries, master data management standards, cross-project data architecture patterns
- If no external docs exist but they would improve the data model, ask: "Do you have any existing database schemas, ERD diagrams, or data dictionaries? I can read PDFs, images, and SQL files directly. Place them in `projects/{project-dir}/external/` and re-run, or skip."
- **Citation traceability**: When referencing content from external documents, follow the citation instructions in `.arckit/references/citation-instructions.md`. Place inline citation markers (e.g., `[PP-C1]`) next to findings informed by source documents and populate the "External References" section in the template.
4. **Read the template** (with user override support):
- **First**, check if `.arckit/templates/data-model-template.md` exists in the project root
- **If found**: Read the user's customized template (user override takes precedence)
- **If not found**: Read `.arckit/templates/data-model-template.md` (default)
> **Tip**: Users can customize templates with `$arckit-customize data-model`
5. **Extract data requirements**:
- Read the project's requirements document (`ARC-*-REQ-*.md`)
- Extract ALL Data Requirements (DR-xxx)
- Also look for privacy/GDPR requirements in NFR section
- Identify integration requirements (INT-xxx) that involve data exchange
- Note any data-related business requirements (BR-xxx)
6. **Load Mermaid Syntax Reference**:
- Read `.arckit/skills/mermaid-syntax/references/entityRelationshipDiagram.md` for official Mermaid ER diagram syntax — entity definitions, relationship types, cardinality notation, and attribute syntax.
7. **Generate comprehensive data model**:
**A. Executive Summary**:
- Total number of entities identified
- Data classification summary (Public, Internal, Confidential, Restricted)
- PII/sensitive data identified (Yes/No)
- GDPR/DPA 2018 compliance status
- Key data governance stakeholders
**B. Visual Entity-Relationship Diagram (ERD)**:
- Create Mermaid ERD syntax showing:
- All entities (E-001, E-002, etc.)
- Relationships (one-to-one, one-to-many, many-to-many)
- Cardinality notation
- Organise by logical domain/bounded context if possible
- Use descriptive entity and relationship names
**C. Entity Catalog** (E-001, E-002, etc.):
- For each entity, document:
- **Entity ID**: E-001, E-002, etc.
- **Entity Name**: Customer, Transaction, Product, etc.
- **Description**: What this entity represents
- **Source Requirement**: Which DR-xxx requirement(s) drive this entity
- **Business Owner**: From stakeholder RACI matrix
- **Technical Owner**: Data steward or database team
- **Data Classification**: Public/Internal/Confidential/Restricted
- **Estimated Volume**: Initial records + growth rate
- **Retention Period**: How long data is kept (GDPR requirement)
- **Attributes Table**:
```text
| Attribute | Type | Required | PII | Description | Validation | Source Req |
|-----------|------|----------|-----|-------------|------------|------------|
| customer_id | UUID | Yes | No | Unique identifier | UUID v4 | DR-001 |
| email | String(255) | Yes | Yes | Contact email | RFC 5322, unique | DR-002 |
```
- **Relationships**: What other entities this connects to
- **Indexes**: Primary key, foreign keys, performance indexes
- **Privacy Notes**: GDPR considerations, data subject rights
**D. Data Governance Matrix**:
- For each entity, identify:
- **Data Owner**: Business stakeholder responsible (from RACI matrix)
- **Data Steward**: Person responsible for quality and compliance
- **Data Custodian**: Technical team managing storage/backups
- **Access Control**: Who can view/modify (roles/permissions)
- **Sensitivity**: Public, Internal, Confidential, Restricted
- **Compliance**: GDPR, PCI-DSS, HIPAA, etc.
- **Quality SLA**: Accuracy, completeness, timeliness targets
**E. CRUD Matrix** (Create, Read, Update, Delete):
- Map which components/systems can perform which operations on each entity
- Example:
```text
| Entity | Payment API | Admin Portal | Reporting Service | CRM Integration |
|--------|-------------|--------------|-------------------|-----------------|
| E-001: Customer | CR-- | CRUD | -R-- | -R-- |
| E-002: Transaction | CR-- | -R-- | -R-- | ---- |
```
- Helps identify unauthorized access patterns and data flows
**F. Data Integration Mapping**:
- **Upstream Systems**: Where data comes from
- System name, entity mapping, update frequency, data quality SLA
- **Downstream Systems**: Where data goes to
- System name, entity mapping, sync method (API, batch, event), latency SLA
- **Master Data Management**: Which system is "source of truth" for each entity
**G. Privacy & Compliance**:
- **GDPR/DPA 2018 Compliance**:
- List all PII attributes across all entities
- Document legal basis for processing (consent, contract, legitimate interest, etc.)
- Data subject rights implementation (access, rectification, erasure, portability)
- Data retention schedules per entity
- Cross-border data transfer considerations (UK-EU adequacy)
- **Data Protection Impact Assessment (DPIA)**:
- Is DPIA required? (Yes if high-risk processing of PII)
- Key privacy risks identified
- Mitigation measures
- ICO notification requirements
- **Sector-Specific Compliance**:
- PCI-DSS: If payment card data (special handling requirements)
- HIPAA: If healthcare data (US projects)
- FCA regulations: If financial services (UK)
- Government Security Classifications: If public sector (OFFICIAL, SECRET)
**H. Data Quality Framework**:
- **Quality Dimensions**:
- **Accuracy**: How correct is the data? (validation rules, reference data)
- **Completeness**: Required fields populated? (% target)
- **Consistency**: Same data across systems? (reconciliation rules)
- **Timeliness**: How current is the data? (update frequency, staleness tolerance)
- **Uniqueness**: No duplicates? (deduplication rules)
- **Validity**: Conforms to format? (regex patterns, enums, ranges)
- **Data Quality Metrics**:
- Define measurable targets per entity (e.g., "Customer email accuracy >99%")
- Data quality monitoring approach
- Data quality issue resolution process
**I. Requirements Traceability**:
- Create traceability table:
```text
| Requirement | Entity | Attributes | Rationale |
|-------------|--------|------------|-----------|
| DR-001 | E-001: Customer | customer_id, email, name | Store customer identity |
| DR-002 | E-002: Transaction | transaction_id, amount, status | Track payments |
| NFR-SEC-003 | E-001: Customer | password_hash (encrypted) | Secure authentication |
```
- Show how every DR-xxx requirement maps to entities/attributes
- Flag any DR-xxx requirements NOT yet modeled (gaps)
**J. Implementation Guidance**:
- **Database Technology Recommendation**:
- Relational (PostgreSQL, MySQL) for transactional data
- Document (MongoDB, DynamoDB) for flexible schemas
- Graph (Neo4j) for highly connected data
- Time-series (InfluxDB, TimescaleDB) for metrics/events
- **Schema Migration Strategy**: How to evolve schema (Flyway, Liquibase, Alembic)
- **Backup and Recovery**: RPO/RTO targets, backup frequency
- **Data Archival**: When to move data from hot to cold storage
- **Testing Data**: Anonymization/pseudonymization for test environments
8. **UK Government Compliance** (if applicable):
- **Government Security Classifications**: OFFICIAL, SECRET, TOP SECRET
- **Data Standards**: Use GDS Data Standards Catalogue where applicable
- **Open Standards**: Preference for open data formats (JSON, CSV, OData)
- **ICO Data Protection**: Reference ICO guidance for public sector
- **National Cyber Security Centre (NCSC)**: Data security patterns
Before writing the file, read `.arckit/references/quality-checklist.md` and verify all **Common Checks** plus the **DATA** per-type checks pass. Fix any failures before proceeding.
9. **Write the output**:
- Write to `projects/{project-dir}/ARC-{PROJECT_ID}-DATA-v1.0.md`
- Use the exact template structure from `data-model-template.md`
- Include Mermaid ERD at the top for quick visualization
- Include all sections even if some are TBD
- Create comprehensive entity catalog with ALL attributes
**IMPORTANT - Auto-Populate Document Information Fields**:
Before completing the document, populate document information fields:
### Auto-populated fields
- `[PROJECT_ID]` → Extract from project path (e.g., "001")
- `[VERSION]` → Start with "1.0" for new documents
- `[DATE]` / `[YYYY-MM-DD]` → Current date in YYYY-MM-DD format
- `[DOCUMENT_TYPE_NAME]` → Document purpose
- `ARC-[PROJECT_ID]-DATA-v[VERSION]` → Generated document ID
- `[STATUS]` → "DRAFT" for new documents
- `[CLASSIFICATION]` → Default to "OFFICIAL" (UK Gov) or "PUBLIC"
### User-provided fields
- `[PROJECT_NAME]` → Full project name
- `[OWNER_NAME_AND_ROLE]` → Document owner
### Revision History
```markdown
| 1.0 | {DATE} | ArcKit AI | Initial creation from `$arckit-data-model` command |
```
### Generation Metadata Footer
```markdown
**Generated by**: ArcKit `$arckit-data-model` command
**Generated on**: {DATE}
**ArcKit Version**: {ARCKIT_VERSION}
**Project**: {PROJECT_NAME} (Project {PROJECT_ID})
**AI Model**: [Actual model name]
```
10. **Summarize what you created**:
- How many entities defined (E-001, E-002, etc.)
- How many total attributes across all entities
- How many entities contain PII (privacy-sensitive)
- Data classification breakdown (Public/Internal/Confidential/Restricted)
- GDPR compliance status (compliant / needs DPIA / gaps identified)
- Key data governance stakeholders identified
- Requirements coverage (% of DR-xxx requirements modeled)
- Suggested next steps (e.g., "Review data model with data protection officer before proceeding to HLD" or "Run `$arckit-hld-review` to validate database technology choices")
## Example Usage
User: `$arckit-data-model Create data model for payment gateway project`
You should:
- Check prerequisites (requirements document exists, stakeholder analysis recommended)
- Find project directory (e.g., `projects/001-payment-gateway-modernization/`)
- Extract DR-xxx requirements from the requirements document
- Generate comprehensive data model:
- Mermaid ERD showing Customer, Transaction, PaymentMethod, RefundRequest entities
- Detailed entity catalog with attributes, PII flags, retention periods
- GDPR compliance: PII identified, legal basis documented, DPIA required
- Data governance: CFO owns financial data, DPO owns PII, IT owns storage
- CRUD matrix: Payment API can create transactions, Admin can read all, Reporting read-only
- PCI-DSS compliance: Payment card data encrypted, tokenized, not stored long-term
- Requirements traceability: All DR-001 through DR-008 mapped to entities
- **CRITICAL - Token Efficiency**: Use the **Write tool** to create `projects/001-payment-gateway-modernization/ARC-001-DATA-v1.0.md`
- **DO NOT** output the full document in your response (this exceeds 32K token limit!)
- Show summary only (see Output Instructions below)
## Important Notes
- **Data model drives database schema, API contracts, and data governance policies**
- **GDPR compliance is MANDATORY for any PII - identify and protect it**
- **Every entity MUST trace back to at least one DR-xxx requirement**
- **Data ownership is critical - assign business owners from stakeholder RACI matrix**
- **PII requires special handling**: encryption at rest, encryption in transit, access controls, audit logging, retention limits
- **Use Mermaid ERD syntax** for GitHub-renderable diagrams (not PlantUML or other formats)
- **Data quality metrics should be measurable** (not "high quality", use "99% accuracy")
- **Consider data lifecycle**: creation, updates, archival, deletion (GDPR "right to erasure")
- **Reference architecture principles** from any `ARC-000-PRIN-*.md` file in `projects/000-global/` if they exist
- **Flag any DR-xxx requirements that cannot be modeled** (gaps for requirements clarification)
- **UK Government data projects**: The data model supports [National Data Strategy](https://www.gov.uk/government/publications/uk-national-data-strategy/national-data-strategy) alignment — Data Foundations pillar (metadata, standards, quality) and Availability pillar (data access, sharing). The Data Quality Framework section maps to the [Government Data Quality Framework](https://www.gov.uk/government/publications/the-government-data-quality-framework/the-government-data-quality-framework) 6 dimensions. See `docs/guides/national-data-strategy.md` and `docs/guides/data-quality-framework.md` for full mappings.
- **Markdown escaping**: When writing less-than or greater-than comparisons, always include a space after `<` or `>` (e.g., `< 3 seconds`, `> 99.9% uptime`) to prevent markdown renderers from interpreting them as HTML tags or emoji
## Integration with Other Commands
- **Input**: Requires requirements document (`ARC-*-REQ-*.md`) for DR-xxx requirements
- **Input**: Uses stakeholder analysis (`ARC-*-STKE-*.md`) for data ownership RACI matrix
- **Input**: References SOBC (`ARC-*-SOBC-*.md`) for data-related costs and benefits
- **Output**: Feeds into `$arckit-hld-review` (validates database technology choices)
- **Output**: Feeds into `$arckit-dld-review` (validates schema design, indexes, query patterns)
- **Output**: Feeds into `$arckit-sow` (RFP includes data migration, data governance requirements)
- **Output**: Supports `$arckit-traceability` (DR-xxx → Entity → Attribute → HLD Component)
## Output Instructions
**CRITICAL - Token Efficiency**:
### 1. Generate Data Model
Create the comprehensive data model following the template structure with all sections.
### 2. Write Directly to File
**Use the Write tool** to create `projects/[PROJECT]/ARC-{PROJECT_ID}-DATA-v1.0.md` with the complete data model.
**DO NOT** output the full document in your response. This would exceed token limits.
### 3. Show Summary Only
After writing the file, show ONLY a concise summary:
```markdown
## Data Model Complete ✅
**Project**: [Project Name]
**File Created**: `projects/[PROJECT]/ARC-{PROJECT_ID}-DATA-v1.0.md`
### Data Model Summary
**Entities**: [Number] entities modeled
- Core Entities: [List main entities, e.g., Customer, Order, Payment]
- Supporting Entities: [List supporting entities]
- Lookup/Reference Data: [List reference tables]
**Relationships**: [Number] relationships defined
- One-to-Many: [Number]
- Many-to-Many: [Number]
- One-to-One: [Number]
**Attributes**: [Number] total attributes across all entities
- PII Attributes: [Number] (GDPR-sensitive)
- Encrypted Attributes: [Number]
- Indexed Attributes: [Number] (for performance)
**GDPR Compliance**:
- PII Entities: [List entities containing PII]
- Legal Basis: [e.g., Consent, Contract, Legitimate Interest]
- DPIA Required: [Yes/No]
- Retention Periods: [Range, e.g., 6 months to 7 years]
**Data Governance**:
- Data Owners: [Number] stakeholders assigned as data owners
- CRUD Matrix: [Number] roles/systems defined
- Access Controls: [Summary of who can access what]
**Compliance Requirements**:
- [List: GDPR, PCI-DSS, HIPAA, SOX, etc. as applicable]
**Requirements Traceability**:
- Data Requirements Mapped: [Number] DR-xxx requirements
- Unmapped Requirements: [Number] (need clarification)
### What's in the Document
- Entity Relationship Diagram (Mermaid ERD)
- Detailed Entity Catalog (all attributes, data types, constraints)
- GDPR Compliance Matrix (PII identification and protection)
- Data Governance Framework (ownership, CRUD matrix)
- Data Quality Metrics (accuracy, completeness, timeliness targets)
- Data Retention Policy (by entity)
- Encryption and Security Requirements
- Requirements Traceability Matrix (DR-xxx → Entity mapping)
### Next Steps
- Review `ARC-{PROJECT_ID}-DATA-v1.0.md` for full ERD and entity details
- Validate with data owners and stakeholders
- Run `$arckit-research` to research database technologies
- Run `$arckit-hld-review` after HLD is created
```
**Statistics to Include**:
- Number of entities
- Number of relationships
- Number of PII attributes
- Number of data requirements mapped
- Number of data owners assigned
- DPIA required (yes/no)
- Compliance frameworks applicable
Generate the data model now, write to file using Write tool, and show only the summary above.
## Suggested Next Steps
After completing this command, consider running:
- `$arckit-hld-review` -- Validate database technology choices
- `$arckit-dld-review` -- Validate schema design and query patterns
- `$arckit-sow` -- Include data migration and governance in RFP
- `$arckit-traceability` -- Map DR-xxx to entities and attributes