diff --git a/docs/providers/json/ mgraph-json-tech-spec..md b/docs/providers/json/ mgraph-json-tech-spec..md
index df7c759..9509c92 100644
--- a/docs/providers/json/ mgraph-json-tech-spec..md
+++ b/docs/providers/json/ mgraph-json-tech-spec..md
@@ -2,16 +2,16 @@

 ## Overview

-The JSON Provider will enable MGraph to ingest, manipulate, and export JSON data structures using MGraph's core graph capabilities. This document outlines the technical approach, architecture, and implementation details.
+The JSON Provider enables MGraph to ingest, manipulate, and export JSON data structures using MGraph's core graph capabilities. This document outlines the technical approach, architecture, and implementation details.

 ## Core Concept: A Graph-Native Approach

-The JSON Provider implements a pure graph-native approach to JSON representation, where every single JSON element becomes a node in its own right. This aligns perfectly with graph thinking and MGraph's core principles.
+The JSON Provider implements a pure graph-native approach to JSON representation, where every JSON element becomes a node in its own right. This aligns perfectly with graph thinking and MGraph's core principles.

 Key Principles:
 - Every JSON element (value, object, array) is a node
 - Structure is maintained purely through edges
-- No special property nodes or complex containers needed
+- Property names are represented as dedicated nodes
 - Natural graph traversal and transformation
 - Maximum simplicity and flexibility

@@ -24,64 +24,102 @@ Following MGraph's architectural principles, the implementation is divided into
 ```mermaid
 classDiagram
     %% Schema Layer
-    class Schema__Json__Node__Type {
-        <<enumeration>>
-        VALUE
-        DICT
-        LIST
+    class Schema__MGraph__Json__Node {
+        +node_data: Schema__MGraph__Node__Data
     }

-    class Schema__Json__Node__Data {
+    class Schema__MGraph__Json__Node__Value {
+        +node_data: Schema__MGraph__Json__Node__Value__Data
+    }
+
+    class Schema__MGraph__Json__Node__Value__Data {
         +value: Any
-        +node_type: Schema__Json__Node__Type
+        +value_type: type
     }

-    class Schema__Json__Node {
-        +node_data: Schema__Json__Node__Data
+    class Schema__MGraph__Json__Node__Property {
+        +node_data: Schema__MGraph__Json__Node__Property__Data
     }
+
+    class Schema__MGraph__Json__Node__Property__Data {
+        +name: str
+    }
+
+    class Schema__MGraph__Json__Node__Dict
+    class Schema__MGraph__Json__Node__List

     %% Model Layer
-    class Model__Json__Node {
-        +data: Schema__Json__Node
-        +is_value()
-        +is_dict()
-        +is_list()
-        +get_value()
+    class Model__MGraph__Json__Node {
+        +data: Schema__MGraph__Json__Node
+        +node_id
+        +node_type
     }

-    %% Domain Layer
-    class Domain__Json__Node {
-        +node: Model__Json__Node
-        +graph: Model__Json__Graph
+    class Model__MGraph__Json__Node__Value {
+        +data: Schema__MGraph__Json__Node__Value
+        +value
+        +value_type
+        +is_primitive()
     }

-    Schema__Json__Node__Data -- Schema__Json__Node__Type
-    Schema__Json__Node *-- Schema__Json__Node__Data
-    Model__Json__Node *-- Schema__Json__Node
-    Domain__Json__Node *-- Model__Json__Node
+    class Model__MGraph__Json__Node__Property {
+        +data: Schema__MGraph__Json__Node__Property
+        +name
+    }
+
+    %% Inheritance
+    Schema__MGraph__Json__Node__Value --|> Schema__MGraph__Json__Node
+    Schema__MGraph__Json__Node__Dict --|> Schema__MGraph__Json__Node
+    Schema__MGraph__Json__Node__List --|> Schema__MGraph__Json__Node
+    Schema__MGraph__Json__Node__Property --|> Schema__MGraph__Json__Node
+
+    Model__MGraph__Json__Node__Value --|> Model__MGraph__Json__Node
+    Model__MGraph__Json__Node__Dict --|> Model__MGraph__Json__Node
+    Model__MGraph__Json__Node__List --|> Model__MGraph__Json__Node
+    Model__MGraph__Json__Node__Property --|> Model__MGraph__Json__Node
 ```

 Each layer has clear responsibilities:
-- **Schema Layer**: Pure data structures and type definitions
-- **Model Layer**: Operations on single entities
-- **Domain Layer**: High-level JSON operations and business logic
+- **Schema Layer**: Pure data structures and type definitions through inheritance
+- **Model Layer**: Operations on single entities, type validation, and value access
+- **Domain Layer**: High-level JSON operations and business logic (graph traversal, queries)

 ### Node Types

-The system recognizes three fundamental node types:
+The system uses inheritance to define different types of nodes, each serving a specific purpose in representing JSON structures:

 ```python
-class Schema__Json__Node__Type(Enum):
-    VALUE = "value"    # Primitive values (str, int, bool, None)
-    DICT  = "dict"     # JSON objects {}
-    LIST  = "list"     # JSON arrays []
-
-class Schema__Json__Node__Data(Type_Safe):
-    value    : Any                        # The actual value for VALUE nodes
-    node_type: Schema__Json__Node__Type   # Type of this node
+class Schema__MGraph__Json__Node(Schema__MGraph__Node):
+    """Base schema for all JSON nodes"""
+    pass
+
+class Schema__MGraph__Json__Node__Value__Data:
+    """Value node data"""
+    value: Any        # The actual JSON value
+    value_type: type  # Python type of the value
+
+class Schema__MGraph__Json__Node__Value(Schema__MGraph__Json__Node):
+    """For JSON primitive values (str, int, bool, null)"""
+    node_data: Schema__MGraph__Json__Node__Value__Data
+
+class Schema__MGraph__Json__Node__Dict(Schema__MGraph__Json__Node):
+    """For JSON objects {}"""
+    pass
+
+class Schema__MGraph__Json__Node__List(Schema__MGraph__Json__Node):
+    """For JSON arrays []"""
+    pass
+
+class Schema__MGraph__Json__Node__Property__Data:
+    """Property node data"""
+    name: str  # Property name
+
+class Schema__MGraph__Json__Node__Property(Schema__MGraph__Json__Node):
+    """For object property names"""
+    node_data: Schema__MGraph__Json__Node__Property__Data
 ```

-This minimal structure captures all possible JSON structures while maintaining pure graph principles. Each node is self-contained and requires no special handling or complex containers.
+This inheritance-based structure provides type safety and clear separation of concerns while maintaining the simplicity of the graph representation.

 ## JSON to Graph Mapping

@@ -95,21 +133,21 @@ In our graph-native approach, every JSON element becomes a node, creating a pure
      "name": "John"
    }
    ```
-   Becomes three nodes:
+   Becomes:
    ```
-   [DICT node] --> [VALUE node: "name"] --> [VALUE node: "John"]
+   [Dict node] --> [Property node: name="name"] --> [Value node: value="John", value_type=str]
    ```

 2. **Array**:
    ```json
    ["a", "b"]
    ```
-   Becomes three nodes:
+   Becomes:
    ```
-   [LIST node] --> [VALUE node: "a"]
-               [VALUE node: "b"]
+   [List node] --> [Value node: value="a", value_type=str]
+               [Value node: value="b", value_type=str]
    ```
-   Edge order preserves array ordering.
+   Edge order preserves array ordering through edge metadata.

 3. **Nested Structures**:
    ```json
@@ -123,174 +161,146 @@ In our graph-native approach, every JSON element becomes a node, creating a pure
    ```
    Becomes:
    ```
-   [root] --has_property--> [user] --has_property--> [details] --has_property--> [age: 30]
+   [Dict] --> [Property: "user"] --> [Dict] --> [Property: "details"] --> [Dict] --> [Property: "age"] --> [Value: 30]
    ```

 ### Node and Edge Patterns

-The system uses a minimal set of concepts:
+The system uses a minimal set of node types with clear responsibilities:

 **Node Types:**

-| Type | Purpose | Contains |
-|-------|---------|----------|
-| VALUE | Represents any JSON value | The actual value (string, number, bool, null) |
-| DICT | Represents JSON objects | Nothing (structure through edges) |
-| LIST | Represents JSON arrays | Nothing (structure through edges) |
+| Type | Purpose | Contains |
+|----------|---------|----------|
+| VALUE | Represents any JSON value | value + value_type |
+| DICT | Represents JSON objects | Nothing (pure structure) |
+| LIST | Represents JSON arrays | Nothing (pure structure) |
+| PROPERTY | Represents object keys | Property name |

 **Edge Usage:**
-- Edges maintain structure and order
-- No special edge types needed
-- Array order preserved through edge metadata
-- Object property names stored in VALUE nodes
-
-## Implementation Details
-
-### 1. Core Classes
+- Edges maintain structure and ordering
+- Object property edges connect:
+  1. DICT → PROPERTY → VALUE
+  2. DICT → PROPERTY → DICT
+  3. DICT → PROPERTY → LIST
+- Array edges connect:
+  1. LIST → VALUE
+  2. LIST → DICT
+  3. LIST → LIST
+- Array ordering preserved through edge metadata
+- No special edge types needed - structure implied by node types
+
+## Model Layer Features
+
+The Model layer provides type-safe operations on individual nodes:

 ```python
-class MGraph__Json:
-    """Main provider class for JSON operations"""
+class Model__MGraph__Json__Node__Value:
+    """Model for JSON value nodes"""

-    def load(self, json_data: Union[str, dict]) -> Domain__Json__Graph:
-        """Load JSON data into graph structure"""
+    @property
+    def value(self) -> Any:
+        """Get the actual value"""
+        return self.data.node_data.value

-    def export(self, format: str = 'json') -> Union[dict, str]:
-        """Export graph to specified format"""
-
-class Schema__Json__Node__Data:
-    """Extended node data for JSON values"""
-    json_value: Any
-    json_type: str
-    json_key: Optional[str]
+    @value.setter
+    def value(self, new_value: Any):
+        """Set value and automatically update type"""
+        self.data.node_data.value = new_value
+        self.data.node_data.value_type = type(new_value)
+
+    def is_primitive(self) -> bool:
+        """Check if value is JSON primitive"""
+        return self.value_type in (str, int, float, bool, type(None))

-class Domain__Json__Graph(Domain__MGraph__Graph):
-    """Domain-specific graph operations for JSON"""
+class Model__MGraph__Json__Node__Property:
+    """Model for JSON property nodes"""

-    def query_json_path(self, path: str) -> Any:
-        """Query graph using JSON path syntax"""
-```
-
-### 2. Key Operations
-
-#### Loading JSON
-1. Parse JSON input
-2. Create root node
-3. Recursively process structure
-4. Create nodes for values
-5. Create edges for relationships
-
-```python
-def _process_json(self, data: Any, parent_node: Schema__Json__Node) -> None:
-    if isinstance(data, dict):
-        self._process_object(data, parent_node)
-    elif isinstance(data, list):
-        self._process_array(data, parent_node)
-    else:
-        self._process_primitive(data, parent_node)
+    @property
+    def name(self) -> str:
+        """Get property name"""
+        return self.data.node_data.name
+
+    @name.setter
+    def name(self, new_name: str):
+        """Set property name"""
+        self.data.node_data.name = new_name
 ```

-#### Exporting JSON
-1. Start from root node
-2. Recursively rebuild structure
-3. Handle circular references
-4. Generate output format
-
-### 3. Special Cases
+### Special Cases

 | Case | Handling Strategy |
 |------|------------------|
-| Circular References | Track visited nodes, create reference edges |
-| Large Arrays | Lazy loading for arrays over threshold size |
-| Deep Nesting | Implement depth limit with warning |
-| Schema Validation | Optional JSON Schema validation during load |
-
-## Usage Examples
-
-### Basic Usage
-```python
-# Create provider
-json_provider = MGraph__Json()
-
-# Load JSON
-with open('data.json') as f:
-    graph = json_provider.load(f.read())
-
-# Manipulate
-with graph.edit() as edit:
-    node = edit.query_json_path('$.users[0].name')
-    node.set_value('New Name')
-
-# Export
-result = json_provider.export()
-```
-
-### Advanced Features
-```python
-# Query with JSON Path
-users = graph.query_json_path('$.users[*].name')
-
-# Export to different format
-rdf = json_provider.export(format='rdf')
-xml = json_provider.export(format='xml')
-```
+| Circular References | Handled at Domain layer through cycle detection |
+| Large Arrays | Lazy loading implemented at Domain layer |
+| Deep Nesting | Depth tracking and limits at Domain layer |
+| Schema Validation | Optional JSON Schema validation at Domain layer |
+| Array Ordering | Edge metadata maintains sequence |
+| Number Types | Preserved through value_type |

 ## Implementation Phases

-1. **Phase 1: Core Implementation**
-   - Basic JSON loading
-   - Simple object/array handling
+1. **Phase 1: Core Implementation** [Complete]
+   - Schema layer inheritance structure
+   - Model layer value handling
+   - Basic node types and relations
    - Primitive value support
-   - JSON export

-2. **Phase 2: Advanced Features**
-   - JSON Path queries
+2. **Phase 2: Domain Layer**
+   - JSON structure traversal
    - Circular reference handling
+   - Array order management
    - Performance optimizations
-   - Large dataset support

-3. **Phase 3: Format Support**
-   - RDF export
-   - XML export
-   - Other format support
+3. **Phase 3: Advanced Features**
+   - JSON Path queries
+   - Format conversion
+   - Schema validation
+   - Large dataset support

 ## Testing Strategy

 1. **Unit Tests**
-   - Individual component testing
-   - Edge case validation
-   - Type conversion verification
+   - Schema inheritance validation
+   - Model layer operations
+   - Value type handling
+   - Property name management

 2. **Integration Tests**
-   - End-to-end workflows
-   - Format conversion accuracy
-   - Performance benchmarks
+   - Node type interactions
+   - Value updates and type changes
+   - Property name modifications
+   - Inheritance chain verification

-3. **Validation Tests**
-   - JSON Schema compliance
-   - Circular reference handling
-   - Deep nesting scenarios
+3. **Domain Tests** (Future)
+   - Structure traversal
+   - Circular references
+   - Array ordering
+   - Deep nesting

 ## Success Criteria

-1. **Functionality**
-   - Accurate JSON representation
-   - Lossless round-trip conversion
-   - Efficient graph operations
+1. **Type Safety**
+   - Clear inheritance hierarchy
+   - Strong type checking
+   - Proper value type handling
+   - Consistent node type system

-2. **Performance**
-   - Linear time complexity for basic operations
-   - Efficient memory usage
-   - Scalable with large datasets
+2. **Simplicity**
+   - Minimal node types
+   - Clear responsibilities
+   - Intuitive model operations
+   - Simple property access

-3. **Usability**
-   - Simple, intuitive API
-   - Clear error messages
-   - Comprehensive documentation
+3. **Extensibility**
+   - Easy to add node types
+   - Flexible value handling
+   - Clear extension points
+   - Domain layer preparation

 ## Next Steps

-1. Implement Schema__Json__Node__Data
-2. Create basic JSON loading functionality
-3. Implement graph to JSON export
-4. Add JSON Path query support
-5. Develop format conversion capabilities
\ No newline at end of file
+1. Complete Domain layer implementation
+2. Add array ordering metadata
+3. Implement traversal helpers
+4. Add format conversion
+5. Implement query capabilities
\ No newline at end of file
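
The Dict → Property → Value mapping introduced by this change can be illustrated with a small, self-contained sketch. It is not the MGraph implementation: `Node`, `Graph`, `json_to_graph`, and `graph_to_json` are hypothetical names used only for illustration, and the integer `order` stored on each edge is an assumed stand-in for the "edge metadata" the spec says will carry array ordering.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for the spec's four node kinds (not an MGraph class)."""
    node_id: int
    kind: str                           # "value", "dict", "list" or "property"
    value: Any = None                   # only meaningful for "value" nodes
    value_type: Optional[type] = None   # only meaningful for "value" nodes
    name: Optional[str] = None          # only meaningful for "property" nodes


@dataclass
class Graph:
    """Hypothetical minimal graph: nodes by id plus (from_id, to_id, order) edges."""
    nodes: Dict[int, Node] = field(default_factory=dict)
    edges: List[Tuple[int, int, int]] = field(default_factory=list)
    _ids: count = field(default_factory=count)

    def add_node(self, kind: str, **data: Any) -> Node:
        node = Node(node_id=next(self._ids), kind=kind, **data)
        self.nodes[node.node_id] = node
        return node

    def add_edge(self, source: Node, target: Node, order: int = 0) -> None:
        self.edges.append((source.node_id, target.node_id, order))

    def children(self, node: Node) -> List[Node]:
        # Assumption: ordering lives on the edge, as the spec's "edge metadata".
        outgoing = sorted((e for e in self.edges if e[0] == node.node_id),
                          key=lambda e: e[2])
        return [self.nodes[to_id] for _, to_id, _ in outgoing]


def json_to_graph(graph: Graph, data: Any) -> Node:
    """Recursively map parsed JSON onto Dict / Property / List / Value nodes."""
    if isinstance(data, dict):
        node = graph.add_node("dict")
        for order, (key, item) in enumerate(data.items()):
            prop = graph.add_node("property", name=key)
            graph.add_edge(node, prop, order)                  # DICT -> PROPERTY
            graph.add_edge(prop, json_to_graph(graph, item))   # PROPERTY -> child
        return node
    if isinstance(data, list):
        node = graph.add_node("list")
        for order, item in enumerate(data):
            graph.add_edge(node, json_to_graph(graph, item), order)  # LIST -> child
        return node
    return graph.add_node("value", value=data, value_type=type(data))


def graph_to_json(graph: Graph, node: Node) -> Any:
    """Rebuild plain Python data by walking the edges from `node`."""
    if node.kind == "dict":
        return {prop.name: graph_to_json(graph, graph.children(prop)[0])
                for prop in graph.children(node)}
    if node.kind == "list":
        return [graph_to_json(graph, child) for child in graph.children(node)]
    return node.value


doc = {"user": {"details": {"age": 30}}, "tags": ["a", "b"]}
graph = Graph()
root = json_to_graph(graph, doc)
round_trip = graph_to_json(graph, root)
```

In this sketch the nested example from the spec yields the chain Dict → Property("user") → Dict → Property("details") → Dict → Property("age") → Value(30), and `round_trip` compares equal to `doc`. Cycle detection, depth limits, and lazy loading, which the spec assigns to the Domain layer, are deliberately left out.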