Test Plan & Test Cases: apcore-mcp

Field        Value
Title        apcore-mcp Test Plan & Test Cases
Version      1.0
Date         2026-02-15
Author       aipartnerup QA Team
Status       Draft
PRD Ref      docs/prd-apcore-mcp.md v1.0
Tech Design  docs/tech-design-apcore-mcp.md v1.0
License      Apache 2.0

1. Test Plan Overview

1.1 Purpose and Scope

This document defines the comprehensive test plan and test cases for apcore-mcp, the automatic MCP Server and OpenAI Tools Bridge for the apcore ecosystem. As a greenfield project developed under TDD strict mode, this test plan establishes the testing standard before any implementation code is written. Every test case defined here will be implemented as executable pytest code prior to the corresponding production code.

The scope covers all 20 PRD features (F-001 through F-020) across 9 architectural components: Schema Converter, Annotation Mapper, Execution Router, Error Mapper, MCP Server Factory, OpenAI Converter, Transport Manager, CLI Module, and Dynamic Registry Listener. Testing spans five levels: unit, integration, end-to-end, performance, and security.

1.2 Test Objectives

  1. Validate 100% schema mapping accuracy between apcore ModuleDescriptor fields and MCP/OpenAI tool definitions.
  2. Verify all 8 apcore error types plus unexpected exceptions are correctly mapped to MCP error responses.
  3. Confirm all 3 transport types (stdio, Streamable HTTP, SSE) function correctly.
  4. Ensure annotation preservation rate of 100% (all 5 apcore annotation fields mapped).
  5. Validate performance targets: <100ms for 100-module registration, <5ms tool call overhead, <10MB memory for 100 modules.
  6. Confirm security guarantees: no sensitive data leakage, ACL enforcement, error sanitization.

1.3 Quality Goals

Metric                  Target
Line coverage           >= 90%
P0 feature test pass    100%
P1 feature test pass    >= 95%
P2 feature test pass    >= 90%
Unit test pass rate     100%
Integration test pass   100%
Performance benchmarks  All pass

2. Test Strategy

2.1 Test Levels and Pyramid Distribution

Level        Target %  Approx Count  Description
Unit         60%       ~95           Individual component behavior in isolation
Integration  25%       ~20           Multi-component workflows, end-to-end data flows
E2E          10%       ~8            Full server lifecycle with real MCP client
Performance  3%        ~7            Benchmarks, stress tests, memory profiling
Security     2%        ~6            ACL enforcement, error sanitization, input fuzzing

2.2 Test Frameworks and Tools

Tool               Purpose
pytest >= 7.0      Test runner, fixtures, parametrize
pytest-asyncio     Async test support (asyncio_mode = "auto")
pytest-cov >= 4.0  Line coverage measurement and enforcement
pytest-benchmark   Performance benchmarking (tool call overhead)
unittest.mock      MagicMock, AsyncMock for apcore/MCP SDK mocks
tracemalloc        Memory profiling for performance tests

2.3 Mock Strategy

Component Under Test  What to Mock                      What to Use Real
SchemaConverter       Nothing                           Real: pure dict transform
AnnotationMapper      Nothing                           Real: pure function
ErrorMapper           Nothing                           Real: pure function
ModuleIDNormalizer    Nothing                           Real: pure function
ExecutionRouter       Mock: Executor.call_async()       Real: ErrorMapper
MCPServerFactory      Mock: mcp.server.lowlevel.Server  Real: SchemaConverter, AnnotationMapper
OpenAIConverter       Mock: Registry                    Real: SchemaConverter, AnnotationMapper, IDNormalizer
TransportManager      Mock: Server, stdio_server        Real: validation logic
CLI Module            Mock: Registry, serve()           Real: argparse
RegistryListener      Mock: Registry.on(), Factory      Real: internal tools dict
Integration tests     Mock: module execute() only       Real: all apcore-mcp components
E2E tests             Nothing                           Real: full stack

2.4 Fixtures Strategy

Shared fixtures defined in tests/conftest.py:

  • sample_annotations: ModuleAnnotations with non-default values
  • sample_descriptor: ModuleDescriptor for "image.resize" with full schema
  • descriptor_with_refs: ModuleDescriptor with $defs/$ref in input_schema
  • descriptor_empty_schema: ModuleDescriptor with empty input_schema
  • descriptor_no_annotations: ModuleDescriptor with annotations=None
  • mock_registry: MagicMock(spec=Registry) returning sample descriptors
  • mock_executor: MagicMock(spec=Executor) with AsyncMock call_async
  • multi_module_registry: Registry mock with 5 diverse modules
  • large_registry: Registry mock with 100 modules for performance tests
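
The mock-based fixtures above could be built along the following lines. This is a minimal sketch under stated assumptions: FakeDescriptor is a hypothetical stand-in for apcore's ModuleDescriptor, and the registry method names list() and get_definition() are illustrative guesses rather than the confirmed Registry interface. In tests/conftest.py each builder would be wrapped in a @pytest.fixture.

```python
from dataclasses import dataclass, field
from typing import Any, Optional
from unittest.mock import AsyncMock, MagicMock


@dataclass
class FakeDescriptor:
    """Hypothetical stand-in for apcore's ModuleDescriptor."""
    module_id: str
    description: str
    input_schema: dict = field(default_factory=dict)
    output_schema: dict = field(default_factory=dict)
    annotations: Optional[Any] = None


def build_registry(count: int) -> MagicMock:
    """Build a registry mock holding `count` generated descriptors."""
    descriptors = {
        f"module.n{i}": FakeDescriptor(
            module_id=f"module.n{i}",
            description=f"Generated module {i}",
            input_schema={"type": "object", "properties": {"value": {"type": "string"}}},
        )
        for i in range(count)
    }
    registry = MagicMock()
    # Assumed method names; adjust to the real Registry interface.
    registry.list.return_value = list(descriptors)
    registry.get_definition.side_effect = descriptors.get
    return registry


def build_executor(result: Any) -> MagicMock:
    """Build an executor mock whose call_async resolves to `result`."""
    executor = MagicMock()
    executor.call_async = AsyncMock(return_value=result)
    return executor


large_registry = build_registry(100)
```

Passing spec=Registry to MagicMock, as the fixture list describes, would additionally reject attribute names the real class does not define.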

3. Test Environment

3.1 Python Version Requirements

Version  CI Matrix  Notes
3.10     Yes        Minimum supported
3.11     Yes        Secondary
3.12     Yes        apcore-python dev version
3.13     Yes        Latest stable

3.2 Dependency Versions

Package           Version Constraint
apcore            >= 0.2.0, < 1.0
mcp               >= 1.0.0, < 2.0
pytest            >= 7.0
pytest-asyncio    >= 0.21
pytest-cov        >= 4.0
pytest-benchmark  >= 4.0

3.3 Test Configuration (pyproject.toml)

[tool.pytest.ini_options]
testpaths = ["tests"]
asyncio_mode = "auto"
markers = [
    "unit: Unit tests",
    "integration: Integration tests",
    "e2e: End-to-end tests",
    "performance: Performance benchmark tests",
    "security: Security tests",
    "slow: Tests that take > 5 seconds",
]

[tool.coverage.run]
source = ["src/apcore_mcp"]
omit = ["src/apcore_mcp/__main__.py"]

[tool.coverage.report]
fail_under = 90
show_missing = true

3.4 CI/CD Integration Plan

Stage          Trigger        Tests Run
Pre-commit     git commit     Unit tests (fast)
PR validation  Pull request   Unit + Integration + Security
Nightly        Cron (daily)   All tests including E2E + Perf
Release gate   Tag push (v*)  Full suite, coverage enforcement

4. Feature-to-Test Traceability Matrix

PRD Feature  Description                     Priority  Test Case IDs
F-001        Registry-to-MCP Schema Mapping  P0        TC-SCHEMA-001 to TC-SCHEMA-012, TC-INT-001
F-002        Annotation-to-MCP Mapping       P0        TC-ANNOT-001 to TC-ANNOT-010, TC-INT-001
F-003        MCP Execution Routing           P0        TC-EXEC-001 to TC-EXEC-012, TC-INT-001, TC-INT-003
F-004        MCP Error Mapping               P0        TC-ERROR-001 to TC-ERROR-011, TC-INT-003
F-005        serve() Function                P0        TC-SERVER-001 to TC-SERVER-010, TC-INT-001, TC-E2E-001
F-006        stdio Transport                 P0        TC-TRANSPORT-001 to TC-TRANSPORT-003, TC-INT-002, TC-E2E-001
F-007        Streamable HTTP Transport       P0        TC-TRANSPORT-004 to TC-TRANSPORT-006, TC-INT-002, TC-E2E-002
F-008        to_openai_tools() Function      P0        TC-OPENAI-001 to TC-OPENAI-012, TC-INT-004
F-009        CLI Entry Point                 P0        TC-CLI-001 to TC-CLI-010, TC-E2E-001
F-010        SSE Transport                   P1        TC-TRANSPORT-007 to TC-TRANSPORT-009, TC-INT-002
F-011        OpenAI Annotation Embedding     P1        TC-OPENAI-005, TC-OPENAI-006
F-012        OpenAI Strict Mode              P1        TC-OPENAI-007, TC-OPENAI-008, TC-OPENAI-009
F-013        Structured Output Responses     P1        TC-EXEC-001, TC-EXEC-008
F-014        Executor Passthrough            P1        TC-SERVER-003, TC-INT-005
F-015        Dynamic Tool Registration       P1        TC-DYNAMIC-001 to TC-DYNAMIC-007, TC-INT-006, TC-E2E-003
F-016        Logging and Observability       P1        TC-SERVER-009, TC-EXEC-006
F-017        to_openai_tools() Filtering     P2        TC-OPENAI-010, TC-OPENAI-011
F-018        serve() Module Filtering        P2        TC-SERVER-007, TC-SERVER-008
F-019        Health Check Endpoint           P2        TC-E2E-004
F-020        MCP Resource Exposure           P2        TC-E2E-005

5. Test Cases by Component

5.1 Schema Converter (TC-SCHEMA-xxx)

Test File: tests/unit/adapters/test_schema.py


TC-SCHEMA-001: Convert simple schema without $ref

  • Priority: P0
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "title": "ImageResizeInput",
        "properties": {
            "width": {"type": "integer", "description": "Target width in pixels"},
            "height": {"type": "integer", "description": "Target height in pixels"},
            "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]}
        },
        "required": ["width", "height"]
    }
    descriptor = ModuleDescriptor(
        module_id="image.resize",
        description="Resize an image",
        input_schema=input_schema,
        output_schema={}
    )
    
  • Steps:
  • Create SchemaConverter instance.
  • Call converter.convert_input_schema(descriptor).
  • Assert the returned dict equals the input_schema exactly (no transformation needed).
  • Expected Result: Returned dict is identical to the original input_schema dict. All properties, types, required fields, and enums are preserved.
  • Traceability: F-001

TC-SCHEMA-002: Convert schema with single-level $ref inlining

  • Priority: P0
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "workflow_name": {"type": "string"},
            "parameters": {"$ref": "#/$defs/WorkflowParams"}
        },
        "required": ["workflow_name", "parameters"],
        "$defs": {
            "WorkflowParams": {
                "type": "object",
                "properties": {
                    "seed": {"type": "integer", "default": 42},
                    "steps": {"type": "integer", "default": 20}
                }
            }
        }
    }
    
  • Steps:
  • Create SchemaConverter instance.
  • Call converter.convert_input_schema(descriptor).
  • Assert $defs key is not present in result.
  • Assert parameters property contains the inlined object definition.
  • Expected Result:
    {
        "type": "object",
        "properties": {
            "workflow_name": {"type": "string"},
            "parameters": {
                "type": "object",
                "properties": {
                    "seed": {"type": "integer", "default": 42},
                    "steps": {"type": "integer", "default": 20}
                }
            }
        },
        "required": ["workflow_name", "parameters"]
    }
    
  • Traceability: F-001
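
The inlining behaviour expected in TC-SCHEMA-002 can be illustrated with a small recursive resolver. This is a sketch only: inline_refs is a hypothetical helper, not the SchemaConverter implementation. It assumes every reference has the local form "#/$defs/Name", and it deliberately omits the circular-reference and depth checks that other test cases cover.

```python
import copy
from typing import Any

REF_PREFIX = "#/$defs/"


def inline_refs(schema: dict) -> dict:
    """Return a copy of `schema` with local $defs references inlined."""
    defs = schema.get("$defs", {})

    def resolve(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(REF_PREFIX):
                # Replace the whole reference node with a resolved copy of
                # its target; sibling keys next to $ref are dropped here.
                target = defs[ref[len(REF_PREFIX):]]
                return resolve(copy.deepcopy(target))
            return {key: resolve(value) for key, value in node.items()}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    # Only the root-level $defs block is stripped from the output.
    root = {key: value for key, value in schema.items() if key != "$defs"}
    return resolve(root)
```

Because the resolver walks dicts and lists alike, the same function also handles references nested under items or oneOf.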

TC-SCHEMA-003: Convert schema with nested $ref (A references B)

  • Priority: P0
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "config": {"$ref": "#/$defs/Config"}
        },
        "required": ["config"],
        "$defs": {
            "Config": {
                "type": "object",
                "properties": {
                    "output": {"$ref": "#/$defs/OutputSettings"}
                }
            },
            "OutputSettings": {
                "type": "object",
                "properties": {
                    "format": {"type": "string"},
                    "quality": {"type": "integer"}
                }
            }
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert $defs is removed.
  • Assert nested OutputSettings is inlined inside Config.properties.output.
  • Expected Result: Both $ref nodes are resolved. config.properties.output contains {"type": "object", "properties": {"format": {"type": "string"}, "quality": {"type": "integer"}}}. No $defs key in result.
  • Traceability: F-001

TC-SCHEMA-004: Detect circular $ref and raise ValueError

  • Priority: P0
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "node": {"$ref": "#/$defs/TreeNode"}
        },
        "$defs": {
            "TreeNode": {
                "type": "object",
                "properties": {
                    "value": {"type": "string"},
                    "child": {"$ref": "#/$defs/TreeNode"}
                }
            }
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert ValueError is raised.
  • Expected Result: ValueError is raised with a message indicating circular $ref detected.
  • Traceability: F-001
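
One way to detect the cycle in TC-SCHEMA-004 before any inlining takes place is a depth-first walk over the reference graph formed by $defs. The helper names below are hypothetical, and the sketch assumes local "#/$defs/Name" references only. Note that it inspects every $defs entry, so a cycle in an unreferenced definition would also raise.

```python
from typing import Any, Iterator

REF_PREFIX = "#/$defs/"


def iter_refs(node: Any) -> Iterator[str]:
    """Yield the names of all $defs targets referenced under `node`."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith(REF_PREFIX):
            yield ref[len(REF_PREFIX):]
        for value in node.values():
            yield from iter_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_refs(item)


def check_no_circular_refs(schema: dict) -> None:
    """Raise ValueError if any $defs entry can reach itself through $ref."""
    defs = schema.get("$defs", {})
    done = set()  # definitions already proven cycle-free

    def visit(name: str, path: tuple) -> None:
        if name in path:
            chain = " -> ".join(path + (name,))
            raise ValueError(f"Circular $ref detected: {chain}")
        if name in done or name not in defs:
            return
        for target in iter_refs(defs[name]):
            visit(target, path + (name,))
        done.add(name)

    for name in defs:
        visit(name, ())
```

For the TreeNode schema above, the raised message names the chain TreeNode -> TreeNode, which gives the test a concrete substring to match.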

TC-SCHEMA-005: Convert empty input_schema to valid object schema

  • Priority: P0
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="system.ping",
        description="Health check",
        input_schema={},
        output_schema={}
    )
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert result has "type": "object" and "properties": {}.
  • Expected Result: {"type": "object", "properties": {}}.
  • Traceability: F-001 (AC5)

TC-SCHEMA-006: Strip $defs when no $ref references exist

  • Priority: P1
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string"}
        },
        "$defs": {
            "UnusedModel": {
                "type": "object",
                "properties": {"x": {"type": "integer"}}
            }
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert $defs key is not present in result.
  • Assert properties.name is preserved.
  • Expected Result: {"type": "object", "properties": {"name": {"type": "string"}}}.
  • Traceability: F-001

TC-SCHEMA-007: Convert schema with array items containing $ref

  • Priority: P1
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "tags": {
                "type": "array",
                "items": {"$ref": "#/$defs/Tag"}
            }
        },
        "$defs": {
            "Tag": {
                "type": "object",
                "properties": {
                    "key": {"type": "string"},
                    "value": {"type": "string"}
                }
            }
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert tags.items contains the inlined Tag object.
  • Expected Result: tags.items equals {"type": "object", "properties": {"key": {"type": "string"}, "value": {"type": "string"}}}. No $defs.
  • Traceability: F-001

TC-SCHEMA-008: Convert schema with oneOf containing $ref

  • Priority: P1
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "source": {
                "oneOf": [
                    {"$ref": "#/$defs/FileSource"},
                    {"$ref": "#/$defs/URLSource"}
                ]
            }
        },
        "$defs": {
            "FileSource": {"type": "object", "properties": {"path": {"type": "string"}}},
            "URLSource": {"type": "object", "properties": {"url": {"type": "string", "format": "uri"}}}
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert source.oneOf contains two inlined objects.
  • Expected Result: source.oneOf[0] equals {"type": "object", "properties": {"path": {"type": "string"}}}. source.oneOf[1] equals {"type": "object", "properties": {"url": {"type": "string", "format": "uri"}}}. No $defs.
  • Traceability: F-001

TC-SCHEMA-009: Ensure root type is object when missing

  • Priority: P1
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "properties": {
            "name": {"type": "string"}
        },
        "required": ["name"]
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert result has "type": "object".
  • Expected Result: Result includes "type": "object" alongside existing properties and required.
  • Traceability: F-001
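
The root normalization exercised by TC-SCHEMA-005 and TC-SCHEMA-009 amounts to filling in two defaults. The sketch below uses a hypothetical helper name, and it assumes that a missing "properties" key should also be defaulted whenever "type" is missing, which TC-SCHEMA-005 implies for the empty schema.

```python
def ensure_object_root(schema: dict) -> dict:
    """Return a shallow copy of `schema` with an object-typed root."""
    result = dict(schema)
    # Existing keys win; only absent keys receive the defaults.
    result.setdefault("type", "object")
    result.setdefault("properties", {})
    return result
```

A schema that already declares both keys passes through unchanged, which keeps this step compatible with the identity expectation in TC-SCHEMA-001.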

TC-SCHEMA-010: Handle maximum nesting depth (32 levels)

  • Priority: P2
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data: Programmatically generated schema with 33 levels of nested $ref:
    defs = {}
    for i in range(33):
        name = f"Level{i}"
        next_name = f"Level{i+1}"
        if i < 32:
            defs[name] = {"type": "object", "properties": {"child": {"$ref": f"#/$defs/{next_name}"}}}
        else:
            defs[name] = {"type": "object", "properties": {"value": {"type": "string"}}}
    input_schema = {
        "type": "object",
        "properties": {"root": {"$ref": "#/$defs/Level0"}},
        "$defs": defs
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert ValueError is raised due to depth exceeding 32.
  • Expected Result: ValueError raised indicating maximum recursion depth exceeded.
  • Traceability: F-001

TC-SCHEMA-011: Preserve Unicode in property descriptions

  • Priority: P2
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data:
    input_schema = {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "username"},
            "greeting": {"type": "string", "description": "Bonjour, comment allez-vous?"}
        }
    }
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert Unicode descriptions are preserved exactly.
  • Expected Result: properties.name.description equals "username". properties.greeting.description equals "Bonjour, comment allez-vous?".
  • Traceability: F-001

TC-SCHEMA-012: Handle very large schema with 50+ properties

  • Priority: P1
  • Type: Unit
  • Preconditions: SchemaConverter instance created
  • Test Data: Programmatically generated schema:
    properties = {f"field_{i}": {"type": "string", "description": f"Field number {i}"} for i in range(50)}
    input_schema = {"type": "object", "properties": properties, "required": list(properties.keys())}
    
  • Steps:
  • Call converter.convert_input_schema(descriptor).
  • Assert result has exactly 50 properties.
  • Assert all 50 fields are in required.
  • Expected Result: All 50 properties preserved with correct types and descriptions. All 50 field names present in required array.
  • Traceability: F-001

5.2 Annotation Mapper (TC-ANNOT-xxx)

Test File: tests/unit/adapters/test_annotations.py


TC-ANNOT-001: Map readonly=True to readOnlyHint=True

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=True, destructive=False, idempotent=False, requires_approval=False, open_world=True)
    
  • Steps:
  • Call mapper.to_mcp_annotations(annotations).
  • Assert result.read_only_hint is True.
  • Expected Result: ToolAnnotations(read_only_hint=True, destructive_hint=False, idempotent_hint=False, open_world_hint=True).
  • Traceability: F-002 (AC1)

TC-ANNOT-002: Map destructive=True to destructiveHint=True

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=False, destructive=True, idempotent=False, requires_approval=False, open_world=True)
    
  • Steps:
  • Call mapper.to_mcp_annotations(annotations).
  • Assert result.destructive_hint is True.
  • Expected Result: ToolAnnotations(read_only_hint=False, destructive_hint=True, idempotent_hint=False, open_world_hint=True).
  • Traceability: F-002 (AC2)

TC-ANNOT-003: Map idempotent=True to idempotentHint=True

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=False, destructive=False, idempotent=True, requires_approval=False, open_world=True)
    
  • Steps:
  • Call mapper.to_mcp_annotations(annotations).
  • Assert result.idempotent_hint is True.
  • Expected Result: idempotent_hint=True.
  • Traceability: F-002 (AC3)

TC-ANNOT-004: Map open_world=False to openWorldHint=False

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=False, destructive=False, idempotent=False, requires_approval=False, open_world=False)
    
  • Steps:
  • Call mapper.to_mcp_annotations(annotations).
  • Assert result.open_world_hint is False.
  • Expected Result: open_world_hint=False.
  • Traceability: F-002 (AC4)

TC-ANNOT-005: Map None annotations to MCP defaults

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data: annotations = None
  • Steps:
  • Call mapper.to_mcp_annotations(None).
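  • Assert result carries the apcore annotation defaults. destructive_hint is set to False explicitly, because the MCP specification treats an absent destructiveHint as true.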
  • Expected Result: ToolAnnotations(read_only_hint=False, destructive_hint=False, idempotent_hint=False, open_world_hint=True).
  • Traceability: F-002 (AC5)

TC-ANNOT-006: Map all annotations set simultaneously

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=True, destructive=True, idempotent=True, requires_approval=True, open_world=False)
    
  • Steps:
  • Call mapper.to_mcp_annotations(annotations).
  • Assert all four MCP hint fields match.
  • Expected Result: ToolAnnotations(read_only_hint=True, destructive_hint=True, idempotent_hint=True, open_world_hint=False).
  • Traceability: F-002

TC-ANNOT-007: requires_approval flag is preserved

  • Priority: P0
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=False, destructive=False, idempotent=False, requires_approval=True, open_world=True)
    
  • Steps:
  • Call mapper.has_requires_approval(annotations).
  • Assert returns True.
  • Expected Result: True.
  • Traceability: F-002 (AC6)

TC-ANNOT-008: requires_approval returns False for None annotations

  • Priority: P1
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data: annotations = None
  • Steps:
  • Call mapper.has_requires_approval(None).
  • Assert returns False.
  • Expected Result: False.
  • Traceability: F-002 (AC6)
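
The mapping checked by TC-ANNOT-001 through TC-ANNOT-008 is a field-by-field copy plus a None fallback. The sketch below uses two hypothetical dataclasses in place of apcore's ModuleAnnotations and the MCP ToolAnnotations type; the snake_case hint names follow this plan's notation, and the default values are inferred from the expected results above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeModuleAnnotations:
    """Hypothetical stand-in for apcore's ModuleAnnotations."""
    readonly: bool = False
    destructive: bool = False
    idempotent: bool = False
    requires_approval: bool = False
    open_world: bool = True


@dataclass(frozen=True)
class FakeToolAnnotations:
    """Hypothetical stand-in carrying the four hint fields."""
    read_only_hint: bool = False
    destructive_hint: bool = False
    idempotent_hint: bool = False
    open_world_hint: bool = True


def to_mcp_annotations(
    annotations: Optional[FakeModuleAnnotations],
) -> FakeToolAnnotations:
    """Copy each apcore flag to its hint; None yields the defaults."""
    if annotations is None:
        return FakeToolAnnotations()
    return FakeToolAnnotations(
        read_only_hint=annotations.readonly,
        destructive_hint=annotations.destructive,
        idempotent_hint=annotations.idempotent,
        open_world_hint=annotations.open_world,
    )


def has_requires_approval(annotations: Optional[FakeModuleAnnotations]) -> bool:
    """requires_approval has no hint counterpart, so it is read separately."""
    return annotations is not None and annotations.requires_approval
```

Since none of the four hints carries requires_approval, the separate accessor is what lets that flag survive the conversion.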

TC-ANNOT-009: Description suffix includes only non-default values

  • Priority: P1
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data:
    annotations = ModuleAnnotations(readonly=False, destructive=True, idempotent=True, requires_approval=False, open_world=True)
    
  • Steps:
  • Call mapper.to_description_suffix(annotations).
  • Assert result contains destructive=true and idempotent=true.
  • Assert result does NOT contain readonly=false or open_world=true (defaults).
  • Expected Result: "\n\n[Annotations: destructive=true, idempotent=true]".
  • Traceability: F-011

TC-ANNOT-010: Description suffix is empty for None annotations

  • Priority: P1
  • Type: Unit
  • Preconditions: AnnotationMapper instance created
  • Test Data: annotations = None
  • Steps:
  • Call mapper.to_description_suffix(None).
  • Assert returns empty string.
  • Expected Result: "".
  • Traceability: F-011
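
The suffix format expected by TC-ANNOT-009 and TC-ANNOT-010 can be produced by comparing each flag with its default. This sketch takes a plain mapping instead of a ModuleAnnotations object to stay self-contained. The defaults table and the field order are assumptions inferred from the expected results, as is the choice to return an empty string when every flag is at its default.

```python
from typing import Mapping, Optional

# Assumed defaults, inferred from the expected results in this section.
DEFAULTS = {
    "readonly": False,
    "destructive": False,
    "idempotent": False,
    "requires_approval": False,
    "open_world": True,
}


def to_description_suffix(annotations: Optional[Mapping[str, bool]]) -> str:
    """Render non-default flags as a bracketed description suffix."""
    if annotations is None:
        return ""
    parts = [
        f"{name}={str(annotations[name]).lower()}"
        for name, default in DEFAULTS.items()
        if annotations.get(name, default) != default
    ]
    if not parts:
        return ""
    return "\n\n[Annotations: " + ", ".join(parts) + "]"
```

Iterating over the defaults table fixes the output order, so the expected string can be compared exactly rather than by substring.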

5.3 Execution Router (TC-EXEC-xxx)

Test File: tests/unit/server/test_router.py


TC-EXEC-001: Successful tool call returns JSON output

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async returns {"status": "ok", "path": "/out/resized.png"}
  • Test Data:
    tool_name = "image.resize"
    arguments = {"width": 800, "height": 600}
    
  • Steps:
  • Call await router.handle_call("image.resize", {"width": 800, "height": 600}).
  • Assert result.isError is False.
  • Parse result.content[0].text as JSON.
  • Assert parsed JSON equals {"status": "ok", "path": "/out/resized.png"}.
  • Expected Result: CallToolResult with isError=False, content contains TextContent with text '{"status": "ok", "path": "/out/resized.png"}'.
  • Traceability: F-003 (AC1), F-013

TC-EXEC-002: Non-existent module returns error with module_id

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises ModuleNotFoundError("unknown.module")
  • Test Data:
    tool_name = "unknown.module"
    arguments = {}
    
  • Steps:
  • Call await router.handle_call("unknown.module", {}).
  • Assert result.isError is True.
  • Assert result.content[0].text contains "Module not found: unknown.module".
  • Expected Result: CallToolResult(isError=True) with text "Module not found: unknown.module".
  • Traceability: F-003 (AC2), F-004

TC-EXEC-003: Schema validation failure returns field-level errors

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises SchemaValidationError("Validation failed", errors=[{"field": "width", "code": "int_type", "message": "Input should be a valid integer"}])
  • Test Data:
    tool_name = "image.resize"
    arguments = {"width": "not_a_number", "height": 600}
    
  • Steps:
  • Call await router.handle_call("image.resize", {"width": "not_a_number", "height": 600}).
  • Assert result.isError is True.
  • Assert result.content[0].text contains "Input validation failed".
  • Assert text contains "width" field name.
  • Expected Result: CallToolResult(isError=True) with text "Input validation failed:\n- width: Input should be a valid integer (int_type)".
  • Traceability: F-003 (AC3), F-004

TC-EXEC-004: ACL denied returns access denied without caller_id

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises ACLDeniedError("mcp_client", "image.resize")
  • Test Data:
    tool_name = "image.resize"
    arguments = {"width": 800, "height": 600}
    
  • Steps:
  • Call await router.handle_call("image.resize", {"width": 800, "height": 600}).
  • Assert result.isError is True.
  • Assert result.content[0].text equals "Access denied".
  • Assert text does NOT contain "mcp_client".
  • Expected Result: CallToolResult(isError=True) with text exactly "Access denied". No caller identity leaked.
  • Traceability: F-003 (AC4), F-004

TC-EXEC-005: Timeout returns error with duration

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises ModuleTimeoutError("image.resize", 5000)
  • Test Data:
    tool_name = "image.resize"
    arguments = {"width": 800, "height": 600}
    
  • Steps:
  • Call await router.handle_call("image.resize", {"width": 800, "height": 600}).
  • Assert result.isError is True.
  • Assert text contains "timed out" and "5000ms".
  • Expected Result: CallToolResult(isError=True) with text "Module timed out after 5000ms".
  • Traceability: F-003 (AC5), F-004

TC-EXEC-006: Unexpected exception returns sanitized error and logs traceback

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises RuntimeError("disk full")
  • Test Data:
    tool_name = "image.resize"
    arguments = {"width": 800, "height": 600}
    
  • Steps:
  • Call await router.handle_call(...) with caplog fixture.
  • Assert result.isError is True.
  • Assert result.content[0].text equals "Internal error occurred".
  • Assert text does NOT contain "disk full".
  • Assert "disk full" appears in ERROR-level log output.
  • Expected Result: Client receives "Internal error occurred". Server logs full traceback at ERROR level with "disk full" message.
  • Traceability: F-003, F-004, F-016
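
The split between what the client sees and what the server logs in TC-EXEC-006 can be sketched as follows. handle_call here is a simplified stand-in that returns a plain dict instead of a CallToolResult, the logger name is hypothetical, and ListHandler plays the role that pytest's caplog fixture would play in the real test.

```python
import asyncio
import logging
from unittest.mock import AsyncMock

logger = logging.getLogger("apcore_mcp.router")  # hypothetical logger name


async def handle_call(executor, tool_name: str, arguments: dict) -> dict:
    """Simplified stand-in for ExecutionRouter.handle_call."""
    try:
        output = await executor.call_async(tool_name, arguments)
    except Exception:
        # The full traceback goes to the server log only.
        logger.exception("Unhandled error in tool %s", tool_name)
        return {"isError": True, "text": "Internal error occurred"}
    return {"isError": False, "text": str(output)}


class ListHandler(logging.Handler):
    """Collects formatted ERROR records for later inspection."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        # The default formatter appends the traceback text to the message.
        self.messages.append(self.format(record))


handler = ListHandler()
logger.addHandler(handler)

executor = AsyncMock()
executor.call_async.side_effect = RuntimeError("disk full")
result = asyncio.run(
    handle_call(executor, "image.resize", {"width": 800, "height": 600})
)
```

logger.exception records at ERROR level and attaches the active traceback, which is why the original message is found in the log but not in the returned text.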

TC-EXEC-007: Empty arguments passed to executor

  • Priority: P1
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async returns {"status": "pong"}
  • Test Data:
    tool_name = "system.ping"
    arguments = {}
    
  • Steps:
  • Call await router.handle_call("system.ping", {}).
  • Assert mock Executor's call_async was called with ("system.ping", {}).
  • Assert result.isError is False.
  • Expected Result: Empty dict passed through to Executor. Success result returned.
  • Traceability: F-003

TC-EXEC-008: Non-serializable output uses default=str fallback

  • Priority: P1
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async returns {"timestamp": datetime(2026, 1, 15, 10, 30, 0), "path": PosixPath("/tmp/out.png")}
  • Test Data:
    from datetime import datetime
    from pathlib import PosixPath
    tool_name = "image.resize"
    arguments = {"width": 800, "height": 600}
    
  • Steps:
  • Call await router.handle_call(...).
  • Assert result.isError is False.
  • Parse result.content[0].text as JSON.
  • Assert timestamp field is the string produced by str() on the datetime, "2026-01-15 10:30:00".
  • Assert path field is "/tmp/out.png".
  • Expected Result: json.dumps(output, default=str) converts datetime and Path to strings. Output is valid JSON.
  • Traceability: F-013
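
The fallback named in this test case can be observed with the standard library alone. The sketch uses PurePosixPath rather than PosixPath so that it runs on any platform; that substitution is a choice made for this illustration and is not part of the test data.

```python
import json
from datetime import datetime
from pathlib import PurePosixPath

output = {
    "timestamp": datetime(2026, 1, 15, 10, 30, 0),
    "path": PurePosixPath("/tmp/out.png"),
}

# default=str is called for every object json cannot serialize natively.
text = json.dumps(output, default=str)
parsed = json.loads(text)
```

str() on a datetime separates date and time with a space, not the "T" of isoformat(), so assertions on the timestamp value should expect "2026-01-15 10:30:00".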

TC-EXEC-009: Concurrent tool calls handled correctly

  • Priority: P1
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async returns different values based on module_id using side_effect
  • Test Data:
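    import asyncio
    from unittest.mock import AsyncMock
    async def mock_call(module_id, inputs, context=None):
        # Short sleep so the concurrent calls actually overlap
        await asyncio.sleep(0.01)
        return {"module": module_id, "result": "ok"}
    executor.call_async = AsyncMock(side_effect=mock_call)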
    
  • Steps:
  • Launch 10 concurrent router.handle_call() calls via asyncio.gather().
  • Assert all 10 results have isError=False.
  • Assert each result contains the correct module_id.
  • Expected Result: All 10 calls succeed independently. No cross-contamination between concurrent call results.
  • Traceability: F-003
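
The concurrency check can be sketched end to end as shown below. handle_call is a simplified stand-in for ExecutionRouter.handle_call that returns a plain dict, and the module IDs are generated for the illustration.

```python
import asyncio
import json
from unittest.mock import AsyncMock


async def mock_call(module_id, inputs, context=None):
    # Short sleep so the ten calls are in flight at the same time.
    await asyncio.sleep(0.01)
    return {"module": module_id, "result": "ok"}


async def handle_call(executor, tool_name: str, arguments: dict) -> dict:
    """Simplified stand-in for ExecutionRouter.handle_call."""
    output = await executor.call_async(tool_name, arguments)
    return {"isError": False, "text": json.dumps(output)}


async def run_concurrent(count: int):
    executor = AsyncMock()
    executor.call_async = AsyncMock(side_effect=mock_call)
    names = [f"module.n{i}" for i in range(count)]
    # gather returns results in the order the awaitables were passed in.
    results = await asyncio.gather(
        *(handle_call(executor, name, {}) for name in names)
    )
    return names, results, executor


names, results, executor = asyncio.run(run_concurrent(10))
```

Because asyncio.gather preserves argument order, each result can be matched against the module ID at the same index to rule out cross-contamination.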

TC-EXEC-010: Module returning None handled gracefully

  • Priority: P1
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async returns None
  • Test Data:
    tool_name = "system.shutdown"
    arguments = {}
    
  • Steps:
  • Call await router.handle_call("system.shutdown", {}).
  • Assert result.isError is False.
  • Assert content text is "null" or "{}".
  • Expected Result: CallToolResult with isError=False. Output serialized gracefully (null JSON or empty object).
  • Traceability: F-003, F-013

TC-EXEC-011: Executor passthrough -- call_async receives exact arguments

  • Priority: P1
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor
  • Test Data:
    tool_name = "text.translate"
    arguments = {"text": "Hello world", "target_lang": "es", "options": {"formal": True}}
    
  • Steps:
  • Call await router.handle_call("text.translate", arguments).
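  • Call executor.call_async.assert_called_once_with("text.translate", {"text": "Hello world", "target_lang": "es", "options": {"formal": True}}). The method raises AssertionError on a mismatch and returns None, so it must not be wrapped in an assert statement.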
  • Expected Result: Executor receives the exact tool_name and arguments dict without modification.
  • Traceability: F-003, F-014

TC-EXEC-012: InvalidInputError returns descriptive message

  • Priority: P0
  • Type: Unit
  • Preconditions: ExecutionRouter with mock Executor; call_async raises InvalidInputError("Missing required field: width")
  • Test Data:
    tool_name = "image.resize"
    arguments = {"height": 600}
    
  • Steps:
  • Call await router.handle_call("image.resize", {"height": 600}).
  • Assert result.isError is True.
  • Assert text contains "Invalid input: Missing required field: width".
  • Expected Result: CallToolResult(isError=True) with text "Invalid input: Missing required field: width".
  • Traceability: F-004

5.4 Error Mapper (TC-ERROR-xxx)

Test File: tests/unit/adapters/test_errors.py


TC-ERROR-001: ModuleNotFoundError mapping

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = ModuleNotFoundError("image.resize")
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert result.isError is True.
  • Assert text equals "Module not found: image.resize".
  • Expected Result: CallToolResult(isError=True, content=[TextContent(text="Module not found: image.resize")]).
  • Traceability: F-004

TC-ERROR-002: SchemaValidationError with single field

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data:
    error = SchemaValidationError(
        "Validation failed",
        errors=[{"field": "width", "code": "int_type", "message": "Input should be a valid integer"}]
    )
    
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text starts with "Input validation failed:".
  • Assert text contains "width: Input should be a valid integer (int_type)".
  • Expected Result: "Input validation failed:\n- width: Input should be a valid integer (int_type)".
  • Traceability: F-004

TC-ERROR-003: SchemaValidationError with multiple fields

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data:
    error = SchemaValidationError(
        "Validation failed",
        errors=[
            {"field": "width", "code": "int_type", "message": "Input should be a valid integer"},
            {"field": "format", "code": "enum", "message": "Input should be 'png', 'jpg' or 'webp'"}
        ]
    )
    
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text contains both field error lines.
  • Expected Result: Multi-line text with both width and format errors listed.
  • Traceability: F-004

TC-ERROR-004: ACLDeniedError does not leak caller_id

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = ACLDeniedError("secret_user_123", "image.resize")
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Access denied".
  • Assert "secret_user_123" does NOT appear in text.
  • Assert "image.resize" does NOT appear in text.
  • Expected Result: Exactly "Access denied" -- no sensitive information.
  • Traceability: F-004 (AC3)

TC-ERROR-005: ModuleTimeoutError includes duration

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = ModuleTimeoutError("image.resize", 30000)
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Module timed out after 30000ms".
  • Expected Result: "Module timed out after 30000ms".
  • Traceability: F-004

TC-ERROR-006: InvalidInputError preserves message

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = InvalidInputError("Arguments must be a JSON object")
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Invalid input: Arguments must be a JSON object".
  • Expected Result: "Invalid input: Arguments must be a JSON object".
  • Traceability: F-004

TC-ERROR-007: CallDepthExceededError returns generic message

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = CallDepthExceededError(depth=10, max_depth=5, call_chain=["a", "b", "c"])
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Call depth limit exceeded".
  • Assert text does NOT contain the call chain.
  • Expected Result: "Call depth limit exceeded". No internal call chain exposed.
  • Traceability: F-004

TC-ERROR-008: CircularCallError returns generic message

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = CircularCallError(module_id="a.module", call_chain=["a.module", "b.module", "a.module"])
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Circular call detected".
  • Expected Result: "Circular call detected". No call chain exposed.
  • Traceability: F-004

TC-ERROR-009: CallFrequencyExceededError returns generic message

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = CallFrequencyExceededError(module_id="spam.module", count=100, max_repeat=10, call_chain=["spam.module"])
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Call frequency limit exceeded".
  • Expected Result: "Call frequency limit exceeded".
  • Traceability: F-004

TC-ERROR-010: Unexpected exception returns sanitized message

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: error = RuntimeError("Connection to database failed at 10.0.0.5:5432")
  • Steps:
  • Call mapper.to_mcp_error(error).
  • Assert text equals "Internal error occurred".
  • Assert "10.0.0.5" does NOT appear in text.
  • Assert "database" does NOT appear in text.
  • Expected Result: Exactly "Internal error occurred". No internal details leaked.
  • Traceability: F-004 (AC4)

TC-ERROR-011: All error results have isError=True

  • Priority: P0
  • Type: Unit
  • Preconditions: ErrorMapper instance created
  • Test Data: One instance of each error type:
    errors = [
        ModuleNotFoundError("x"),
        SchemaValidationError("v", errors=[]),
        ACLDeniedError("c", "t"),
        ModuleTimeoutError("x", 1000),
        InvalidInputError("bad"),
        CallDepthExceededError(5, 3, []),
        CircularCallError("x", []),
        CallFrequencyExceededError("x", 10, 5, []),
        RuntimeError("unexpected"),
    ]
    
  • Steps:
  • For each error, call mapper.to_mcp_error(error).
  • Assert result.isError is True for every result.
  • Expected Result: All 9 results have isError=True.
  • Traceability: F-004 (AC5)
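
The mapping rules exercised by TC-ERROR-002, TC-ERROR-004, TC-ERROR-005, and TC-ERROR-010 can be sketched as follows. This is a minimal illustration under stated assumptions, not the apcore-mcp implementation: `ErrorResult`, `to_mcp_error`, and the three exception classes are hypothetical stand-ins defined here so the block runs on its own. The real error types come from apcore, and the real result type is the MCP SDK's CallToolResult.

```python
from dataclasses import dataclass


@dataclass
class ErrorResult:
    """Hypothetical stand-in for a CallToolResult carrying one text item."""
    text: str
    isError: bool = True


# Hypothetical stand-ins for the apcore error types named in the test cases.
class ACLDeniedError(Exception):
    def __init__(self, caller_id, target_id):
        super().__init__(f"{caller_id} may not call {target_id}")
        self.caller_id, self.target_id = caller_id, target_id


class ModuleTimeoutError(Exception):
    def __init__(self, module_id, timeout_ms):
        super().__init__(f"{module_id} timed out")
        self.module_id, self.timeout_ms = module_id, timeout_ms


class SchemaValidationError(Exception):
    def __init__(self, message, errors):
        super().__init__(message)
        self.errors = errors


def to_mcp_error(error):
    """Map an exception to a sanitized error result (sketch only)."""
    if isinstance(error, ACLDeniedError):
        # Never echo the caller or the target module.
        return ErrorResult("Access denied")
    if isinstance(error, ModuleTimeoutError):
        return ErrorResult(f"Module timed out after {error.timeout_ms}ms")
    if isinstance(error, SchemaValidationError):
        lines = [f"- {e['field']}: {e['message']} ({e['code']})" for e in error.errors]
        return ErrorResult("\n".join(["Input validation failed:", *lines]))
    # Anything unrecognized is treated as unexpected and fully sanitized.
    return ErrorResult("Internal error occurred")
```

Placing the catch-all branch last means any exception type the mapper does not know about falls through to the sanitized message, which is the behavior TC-ERROR-010 checks.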

5.5 MCP Server Factory (TC-SERVER-xxx)

Test File: tests/unit/server/test_factory.py


TC-SERVER-001: Build single tool from ModuleDescriptor

  • Priority: P0
  • Type: Unit
  • Preconditions: MCPServerFactory instance created
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="image.resize",
        description="Resize an image to the specified dimensions",
        input_schema={
            "type": "object",
            "properties": {
                "width": {"type": "integer"},
                "height": {"type": "integer"}
            },
            "required": ["width", "height"]
        },
        output_schema={},
        annotations=ModuleAnnotations(readonly=False, destructive=False, idempotent=True, requires_approval=False, open_world=True)
    )
    
  • Steps:
  • Call factory.build_tool(descriptor).
  • Assert tool.name equals "image.resize".
  • Assert tool.description equals "Resize an image to the specified dimensions".
  • Assert tool.inputSchema has type: object with width and height properties.
  • Assert tool.annotations.idempotent_hint is True.
  • Expected Result: types.Tool with name, description, inputSchema, and annotations all correctly mapped from the descriptor.
  • Traceability: F-001, F-002, F-005

TC-SERVER-002: Build tools from registry with multiple modules

  • Priority: P0
  • Type: Unit
  • Preconditions: MCPServerFactory instance, mock Registry returning 3 module IDs
  • Test Data: Mock registry with modules ["image.resize", "text.summarize", "system.ping"], each returning a valid ModuleDescriptor.
  • Steps:
  • Call factory.build_tools(registry).
  • Assert returned list has length 3.
  • Assert tool names match the module IDs.
  • Expected Result: List of 3 types.Tool objects with correct names.
  • Traceability: F-001, F-005

TC-SERVER-003: serve() accepts Executor instance

  • Priority: P1
  • Type: Unit
  • Preconditions: Mock Executor with .registry property returning mock Registry
  • Test Data:
    executor = MagicMock(spec=Executor)
    executor.registry = mock_registry
    
  • Steps:
  • Call serve(executor) (mocking transport to avoid blocking).
  • Assert Executor's registry is used for tool building.
  • Assert Executor is used for call routing.
  • Expected Result: serve() extracts registry from executor and uses executor for call_async routing.
  • Traceability: F-005 (AC5), F-014

TC-SERVER-004: Empty registry produces empty tool list with warning

  • Priority: P0
  • Type: Unit
  • Preconditions: MCPServerFactory instance, mock Registry with list() returning []
  • Test Data: registry.list.return_value = []
  • Steps:
  • Call factory.build_tools(registry) with caplog fixture.
  • Assert returned list is empty.
  • Assert a WARNING-level record is logged stating that no modules are registered (e.g., "No modules registered").
  • Expected Result: Empty list [] returned. Warning logged.
  • Traceability: F-005 (AC7)

TC-SERVER-005: Tool with None annotations uses defaults

  • Priority: P1
  • Type: Unit
  • Preconditions: MCPServerFactory instance
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="system.ping",
        description="Ping",
        input_schema={},
        output_schema={},
        annotations=None
    )
    
  • Steps:
  • Call factory.build_tool(descriptor).
  • Assert tool.annotations.read_only_hint is False.
  • Assert tool.annotations.open_world_hint is True.
  • Expected Result: Tool uses MCP default annotation values.
  • Traceability: F-002 (AC5)

TC-SERVER-006: Create server with custom name and version

  • Priority: P1
  • Type: Unit
  • Preconditions: MCPServerFactory instance
  • Test Data: name="my-tools", version="2.0.0"
  • Steps:
  • Call factory.create_server(name="my-tools", version="2.0.0").
  • Assert a Server instance is returned and that it reports name "my-tools" and version "2.0.0".
  • Expected Result: Server instance created with custom name and version.
  • Traceability: F-005 (AC4)

TC-SERVER-007: Build tools with tag filter

  • Priority: P2
  • Type: Unit
  • Preconditions: MCPServerFactory, mock Registry where list(tags=["image"]) returns ["image.resize"]
  • Test Data: tags=["image"]
  • Steps:
  • Call factory.build_tools(registry, tags=["image"]).
  • Assert registry.list was called with tags=["image"].
  • Assert only "image.resize" tool is in result.
  • Expected Result: Filtered list with 1 tool.
  • Traceability: F-018

TC-SERVER-008: Build tools with prefix filter

  • Priority: P2
  • Type: Unit
  • Preconditions: MCPServerFactory, mock Registry where list(prefix="image.") returns ["image.resize", "image.crop"]
  • Test Data: prefix="image."
  • Steps:
  • Call factory.build_tools(registry, prefix="image.").
  • Assert only image-prefixed tools returned.
  • Expected Result: List with 2 tools: image.resize and image.crop.
  • Traceability: F-018

TC-SERVER-009: Server startup logs tool count and transport

  • Priority: P1
  • Type: Unit
  • Preconditions: Mock serve() internals with caplog
  • Test Data: Registry with 5 modules, transport="stdio"
  • Steps:
  • Trigger server startup logging.
  • Assert INFO log contains tool count (e.g., "5 tools registered").
  • Assert INFO log contains transport type (e.g., "stdio").
  • Expected Result: Log message like "Starting apcore-mcp server with 5 tools via stdio transport".
  • Traceability: F-016

TC-SERVER-010: serve() rejects invalid transport name

  • Priority: P0
  • Type: Unit
  • Preconditions: None
  • Test Data: transport="websocket"
  • Steps:
  • Call serve(registry, transport="websocket").
  • Assert ValueError is raised.
  • Expected Result: ValueError with message indicating "websocket" is not a valid transport. Allowed: stdio, streamable-http, sse.
  • Traceability: F-005

5.6 OpenAI Converter (TC-OPENAI-xxx)

Test File: tests/unit/converters/test_openai.py


TC-OPENAI-001: Basic conversion produces correct structure

  • Priority: P0
  • Type: Unit
  • Preconditions: OpenAIConverter instance, mock Registry with one module
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="image.resize",
        description="Resize an image to the specified dimensions",
        input_schema={"type": "object", "properties": {"width": {"type": "integer"}, "height": {"type": "integer"}}, "required": ["width", "height"]},
        output_schema={}
    )
    
  • Steps:
  • Call converter.convert_descriptor(descriptor).
  • Assert result has "type": "function".
  • Assert result["function"]["name"] equals "image__resize" (dots normalized).
  • Assert result["function"]["description"] equals the original description.
  • Assert result["function"]["parameters"] equals the input_schema.
  • Expected Result:
    {
        "type": "function",
        "function": {
            "name": "image__resize",
            "description": "Resize an image to the specified dimensions",
            "parameters": {"type": "object", "properties": {"width": {"type": "integer"}, "height": {"type": "integer"}}, "required": ["width", "height"]}
        }
    }
    
  • Traceability: F-008 (AC1, AC2, AC3, AC4, AC5, AC6)

TC-OPENAI-002: Module ID normalization replaces dots with double underscores

  • Priority: P0
  • Type: Unit
  • Preconditions: ModuleIDNormalizer instance
  • Test Data:
    test_cases = [
        ("image.resize", "image__resize"),
        ("comfyui.workflow.execute", "comfyui__workflow__execute"),
        ("simple", "simple"),
        ("a.b.c.d.e", "a__b__c__d__e"),
    ]
    
  • Steps:
  • For each (input, expected), call normalizer.normalize(input).
  • Assert result equals expected.
  • Expected Result: All dot-separated module IDs converted to double-underscore format. IDs without dots are unchanged.
  • Traceability: F-008

TC-OPENAI-003: Module ID denormalization reverses normalization

  • Priority: P0
  • Type: Unit
  • Preconditions: ModuleIDNormalizer instance
  • Test Data:
    test_cases = [
        ("image__resize", "image.resize"),
        ("comfyui__workflow__execute", "comfyui.workflow.execute"),
        ("simple", "simple"),
    ]
    
  • Steps:
  • For each (input, expected), call normalizer.denormalize(input).
  • Assert result equals expected.
  • Expected Result: Double underscores converted back to dots.
  • Traceability: F-008
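
The normalization pair tested in TC-OPENAI-002 and TC-OPENAI-003 can be sketched as two string replacements. The free functions `normalize` and `denormalize` below are illustrative stand-ins for the ModuleIDNormalizer methods; the sketch assumes module IDs never contain a double underscore themselves, since otherwise the round trip would not be reversible.

```python
def normalize(module_id: str) -> str:
    """Replace each dot with a double underscore (OpenAI-safe tool name)."""
    return module_id.replace(".", "__")


def denormalize(tool_name: str) -> str:
    """Reverse normalize(); assumes the original ID had no '__' in it."""
    return tool_name.replace("__", ".")


# Round-trip check over the IDs used in the test data above.
for module_id in ["image.resize", "comfyui.workflow.execute", "simple", "a.b.c.d.e"]:
    assert denormalize(normalize(module_id)) == module_id
```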

TC-OPENAI-004: Empty registry returns empty list

  • Priority: P0
  • Type: Unit
  • Preconditions: OpenAIConverter, mock Registry with list() returning []
  • Test Data: Empty registry
  • Steps:
  • Call converter.convert_registry(registry).
  • Assert result is [].
  • Expected Result: Empty list [].
  • Traceability: F-008 (AC8)

TC-OPENAI-005: embed_annotations=True appends annotation suffix

  • Priority: P1
  • Type: Unit
  • Preconditions: OpenAIConverter, descriptor with destructive=True, idempotent=True, requires_approval=True
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="file.delete",
        description="Delete a file",
        input_schema={"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]},
        output_schema={},
        annotations=ModuleAnnotations(readonly=False, destructive=True, idempotent=True, requires_approval=True, open_world=True)
    )
    
  • Steps:
  • Call converter.convert_descriptor(descriptor, embed_annotations=True).
  • Assert result["function"]["description"] contains "Delete a file".
  • Assert description contains "[Annotations:".
  • Assert description contains "destructive=true" and "idempotent=true" and "requires_approval=true".
  • Expected Result: Description equals "Delete a file\n\n[Annotations: destructive=true, idempotent=true, requires_approval=true]".
  • Traceability: F-011 (AC1, AC3, AC4)
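
One way to produce the suffix expected by TC-OPENAI-005 is sketched below. The expected string omits open_world=True and readonly=False, so this sketch assumes that only fields differing from their defaults are embedded; that rule is an inference from the expected result, not something the test case states. `ModuleAnnotations` here is a local stand-in dataclass with assumed defaults, and `embed_annotations` is a hypothetical helper name.

```python
from dataclasses import dataclass, fields


@dataclass
class ModuleAnnotations:
    """Local stand-in; the default values are assumptions for this sketch."""
    readonly: bool = False
    destructive: bool = False
    idempotent: bool = False
    requires_approval: bool = False
    open_world: bool = True


def embed_annotations(description: str, annotations: ModuleAnnotations) -> str:
    """Append a suffix listing only the annotation fields that differ from defaults."""
    defaults = ModuleAnnotations()
    parts = [
        f"{f.name}={str(getattr(annotations, f.name)).lower()}"
        for f in fields(annotations)
        if getattr(annotations, f.name) != getattr(defaults, f.name)
    ]
    if not parts:
        return description  # nothing noteworthy, leave the description untouched
    return f"{description}\n\n[Annotations: {', '.join(parts)}]"
```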

TC-OPENAI-006: embed_annotations=False does not modify description

  • Priority: P1
  • Type: Unit
  • Preconditions: OpenAIConverter, descriptor with annotations
  • Test Data: Same descriptor as TC-OPENAI-005
  • Steps:
  • Call converter.convert_descriptor(descriptor, embed_annotations=False).
  • Assert description equals exactly "Delete a file" with no suffix.
  • Expected Result: "Delete a file" -- unmodified.
  • Traceability: F-011 (AC2)

TC-OPENAI-007: strict=True adds strict field and modifies schema

  • Priority: P1
  • Type: Unit
  • Preconditions: OpenAIConverter
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="image.resize",
        description="Resize",
        input_schema={
            "type": "object",
            "properties": {
                "width": {"type": "integer"},
                "height": {"type": "integer"},
                "format": {"type": "string", "default": "png"}
            },
            "required": ["width", "height"]
        },
        output_schema={}
    )
    
  • Steps:
  • Call converter.convert_descriptor(descriptor, strict=True).
  • Assert result["function"]["strict"] is True.
  • Assert parameters["additionalProperties"] is False.
  • Assert "format" is in parameters["required"].
  • Assert parameters["properties"]["format"]["type"] includes "null" (nullable).
  • Assert "default" key is not in parameters["properties"]["format"].
  • Expected Result: strict: true added. Schema has additionalProperties: false, all properties required, optional format becomes nullable, default removed.
  • Traceability: F-012 (AC1, AC3)
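
The schema changes asserted in TC-OPENAI-007 can be illustrated with a small transformation. This is a sketch that handles only top-level properties with a plain `type` keyword; nested objects, `$ref`, and combinators are out of scope here. `to_strict_schema` is a hypothetical helper name, not a confirmed part of the converter API.

```python
import copy


def to_strict_schema(schema: dict) -> dict:
    """Return a strict-mode copy of a flat object schema (top level only)."""
    strict = copy.deepcopy(schema)  # leave the caller's schema untouched
    props = strict.get("properties", {})
    originally_required = set(strict.get("required", []))
    for name, prop in props.items():
        prop.pop("default", None)  # defaults are dropped in strict mode
        if name not in originally_required:
            # Optional fields become required but nullable.
            declared = prop.get("type")
            if isinstance(declared, str):
                prop["type"] = [declared, "null"]
            elif isinstance(declared, list) and "null" not in declared:
                prop["type"] = [*declared, "null"]
    strict["required"] = list(props)
    strict["additionalProperties"] = False
    return strict
```

Working on a deep copy keeps the strict=False path (TC-OPENAI-008) trivially correct, because the original input_schema is never mutated.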

TC-OPENAI-008: strict=False does not add strict field

  • Priority: P1
  • Type: Unit
  • Preconditions: OpenAIConverter
  • Test Data: Same descriptor as TC-OPENAI-007
  • Steps:
  • Call converter.convert_descriptor(descriptor, strict=False).
  • Assert "strict" key is NOT in result["function"].
  • Assert schema is unmodified.
  • Expected Result: No strict key. Schema preserves default values and original required list.
  • Traceability: F-012 (AC2)

TC-OPENAI-009: Output directly usable with OpenAI API format

  • Priority: P0
  • Type: Unit
  • Preconditions: OpenAIConverter, registry with 2 modules
  • Test Data: Two descriptors: image.resize and text.summarize
  • Steps:
  • Call converter.convert_registry(registry).
  • Assert result is a list.
  • Assert length is 2.
  • For each item, assert item["type"] == "function".
  • For each item, assert "name" in item["function"].
  • For each item, assert "description" in item["function"].
  • For each item, assert "parameters" in item["function"].
  • Expected Result: List of 2 dicts, each with type: "function" and function containing name, description, parameters. Directly passable to openai.chat.completions.create(tools=...).
  • Traceability: F-008 (AC7, AC9)

TC-OPENAI-010: Tag filtering returns only matching modules

  • Priority: P2
  • Type: Unit
  • Preconditions: OpenAIConverter, registry with list(tags=["image"]) returning ["image.resize"]
  • Test Data: tags=["image"]
  • Steps:
  • Call converter.convert_registry(registry, tags=["image"]).
  • Assert result has 1 item with name "image__resize".
  • Expected Result: Single tool in list.
  • Traceability: F-017

TC-OPENAI-011: Prefix filtering returns only matching modules

  • Priority: P2
  • Type: Unit
  • Preconditions: OpenAIConverter, registry with list(prefix="comfyui.") returning ["comfyui.workflow"]
  • Test Data: prefix="comfyui."
  • Steps:
  • Call converter.convert_registry(registry, prefix="comfyui.").
  • Assert result has 1 item.
  • Expected Result: Single tool in list with name "comfyui__workflow".
  • Traceability: F-017

TC-OPENAI-012: No dependency on openai package

  • Priority: P0
  • Type: Unit
  • Preconditions: None
  • Test Data: N/A
  • Steps:
  • Import apcore_mcp.converters.openai module.
  • Assert no import openai in the module source.
  • Call to_openai_tools(registry).
  • Assert result is list[dict] (plain Python types only).
  • Expected Result: Function works without openai package installed. Returns plain dicts.
  • Traceability: F-008 (AC9)
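
The "no import openai in the module source" step can be checked structurally rather than by substring search, which avoids false positives from comments or string literals. The helper below is a sketch; `imports_package` is a hypothetical name, and in a real test the source text would be read from the converter module's file.

```python
import ast


def imports_package(source: str, package: str) -> bool:
    """Return True if the source contains an absolute import of the given package."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] == package for alias in node.names):
                return True
        elif isinstance(node, ast.ImportFrom):
            # level == 0 means an absolute import; relative imports are ignored
            # because a sibling module may legitimately be named "openai".
            if node.level == 0 and (node.module or "").split(".")[0] == package:
                return True
    return False
```

Ignoring relative imports matters here because the module under test is itself named openai inside the converters package.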

5.7 Transport Manager (TC-TRANSPORT-xxx)

Test File: tests/unit/server/test_transport.py


TC-TRANSPORT-001: stdio transport starts successfully

  • Priority: P0
  • Type: Unit
  • Preconditions: TransportManager instance, mock Server, mock stdio_server
  • Test Data: Default parameters (no host/port needed for stdio)
  • Steps:
  • Mock mcp.server.stdio.stdio_server context manager.
  • Call await transport.run_stdio(server, init_options).
  • Assert stdio_server was called.
  • Assert server.run was called with read_stream and write_stream.
  • Expected Result: stdio transport lifecycle initiated correctly.
  • Traceability: F-006

TC-TRANSPORT-002: stdio transport handles graceful shutdown

  • Priority: P1
  • Type: Unit
  • Preconditions: TransportManager instance, mock Server
  • Test Data: N/A
  • Steps:
  • Start stdio transport.
  • Simulate read stream closing (EOF).
  • Assert no exception raised and method returns cleanly.
  • Expected Result: Clean shutdown without exceptions.
  • Traceability: F-006 (AC4)

TC-TRANSPORT-003: stdio is the default transport

  • Priority: P0
  • Type: Unit
  • Preconditions: None
  • Test Data: serve(registry) called without transport parameter
  • Steps:
  • Verify serve() function signature has transport: str = "stdio".
  • Mock internals to verify stdio transport is selected.
  • Expected Result: Default transport is "stdio".
  • Traceability: F-005 (AC1), F-006

TC-TRANSPORT-004: Streamable HTTP transport starts with host and port

  • Priority: P0
  • Type: Unit
  • Preconditions: TransportManager instance, mock Server
  • Test Data: host="127.0.0.1", port=8000
  • Steps:
  • Call await transport.run_streamable_http(server, init_options, host="127.0.0.1", port=8000).
  • Assert HTTP server started on the specified host and port.
  • Expected Result: HTTP transport started on 127.0.0.1:8000.
  • Traceability: F-007 (AC1, AC3)

TC-TRANSPORT-005: Invalid port raises ValueError

  • Priority: P0
  • Type: Unit
  • Preconditions: TransportManager instance
  • Test Data: port=0, port=65536, port=-1
  • Steps:
  • For each invalid port, call await transport.run_streamable_http(server, init_options, port=port).
  • Assert ValueError is raised each time.
  • Expected Result: ValueError for port 0, 65536, and -1.
  • Traceability: F-007

TC-TRANSPORT-006: Empty host raises ValueError

  • Priority: P1
  • Type: Unit
  • Preconditions: TransportManager instance
  • Test Data: host=""
  • Steps:
  • Call await transport.run_streamable_http(server, init_options, host="", port=8000).
  • Assert ValueError is raised.
  • Expected Result: ValueError indicating host must not be empty.
  • Traceability: F-007
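
The validation behavior checked by TC-TRANSPORT-005 and TC-TRANSPORT-006 might look like the following. `validate_endpoint` is a hypothetical helper used only for illustration; where the check actually lives inside TransportManager is not specified by this test plan.

```python
def validate_endpoint(host: str, port: int) -> None:
    """Raise ValueError for an empty host or a port outside 1..65535."""
    if not host:
        raise ValueError("host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")


# The three invalid ports from TC-TRANSPORT-005 are all rejected.
for bad_port in (0, 65536, -1):
    try:
        validate_endpoint("127.0.0.1", bad_port)
    except ValueError:
        pass
    else:
        raise AssertionError(f"port {bad_port} should have been rejected")
```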

TC-TRANSPORT-007: SSE transport starts successfully

  • Priority: P1
  • Type: Unit
  • Preconditions: TransportManager instance, mock Server
  • Test Data: host="127.0.0.1", port=8000
  • Steps:
  • Call await transport.run_sse(server, init_options, host="127.0.0.1", port=8000).
  • Assert SSE server started.
  • Expected Result: SSE transport lifecycle initiated.
  • Traceability: F-010

TC-TRANSPORT-008: SSE transport logs deprecation warning

  • Priority: P1
  • Type: Unit
  • Preconditions: TransportManager with caplog
  • Test Data: N/A
  • Steps:
  • Call await transport.run_sse(...).
  • Assert a WARNING-level log record mentions that the SSE transport is deprecated (message contains "deprecated").
  • Expected Result: Deprecation warning logged.
  • Traceability: F-010 (AC2)

TC-TRANSPORT-009: Default host and port values

  • Priority: P1
  • Type: Unit
  • Preconditions: Inspect TransportManager API
  • Test Data: N/A
  • Steps:
  • Assert run_streamable_http default host is "127.0.0.1".
  • Assert run_streamable_http default port is 8000.
  • Assert same defaults for run_sse.
  • Expected Result: Default host "127.0.0.1", default port 8000.
  • Traceability: F-007 (AC3)

5.8 CLI Module (TC-CLI-xxx)

Test File: tests/unit/test_cli.py


TC-CLI-001: Basic usage with --extensions-dir

  • Priority: P0
  • Type: Unit
  • Preconditions: Mock Registry.discover(), mock serve()
  • Test Data: args = ["--extensions-dir", "/tmp/test_extensions"] (directory exists)
  • Steps:
  • Mock os.path.isdir to return True for /tmp/test_extensions.
  • Run CLI main() with args.
  • Assert Registry was created with extensions_dir="/tmp/test_extensions".
  • Assert registry.discover() was called.
  • Assert serve(registry) was called.
  • Expected Result: Registry created, discover() called, serve() called with default transport.
  • Traceability: F-009 (AC1)

TC-CLI-002: --transport flag accepts all valid values

  • Priority: P0
  • Type: Unit
  • Preconditions: Mock serve()
  • Test Data:
    transports = ["stdio", "streamable-http", "sse"]
    
  • Steps:
  • For each transport, run CLI with ["--extensions-dir", "/tmp/ext", "--transport", transport].
  • Assert serve() was called with transport=transport.
  • Expected Result: All three transport values accepted without error.
  • Traceability: F-009 (AC2)

TC-CLI-003: --host and --port flags configure network transports

  • Priority: P0
  • Type: Unit
  • Preconditions: Mock serve()
  • Test Data: args = ["--extensions-dir", "/tmp/ext", "--transport", "streamable-http", "--host", "0.0.0.0", "--port", "9000"]
  • Steps:
  • Run CLI main() with args.
  • Assert serve() called with host="0.0.0.0" and port=9000.
  • Expected Result: serve() receives custom host and port.
  • Traceability: F-009 (AC3)

TC-CLI-004: --help displays usage information

  • Priority: P0
  • Type: Unit
  • Preconditions: None
  • Test Data: args = ["--help"]
  • Steps:
  • Run CLI main() with ["--help"].
  • Capture stdout.
  • Assert output contains "--extensions-dir", "--transport", "--host", "--port".
  • Expected Result: Help text displayed with all flag descriptions. SystemExit(0) raised.
  • Traceability: F-009 (AC4)

TC-CLI-005: Non-existent extensions-dir exits with error

  • Priority: P0
  • Type: Unit
  • Preconditions: None
  • Test Data: args = ["--extensions-dir", "/nonexistent/path/to/extensions"]
  • Steps:
  • Run CLI main() with args.
  • Assert SystemExit with code 1.
  • Assert stderr contains error message about directory not existing.
  • Expected Result: Exit code 1 with clear error message.
  • Traceability: F-009 (AC5)

TC-CLI-006: Invalid transport flag exits with error

  • Priority: P1
  • Type: Unit
  • Preconditions: None
  • Test Data: args = ["--extensions-dir", "/tmp/ext", "--transport", "websocket"]
  • Steps:
  • Run CLI main() with args.
  • Assert SystemExit with code 2 (argparse error).
  • Expected Result: Exit code 2 with argparse error message.
  • Traceability: F-009
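
Exit code 2 in TC-CLI-006 is standard argparse behavior when a value falls outside `choices`. The parser below is a sketch covering only the flags named in TC-CLI-001 through TC-CLI-003; the program name, the `build_parser` function, and the defaults are assumptions taken from the transport test cases rather than from the CLI implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a subset of the CLI flags (illustrative only)."""
    parser = argparse.ArgumentParser(prog="apcore-mcp")  # prog name is assumed
    parser.add_argument("--extensions-dir", required=True)
    parser.add_argument(
        "--transport",
        choices=["stdio", "streamable-http", "sse"],
        default="stdio",
    )
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    return parser


# An unknown transport makes argparse print usage to stderr and exit with code 2.
try:
    build_parser().parse_args(["--extensions-dir", "/tmp/ext", "--transport", "websocket"])
except SystemExit as exc:
    assert exc.code == 2
```

Range checks such as the one in TC-CLI-010 are not expressible through `choices` alone, which is presumably why that case expects exit code 1 from application-level validation instead of argparse's code 2.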

TC-CLI-007: No modules discovered logs warning but starts server

  • Priority: P1
  • Type: Unit
  • Preconditions: Mock Registry.discover() returning 0, mock serve()
  • Test Data: args = ["--extensions-dir", "/tmp/empty_dir"]
  • Steps:
  • Mock registry.discover() to return 0.
  • Run CLI main() with args and caplog.
  • Assert WARNING log about no modules discovered.
  • Assert serve() was still called.
  • Expected Result: Warning logged, server starts with zero tools.
  • Traceability: F-009 (AC6)

TC-CLI-008: --log-level flag sets logging level

  • Priority: P1
  • Type: Unit
  • Preconditions: Mock serve()
  • Test Data: args = ["--extensions-dir", "/tmp/ext", "--log-level", "DEBUG"]
  • Steps:
  • Run CLI main() with args.
  • Assert logging level for apcore_mcp logger set to DEBUG.
  • Expected Result: Logger configured at DEBUG level.
  • Traceability: F-009, F-016

TC-CLI-009: --name and --version flags set server identity

  • Priority: P1
  • Type: Unit
  • Preconditions: Mock serve()
  • Test Data: args = ["--extensions-dir", "/tmp/ext", "--name", "my-server", "--version", "3.0.0"]
  • Steps:
  • Run CLI main() with args.
  • Assert serve() called with server_name="my-server" and server_version="3.0.0".
  • Expected Result: Custom server name and version passed to serve().
  • Traceability: F-009

TC-CLI-010: Invalid port exits with error

  • Priority: P1
  • Type: Unit
  • Preconditions: None
  • Test Data: args = ["--extensions-dir", "/tmp/ext", "--transport", "streamable-http", "--port", "99999"]
  • Steps:
  • Run CLI main() with args.
  • Assert SystemExit with code 1.
  • Expected Result: Exit code 1 with error about invalid port.
  • Traceability: F-009

5.9 Dynamic Registry Listener (TC-DYNAMIC-xxx)

Test File: tests/unit/server/test_listener.py


TC-DYNAMIC-001: Register new module adds tool to list

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener started with mock Registry and MCPServerFactory
  • Test Data:
    module_id = "new.tool"
    descriptor = ModuleDescriptor(module_id="new.tool", description="New tool", input_schema={}, output_schema={})
    factory.build_tool.return_value = types.Tool(name="new.tool", description="New tool", inputSchema={"type": "object", "properties": {}})
    
  • Steps:
  • Simulate registry register event: call listener._on_register("new.tool", mock_module).
  • Assert factory.build_tool was called.
  • Assert listener.tools dict contains "new.tool".
  • Expected Result: Tool added to listener's internal tools dict.
  • Traceability: F-015 (AC1)

TC-DYNAMIC-002: Unregister module removes tool from list

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener with one existing tool "old.tool" in tools dict
  • Test Data: module_id = "old.tool"
  • Steps:
  • Pre-populate listener._tools["old.tool"] = mock_tool.
  • Call listener._on_unregister("old.tool", mock_module).
  • Assert "old.tool" is NOT in listener.tools.
  • Expected Result: Tool removed from tools dict.
  • Traceability: F-015 (AC2)

TC-DYNAMIC-003: Register when get_definition returns None

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener, mock Registry with get_definition returning None
  • Test Data: module_id = "ghost.tool"
  • Steps:
  • Call listener._on_register("ghost.tool", mock_module).
  • Assert "ghost.tool" is NOT in listener.tools.
  • Assert warning logged.
  • Expected Result: Tool NOT added. Warning logged. No crash.
  • Traceability: F-015

TC-DYNAMIC-004: Unregister non-existent module is silent

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener with empty tools dict
  • Test Data: module_id = "nonexistent.tool"
  • Steps:
  • Call listener._on_unregister("nonexistent.tool", mock_module).
  • Assert no exception raised.
  • Expected Result: No exception. Silent no-op.
  • Traceability: F-015

TC-DYNAMIC-005: Start registers event callbacks

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener, mock Registry
  • Test Data: N/A
  • Steps:
  • Call listener.start().
  • Assert registry.on("register", ...) was called.
  • Assert registry.on("unregister", ...) was called.
  • Expected Result: Both event callbacks registered on the Registry.
  • Traceability: F-015

TC-DYNAMIC-006: Stop causes callbacks to no-op

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener started
  • Test Data: N/A
  • Steps:
  • Call listener.start().
  • Call listener.stop().
  • Simulate register event.
  • Assert tools dict is unchanged.
  • Expected Result: After stop(), register events are ignored.
  • Traceability: F-015

TC-DYNAMIC-007: Concurrent register and unregister are thread-safe

  • Priority: P1
  • Type: Unit
  • Preconditions: RegistryListener started
  • Test Data: 50 register events and 50 unregister events fired concurrently from different threads
  • Steps:
  • Launch 50 threads each calling _on_register(f"tool.{i}", mock).
  • Launch 50 threads each calling _on_unregister(f"tool.{i}", mock).
  • Join all threads.
  • Assert no exceptions raised.
  • Assert tools dict is in a consistent state: every remaining key is one of the registered IDs and maps to a complete tool object (no partial entries).
  • Expected Result: No race conditions, no exceptions. Tools dict is consistent.
  • Traceability: F-015 (AC4)
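
A lock-guarded tool table is one straightforward way to satisfy TC-DYNAMIC-004, TC-DYNAMIC-006, and TC-DYNAMIC-007. The class below is a simplified, hypothetical stand-in for RegistryListener: it stores whatever object it is given instead of building a tool through the factory, and its method names are illustrative.

```python
import threading


class ToolTable:
    """Minimal thread-safe registry of tools keyed by module ID (sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tools = {}
        self._active = True

    def on_register(self, module_id, tool):
        with self._lock:
            if self._active:  # after stop(), register events are ignored
                self._tools[module_id] = tool

    def on_unregister(self, module_id):
        with self._lock:
            # pop with a default makes unknown IDs a silent no-op
            self._tools.pop(module_id, None)

    def stop(self):
        with self._lock:
            self._active = False

    @property
    def tools(self):
        with self._lock:
            return dict(self._tools)  # snapshot, safe to iterate outside the lock
```

Because register and unregister events race, the final contents are not deterministic; a test can only assert that whatever remains is a subset of the registered IDs and that each entry is complete.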

6. Integration Test Cases (TC-INT-xxx)

Test Directory: tests/integration/


TC-INT-001: Full MCP flow -- Registry to tool list to tool call to result

  • Priority: P0
  • Type: Integration
  • Preconditions: Real SchemaConverter, AnnotationMapper, ErrorMapper, MCPServerFactory, ExecutionRouter. Mock Executor with call_async returning {"status": "ok"}.
  • Test Data:
    descriptor = ModuleDescriptor(
        module_id="image.resize",
        description="Resize an image",
        input_schema={"type": "object", "properties": {"width": {"type": "integer"}}, "required": ["width"]},
        output_schema={},
        annotations=ModuleAnnotations(readonly=False, destructive=False, idempotent=True, requires_approval=False, open_world=True)
    )
    mock_registry.list.return_value = ["image.resize"]
    mock_registry.get_definition.return_value = descriptor
    
  • Steps:
  • Create MCPServerFactory and build tools from registry.
  • Assert tool list has 1 tool with name "image.resize".
  • Create ExecutionRouter with mock executor.
  • Call await router.handle_call("image.resize", {"width": 800}).
  • Assert result.isError is False and the result content carries {"status": "ok"}.
  • Expected Result: End-to-end flow: schema converted, tool built, call routed, result returned.
  • Traceability: F-001, F-002, F-003, F-005
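
The mocked-executor part of this flow can be sketched with `unittest.mock.AsyncMock`, which makes `call_async` awaitable without a real Executor. The `ExecutionRouter` class below is a deliberately tiny stand-in that returns a plain dict; the real router, its constructor signature, and its CallToolResult return type are defined elsewhere and are not reproduced here.

```python
import asyncio
import json
from unittest.mock import AsyncMock


class ExecutionRouter:
    """Hypothetical stand-in: forwards calls to executor.call_async."""

    def __init__(self, executor):
        self._executor = executor

    async def handle_call(self, tool_name, arguments):
        try:
            output = await self._executor.call_async(tool_name, arguments)
        except Exception:
            # Sketch only: every failure collapses to the sanitized message.
            return {"isError": True, "text": "Internal error occurred"}
        return {"isError": False, "text": json.dumps(output)}


# Mock Executor whose call_async resolves to {"status": "ok"}.
executor = AsyncMock()
executor.call_async.return_value = {"status": "ok"}

router = ExecutionRouter(executor)
result = asyncio.run(router.handle_call("image.resize", {"width": 800}))
assert result["isError"] is False
executor.call_async.assert_awaited_once_with("image.resize", {"width": 800})
```

In a pytest suite the `asyncio.run` call would typically be replaced by an async test function, assuming an asyncio plugin is configured.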

TC-INT-002: Same flow works over all three transports

  • Priority: P0
  • Type: Integration
  • Preconditions: Full apcore-mcp stack with mock module
  • Test Data: Transport values: ["stdio", "streamable-http", "sse"]
  • Steps:
  • For each transport, start server (with timeout/background).
  • Connect MCP client (mock or SDK client).
  • List tools.
  • Call a tool.
  • Assert result matches expected.
  • Shut down server.
  • Expected Result: Identical behavior across all three transports.
  • Traceability: F-006, F-007, F-010

TC-INT-003: Error flow -- MCP client calls non-existent tool

  • Priority: P0
  • Type: Integration
  • Preconditions: Server running with mock registry (1 module: image.resize)
  • Test Data:
    tool_name = "nonexistent.tool"
    arguments = {"key": "value"}
    
  • Steps:
  • Build tools from registry (only image.resize).
  • Call router.handle_call("nonexistent.tool", {"key": "value"}).
  • Assert result.isError is True.
  • Assert text contains "Module not found: nonexistent.tool".
  • Expected Result: Error result with module not found message.
  • Traceability: F-003 (AC2), F-004

TC-INT-004: to_openai_tools() roundtrip -- format matches OpenAI spec

  • Priority: P0
  • Type: Integration
  • Preconditions: Registry with 3 modules of varying complexity
  • Test Data: Three descriptors with simple, nested ($ref), and empty schemas.
  • Steps:
  • Call to_openai_tools(registry).
  • Assert result is a list of 3 dicts.
  • Validate each dict against OpenAI tool schema structure.
  • Assert all module IDs are normalized (no dots).
  • Assert all parameters are valid JSON Schema.
  • Expected Result: Output is valid OpenAI tools format. All $refs are inlined. All module IDs are normalized.
  • Traceability: F-008
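
The structural validation in the steps above could look roughly like the following. It is a hedged sketch: `check_openai_tool` is a hypothetical helper, not part of apcore-mcp, and it checks only the outer shape, the function-name character set that OpenAI accepts (letters, digits, underscores, hyphens, at most 64 characters), and the absence of leftover `$ref` nodes.

```python
import re

# OpenAI function names: letters, digits, underscores, hyphens; max 64 chars.
_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def _has_ref(node) -> bool:
    """Return True if any nested dict still contains a '$ref' key."""
    if isinstance(node, dict):
        return "$ref" in node or any(_has_ref(v) for v in node.values())
    if isinstance(node, list):
        return any(_has_ref(v) for v in node)
    return False


def check_openai_tool(tool: dict) -> list:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    if tool.get("type") != "function":
        problems.append("type must be 'function'")
    fn = tool.get("function")
    if not isinstance(fn, dict):
        return problems + ["missing 'function' object"]
    name = fn.get("name")
    if not isinstance(name, str) or not _NAME_RE.fullmatch(name):
        problems.append(f"invalid function name: {name!r}")
    params = fn.get("parameters")
    if not isinstance(params, dict):
        problems.append("parameters must be a dict")
    elif _has_ref(params):
        problems.append("parameters still contain $ref")
    return problems
```

A test would then assert `check_openai_tool(tool) == []` for each of the three converted tools.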

TC-INT-005: Executor passthrough with ACL enforcement

  • Priority: P1
  • Type: Integration
  • Preconditions: Real Executor with ACL configured, real Registry with 2 modules
  • Test Data:
    # Module "public.tool" -- allowed
    # Module "private.tool" -- denied by ACL
    executor.call_async("public.tool", {}) -> {"result": "ok"}
    executor.call_async("private.tool", {}) -> raises ACLDeniedError
    
  • Steps:
  • Create ExecutionRouter with real Executor.
  • Call router.handle_call("public.tool", {}) -- assert success.
  • Call router.handle_call("private.tool", {}) -- assert isError=True with "Access denied".
  • Expected Result: ACL enforcement works through the full stack.
  • Traceability: F-014, F-003 (AC4)

TC-INT-006: Dynamic registration -- add module while server components are running

  • Priority: P1
  • Type: Integration
  • Preconditions: RegistryListener started, MCPServerFactory, mock Registry
  • Test Data:
    new_descriptor = ModuleDescriptor(
        module_id="new.module",
        description="Dynamically added module",
        input_schema={"type": "object", "properties": {"x": {"type": "integer"}}},
        output_schema={}
    )
    
  • Steps:
  • Start with empty tool list.
  • Simulate registry register event for "new.module".
  • Assert listener.tools now contains "new.module".
  • Build tool list from listener -- assert 1 tool present.
  • Expected Result: New tool appears in the tool list after dynamic registration.
  • Traceability: F-015

7. E2E Test Cases (TC-E2E-xxx)

Test Directory: tests/e2e/


TC-E2E-001: CLI server startup, MCP client connection, tool execution, shutdown

  • Priority: P0
  • Type: E2E
  • Preconditions: Test extensions directory with at least 1 mock apcore module, MCP client library available
  • Test Data:
    extensions_dir = "/tmp/test_extensions"  # contains a simple mock module
    cli_args = ["python", "-m", "apcore_mcp", "--extensions-dir", extensions_dir]
    
  • Steps:
  • Create a temp directory with a mock apcore module.
  • Start python -m apcore_mcp --extensions-dir <temp_dir> as a subprocess.
  • Connect MCP client via stdio (reading subprocess stdout, writing to stdin).
  • Send tools/list request.
  • Assert response contains the mock module as a tool.
  • Send tools/call request for the mock module.
  • Assert response contains expected output.
  • Terminate subprocess (SIGTERM).
  • Assert clean shutdown (exit code 0).
  • Expected Result: Full lifecycle works: startup, discovery, connection, tool listing, tool call, shutdown.
  • Traceability: F-005, F-006, F-009
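
The subprocess lifecycle in the steps above (start, exchange one message over stdio, SIGTERM, check the exit code) can be sketched as follows. The child here is a stand-in script, not an MCP server: it only mimics a line-delimited request/response exchange and exits with code 0 on SIGTERM, which is the behaviour the test expects of `python -m apcore_mcp`. The sketch assumes a POSIX platform, where `Popen.terminate()` sends SIGTERM.

```python
import json
import subprocess
import sys

# Stand-in child (assumption): answers each JSON line and exits 0 on SIGTERM.
CHILD = r"""
import json, signal, sys
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
for line in sys.stdin:
    request = json.loads(line)
    print(json.dumps({"id": request["id"], "result": {"tools": []}}), flush=True)
"""


def run_lifecycle():
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        # One request/response round trip over the child's stdin/stdout.
        proc.stdin.write(json.dumps({"id": 1, "method": "tools/list"}) + "\n")
        proc.stdin.flush()
        response = json.loads(proc.stdout.readline())
        proc.terminate()  # SIGTERM on POSIX
        exit_code = proc.wait(timeout=10)
    finally:
        if proc.poll() is None:
            proc.kill()
        proc.stdin.close()
        proc.stdout.close()
    return response, exit_code


response, exit_code = run_lifecycle()
```

Completing a round trip before sending SIGTERM guarantees the child has installed its signal handler, which avoids a startup race in the exit-code assertion.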

TC-E2E-002: HTTP transport server with tool execution

  • Priority: P0
  • Type: E2E
  • Preconditions: Test extensions directory, available port
  • Test Data:
    cli_args = ["python", "-m", "apcore_mcp", "--extensions-dir", extensions_dir,
                "--transport", "streamable-http", "--port", "18765"]
    
  • Steps:
  • Start server subprocess with streamable-http transport.
  • Wait for the server to be ready (poll the health endpoint or retry the connection).
  • Connect MCP client via HTTP to http://127.0.0.1:18765.
  • List tools.
  • Call a tool.
  • Assert success response.
  • Shut down server.
  • Expected Result: Full HTTP transport lifecycle works.
  • Traceability: F-007, F-009
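
One way to implement the readiness wait is to poll the TCP port until a connection succeeds. The helper below is a minimal sketch with a hypothetical name (`wait_for_port`); it checks only that something is listening, not that the MCP endpoint is fully initialised.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: server not ready yet.
            time.sleep(interval)
    return False
```

In the E2E test this would be called as `wait_for_port("127.0.0.1", 18765)` right after the subprocess is started, failing the test if it returns `False`.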

TC-E2E-003: Hot-reload -- add module while server is running

  • Priority: P1
  • Type: E2E
  • Preconditions: Server running with initial modules, programmatic access to Registry
  • Test Data: Initial: 2 modules. After add: 3 modules.
  • Steps:
  • Start server with 2 modules.
  • Connect MCP client. List tools -- assert 2 tools.
  • Register a new module to the registry.
  • Wait for tool list change notification (or re-list tools).
  • List tools -- assert 3 tools.
  • Call the new tool -- assert success.
  • Expected Result: Dynamically added module appears as a callable tool.
  • Traceability: F-015

TC-E2E-004: Multi-module registry with mixed annotations

  • Priority: P1
  • Type: E2E
  • Preconditions: Registry with modules having varied annotation combinations
  • Test Data:
    modules = [
        ("reader.get", ModuleAnnotations(readonly=True, destructive=False, idempotent=True, requires_approval=False, open_world=False)),
        ("writer.delete", ModuleAnnotations(readonly=False, destructive=True, idempotent=False, requires_approval=True, open_world=True)),
        ("worker.process", None),  # No annotations
    ]
    
  • Steps:
  • Start server with 3 modules.
  • Connect MCP client and list tools.
  • For reader.get: assert readOnlyHint=True, idempotentHint=True.
  • For writer.delete: assert destructiveHint=True.
  • For worker.process: assert default annotation values.
  • Expected Result: All annotation combinations correctly mapped per module.
  • Traceability: F-002

TC-E2E-005: Health check endpoint on HTTP transport

  • Priority: P2
  • Type: E2E
  • Preconditions: Server running with streamable-http transport, 3 modules registered
  • Test Data: N/A
  • Steps:
  • Start server on streamable-http.
  • Send GET request to http://127.0.0.1:<port>/health.
  • Assert HTTP 200 response.
  • Parse JSON body.
  • Assert status equals "ok".
  • Assert tools_count equals 3.
  • Assert uptime_seconds is a positive number.
  • Expected Result: JSON body {"status": "ok", "tools_count": 3, "uptime_seconds": <positive float>}.
  • Traceability: F-019
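
The body checks above can be collected into one helper. This is a sketch under the assumption that the response body is exactly the JSON shape shown in the expected result; `validate_health_body` is a hypothetical name.

```python
import json


def validate_health_body(raw: str, expected_tools: int) -> dict:
    """Parse a /health response body and assert the fields the test case names."""
    body = json.loads(raw)
    assert body["status"] == "ok"
    assert body["tools_count"] == expected_tools
    uptime = body["uptime_seconds"]
    # bool is a subclass of int, so exclude it explicitly.
    assert isinstance(uptime, (int, float)) and not isinstance(uptime, bool)
    assert uptime > 0
    return body
```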

TC-E2E-006: MCP Resource exposure for documented modules

  • Priority: P2
  • Type: E2E
  • Preconditions: Server with one module having documentation field and one without
  • Test Data:
    doc_module = ModuleDescriptor(module_id="api.info", description="Info", documentation="Full API documentation text here.", input_schema={}, output_schema={})
    no_doc_module = ModuleDescriptor(module_id="api.ping", description="Ping", documentation=None, input_schema={}, output_schema={})
    
  • Steps:
  • Start server with both modules.
  • Connect MCP client and list resources.
  • Assert resource docs://api.info exists with documentation text.
  • Assert no resource for docs://api.ping.
  • Expected Result: Only modules with non-empty documentation are exposed as MCP Resources.
  • Traceability: F-020

TC-E2E-007: Claude Desktop configuration integration (manual verification)

  • Priority: P0
  • Type: E2E (Manual)
  • Preconditions: Claude Desktop installed, apcore-mcp installed
  • Test Data:
    {
        "mcpServers": {
            "apcore-test": {
                "command": "python",
                "args": ["-m", "apcore_mcp", "--extensions-dir", "/path/to/test/extensions"]
            }
        }
    }
    
  • Steps:
  • Create test extensions directory with 2 mock modules.
  • Add configuration to Claude Desktop claude_desktop_config.json.
  • Restart Claude Desktop.
  • Open a new conversation.
  • Verify tools appear in Claude's tool list.
  • Invoke a tool via natural language prompt.
  • Verify tool result is returned correctly.
  • Expected Result: Claude Desktop discovers and can invoke apcore modules as tools.
  • Traceability: F-006

TC-E2E-008: Verify server with zero modules starts with warning

  • Priority: P1
  • Type: E2E
  • Preconditions: Empty extensions directory
  • Test Data: args = ["--extensions-dir", "/tmp/empty_extensions"]
  • Steps:
  • Create empty temp directory.
  • Start server via CLI.
  • Connect MCP client.
  • List tools -- assert empty list.
  • Assert server logs contain warning about no modules.
  • Expected Result: Server starts successfully with 0 tools and logs a warning.
  • Traceability: F-005 (AC7), F-009 (AC6)

8. Performance Test Cases (TC-PERF-xxx)

Test File: tests/performance/test_benchmarks.py


TC-PERF-001: Schema conversion for 100 modules under 100ms

  • Priority: P0
  • Type: Performance
  • Preconditions: SchemaConverter, MCPServerFactory, 100 mock ModuleDescriptors
  • Test Data: 100 ModuleDescriptors with schemas of 5-10 properties each, including 20% with $ref nodes.
  • Steps:
  • Create 100 ModuleDescriptor instances programmatically.
  • Start timer.
  • Call factory.build_tools(registry) where registry returns all 100.
  • Stop timer.
  • Assert elapsed time < 100ms.
  • Expected Result: 100-module tool building completes in under 100ms.
  • Traceability: F-001, F-005
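
The test data for this benchmark could be generated along these lines. The sketch produces plain schema dicts rather than `ModuleDescriptor` objects (an assumption, to keep it free of apcore imports): each schema has 5 to 10 properties, and every fifth one carries a `$ref`, which gives the stated 20%. `time_ms` shows the timer pattern from the steps.

```python
import random
import time


def make_schema(index: int, rng: random.Random) -> dict:
    n_props = rng.randint(5, 10)
    properties = {f"field_{i}": {"type": "string"} for i in range(n_props)}
    schema = {"type": "object", "properties": properties, "required": ["field_0"]}
    if index % 5 == 0:  # every fifth schema, i.e. 20% of the set
        # Replace the last property with a reference so the count stays in range.
        properties[f"field_{n_props - 1}"] = {"$ref": "#/$defs/Config"}
        schema["$defs"] = {
            "Config": {"type": "object", "properties": {"x": {"type": "integer"}}}
        }
    return schema


def make_schemas(count: int = 100, seed: int = 0) -> list:
    rng = random.Random(seed)  # fixed seed keeps the benchmark input reproducible
    return [make_schema(i, rng) for i in range(count)]


def time_ms(fn, *args):
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0
```

In the real benchmark, `time_ms` would wrap `factory.build_tools(registry)` and the test would assert the elapsed value is below 100.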

TC-PERF-002: Tool call routing overhead under 5ms

  • Priority: P0
  • Type: Performance
  • Preconditions: ExecutionRouter with mock Executor. call_async returns instantly (no-op).
  • Test Data: tool_name = "test.tool", arguments = {"key": "value"}
  • Steps:
  • Mock executor.call_async to return {"result": "ok"} with zero processing time.
  • Run router.handle_call(...) 1000 times.
  • Compute average time per call (total / 1000).
  • Assert average overhead < 5ms.
  • Expected Result: Average routing overhead (handle_call minus executor call) is under 5ms.
  • Traceability: F-003
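
The averaging step can be sketched as below. `handle_call` here is a stand-in that forwards straight to a mocked executor (an assumption; the real `ExecutionRouter` does more work, which is exactly the overhead being measured).

```python
import asyncio
import time
from unittest.mock import AsyncMock


async def average_call_seconds(handler, calls: int = 1000) -> float:
    """Await handler `calls` times sequentially and return the mean duration."""
    start = time.perf_counter()
    for _ in range(calls):
        await handler("test.tool", {"key": "value"})
    return (time.perf_counter() - start) / calls


# Mock executor whose call_async returns immediately.
executor = AsyncMock()
executor.call_async.return_value = {"result": "ok"}


async def handle_call(name, arguments):
    # Stand-in for router.handle_call (assumption).
    return await executor.call_async(name, arguments)


average = asyncio.run(average_call_seconds(handle_call))
```

Because the mock returns immediately, the measured average approximates routing overhead alone, so the 5ms threshold translates to `average < 0.005`.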

TC-PERF-003: Memory overhead under 10MB for 100 modules

  • Priority: P0
  • Type: Performance
  • Preconditions: 100 ModuleDescriptors with realistic schemas (10 properties each)
  • Test Data: 100 programmatically generated descriptors.
  • Steps:
  • Start tracemalloc.
  • Take snapshot_before.
  • Call factory.build_tools(registry) with 100 modules.
  • Take snapshot_after.
  • Compute memory difference.
  • Assert difference < 10MB (10 * 1024 * 1024 bytes).
  • Expected Result: Memory increase from building 100 tool definitions is under 10MB.
  • Traceability: F-005
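
The snapshot comparison in the steps maps directly onto the standard-library `tracemalloc` API. The following is a minimal sketch; `build_fake_tools` is a stand-in for `factory.build_tools(registry)` (an assumption), and the measured figure includes a small amount of bookkeeping from `tracemalloc` itself.

```python
import tracemalloc


def measure_growth_bytes(build):
    """Return (result, bytes allocated and still held after build() ran)."""
    tracemalloc.start()
    try:
        before = tracemalloc.take_snapshot()
        result = build()  # keep a reference so the memory stays allocated
        after = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    growth = sum(stat.size_diff for stat in after.compare_to(before, "lineno"))
    return result, growth


def build_fake_tools():
    # Stand-in workload: 100 tool-like dicts with 10 properties each.
    return [
        {
            "name": f"tool_{i}",
            "inputSchema": {
                "type": "object",
                "properties": {f"p{j}": {"type": "string"} for j in range(10)},
            },
        }
        for i in range(100)
    ]


tools, growth = measure_growth_bytes(build_fake_tools)
```

The assertion from the test case then becomes `growth < 10 * 1024 * 1024`.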

TC-PERF-004: 10 concurrent tool calls handled correctly

  • Priority: P1
  • Type: Performance
  • Preconditions: ExecutionRouter with mock Executor. call_async has 10ms simulated delay.
  • Test Data: 10 different tool calls with distinct arguments.
  • Steps:
  • Launch 10 concurrent router.handle_call() tasks via asyncio.gather().
  • Assert all 10 complete in under 100ms, i.e. faster than 10 sequential 10ms calls would take.
  • Assert all 10 results are successful.
  • Assert no result cross-contamination.
  • Expected Result: All 10 calls succeed in parallel. Total time is near 10ms (parallel), not 100ms (sequential).
  • Traceability: F-003, F-007 (AC4)
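
A sketch of the concurrency check with `asyncio.gather()` follows. `fake_handle_call` is a stand-in for `router.handle_call` (an assumption), and the example uses a 50ms delay instead of 10ms purely to leave a comfortable timing margin on a loaded machine; the comparison against the sequential total works the same way.

```python
import asyncio
import time


async def fake_handle_call(name: str, arguments: dict, delay: float) -> dict:
    # Simulated executor latency, then echo the input so mix-ups are detectable.
    await asyncio.sleep(delay)
    return {"tool": name, "echo": arguments["n"]}


async def run_concurrently(count: int, delay: float):
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_handle_call(f"tool_{i}", {"n": i}, delay) for i in range(count))
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_concurrently(count=10, delay=0.05))
```

`asyncio.gather()` returns results in the order the awaitables were passed, so comparing each `echo` value with its index is enough to detect cross-contamination.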

TC-PERF-005: Large schema with 50+ properties converts correctly

  • Priority: P1
  • Type: Performance
  • Preconditions: SchemaConverter instance
  • Test Data: Schema with 50 string properties plus one $ref-nested object (51 top-level properties):
    properties = {f"field_{i}": {"type": "string"} for i in range(50)}
    properties["nested_obj"] = {"$ref": "#/$defs/NestedConfig"}
    input_schema = {
        "type": "object",
        "properties": properties,
        "required": [f"field_{i}" for i in range(10)],
        "$defs": {
            "NestedConfig": {"type": "object", "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}}}
        }
    }
    
  • Steps:
  • Start timer.
  • Call converter.convert_input_schema(descriptor).
  • Stop timer.
  • Assert elapsed time < 50ms.
  • Assert result has 51 properties (50 + nested_obj inlined).
  • Assert no $defs in result.
  • Expected Result: Large schema converted correctly and quickly (under 50ms).
  • Traceability: F-001
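
To make the expected outcome concrete, here is a hedged sketch of local `$ref` inlining. `inline_refs` is a hypothetical helper, not the `SchemaConverter` implementation: it handles only `#/$defs/<name>` references and has no cycle detection, so a self-referencing definition would recurse without end.

```python
import copy


def inline_refs(schema: dict) -> dict:
    """Inline local '#/$defs/<name>' references and drop the $defs section."""
    defs = schema.get("$defs", {})
    prefix = "#/$defs/"

    def resolve(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(prefix):
                # Replace the reference node with a resolved copy of its target.
                return resolve(copy.deepcopy(defs[ref[len(prefix):]]))
            return {k: resolve(v) for k, v in node.items() if k != "$defs"}
        if isinstance(node, list):
            return [resolve(item) for item in node]
        return node

    return resolve(schema)


properties = {f"field_{i}": {"type": "string"} for i in range(50)}
properties["nested_obj"] = {"$ref": "#/$defs/NestedConfig"}
input_schema = {
    "type": "object",
    "properties": properties,
    "required": [f"field_{i}" for i in range(10)],
    "$defs": {
        "NestedConfig": {
            "type": "object",
            "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}},
        }
    },
}

converted = inline_refs(input_schema)
```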

TC-PERF-006: to_openai_tools for 100 modules under 200ms

  • Priority: P1
  • Type: Performance
  • Preconditions: 100 ModuleDescriptors, OpenAIConverter
  • Test Data: 100 modules with varied schemas.
  • Steps:
  • Start timer.
  • Call to_openai_tools(registry).
  • Stop timer.
  • Assert elapsed < 200ms.
  • Assert result has 100 items.
  • Expected Result: 100-module OpenAI conversion under 200ms.
  • Traceability: F-008

TC-PERF-007: Memory scales linearly with module count

  • Priority: P2
  • Type: Performance
  • Preconditions: tracemalloc, varying module counts
  • Test Data: Module counts: 10, 50, 100, 500
  • Steps:
  • For each count N, measure memory used by build_tools(registry_with_N_modules).
  • Compute memory per tool (total / N).
  • Assert memory per tool is roughly constant (< 100KB per tool).
  • Assert total for 500 modules < 50MB.
  • Expected Result: Linear scaling. Per-tool memory is approximately constant.
  • Traceability: F-005

9. Security Test Cases (TC-SEC-xxx)

Test File: tests/security/test_security.py


TC-SEC-001: ACL denied calls do not leak caller identity

  • Priority: P0
  • Type: Security
  • Preconditions: ExecutionRouter with mock Executor raising ACLDeniedError
  • Test Data:
    error = ACLDeniedError(caller_id="admin_user_42", target_id="secret.module")
    
  • Steps:
  • Configure mock executor to raise the error.
  • Call router.handle_call("secret.module", {}).
  • Assert result.content[0].text does NOT contain "admin_user_42".
  • Assert text does NOT contain "secret.module".
  • Assert text equals exactly "Access denied".
  • Expected Result: No sensitive information in the error response.
  • Traceability: F-004 (AC3)

TC-SEC-002: Unexpected exceptions do not leak stack traces

  • Priority: P0
  • Type: Security
  • Preconditions: ExecutionRouter with mock Executor raising various exceptions
  • Test Data:
    exceptions = [
        RuntimeError("Connection refused to postgres://admin:secret@db:5432/prod"),
        FileNotFoundError("/etc/shadow"),
        PermissionError("Cannot access /root/.ssh/id_rsa"),
        ValueError("Invalid token: eyJhbGciOiJIUzI1NiJ9..."),
    ]
    
  • Steps:
  • For each exception, call router.handle_call(...).
  • Assert result.content[0].text equals "Internal error occurred".
  • Assert response text does NOT contain any of: "postgres", "secret", "/etc/shadow", "/root/.ssh", "eyJhbG".
  • Expected Result: All sensitive details are stripped. Only "Internal error occurred" returned.
  • Traceability: F-004 (AC4)
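
The leak assertions above can be written as one loop over exceptions and forbidden fragments. In this sketch, `to_error_text` is a stand-in for the router's error path (an assumption): it returns a fixed message and never reads `str(exc)`.

```python
GENERIC_MESSAGE = "Internal error occurred"

FORBIDDEN = ("postgres", "secret", "/etc/shadow", "/root/.ssh", "eyJhbG")


def to_error_text(exc: Exception) -> str:
    # Deliberately ignores str(exc): unexpected errors get a fixed message.
    return GENERIC_MESSAGE


def find_leaks(text: str, forbidden=FORBIDDEN) -> list:
    """Return every forbidden fragment that appears in text."""
    return [fragment for fragment in forbidden if fragment in text]


exceptions = [
    RuntimeError("Connection refused to postgres://admin:secret@db:5432/prod"),
    FileNotFoundError("/etc/shadow"),
    PermissionError("Cannot access /root/.ssh/id_rsa"),
    ValueError("Invalid token: eyJhbGciOiJIUzI1NiJ9..."),
]

leaks = {type(exc).__name__: find_leaks(to_error_text(exc)) for exc in exceptions}
```

Running `find_leaks` on the raw exception message as well confirms that the test data would actually trip the check if sanitization were missing.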

TC-SEC-003: Malformed JSON arguments handled safely

  • Priority: P1
  • Type: Security
  • Preconditions: ExecutionRouter with mock Executor that raises SchemaValidationError for bad input
  • Test Data:
    malformed_inputs = [
        {"width": "<script>alert(1)</script>"},
        {"width": "'; DROP TABLE users; --"},
        {"width": "{{7*7}}"},
        {"width": "\x00\x01\x02"},
    ]
    
  • Steps:
  • For each input, call router.handle_call("image.resize", input).
  • Assert either schema validation error or success (depending on type coercion).
  • Assert no unhandled exception.
  • Assert response does NOT echo back the malicious input unescaped.
  • Expected Result: All malformed inputs handled gracefully. No injection or crash.
  • Traceability: F-003, F-004

TC-SEC-004: Oversized input handled gracefully

  • Priority: P1
  • Type: Security
  • Preconditions: ExecutionRouter with mock Executor
  • Test Data:
    arguments = {"data": "A" * 10_000_000}  # 10MB string
    
  • Steps:
  • Call router.handle_call("data.process", arguments).
  • Assert no memory exhaustion crash.
  • Assert either success or error result returned.
  • Expected Result: Handled without crash. Executor's validation may reject or accept.
  • Traceability: F-003

TC-SEC-005: Error mapper never exposes call chain details

  • Priority: P0
  • Type: Security
  • Preconditions: ErrorMapper
  • Test Data:
    errors_with_chains = [
        CallDepthExceededError(depth=10, max_depth=5, call_chain=["module.a", "module.b", "module.c", "module.d"]),
        CircularCallError(module_id="module.x", call_chain=["module.x", "module.y", "module.x"]),
        CallFrequencyExceededError(module_id="spam", count=100, max_repeat=10, call_chain=["spam", "spam"]),
    ]
    
  • Steps:
  • For each error, call mapper.to_mcp_error(error).
  • Assert result text does NOT contain "module.a", "module.b", "module.x", "module.y".
  • Assert text does NOT contain "call_chain".
  • Expected Result: Call chain internals never exposed in MCP responses.
  • Traceability: F-004

TC-SEC-006: Default HTTP host is localhost only

  • Priority: P1
  • Type: Security
  • Preconditions: Inspect serve() and TransportManager defaults
  • Test Data: N/A
  • Steps:
  • Assert serve() default host parameter is "127.0.0.1".
  • Assert TransportManager.run_streamable_http default host is "127.0.0.1".
  • Assert TransportManager.run_sse default host is "127.0.0.1".
  • Expected Result: All defaults bind to localhost only. No accidental network exposure.
  • Traceability: F-007 (AC3)
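
Parameter defaults can be read without starting a server by using `inspect.signature`. In the sketch below, `serve` is a hypothetical stand-in whose parameter list (including the `port` default) is an assumption; only the `host` default comes from this test case.

```python
import inspect


def default_for(func, parameter: str):
    """Return the declared default value of a named parameter."""
    return inspect.signature(func).parameters[parameter].default


# Hypothetical stand-in with the signature shape the test expects.
def serve(registry, transport="stdio", host="127.0.0.1", port=8000):
    raise NotImplementedError


host_default = default_for(serve, "host")
```

The same helper would be applied to `TransportManager.run_streamable_http` and `TransportManager.run_sse`.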

10. Test Data Specification

10.1 Sample ModuleDescriptor Instances

Simple Descriptor (image.resize)

SIMPLE_DESCRIPTOR = ModuleDescriptor(
    module_id="image.resize",
    name="Image Resize",
    description="Resize an image to the specified dimensions",
    documentation="Resizes the input image to the target width and height using bicubic interpolation.",
    input_schema={
        "type": "object",
        "title": "ImageResizeInput",
        "properties": {
            "width": {"type": "integer", "description": "Target width in pixels"},
            "height": {"type": "integer", "description": "Target height in pixels"},
            "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]}
        },
        "required": ["width", "height"]
    },
    output_schema={
        "type": "object",
        "properties": {
            "status": {"type": "string"},
            "path": {"type": "string"}
        },
        "required": ["status", "path"]
    },
    version="1.0.0",
    tags=["image", "transform"],
    annotations=ModuleAnnotations(
        readonly=False,
        destructive=False,
        idempotent=True,
        requires_approval=False,
        open_world=True
    ),
    examples=[]
)

Complex Descriptor with $ref (workflow.execute)

COMPLEX_DESCRIPTOR = ModuleDescriptor(
    module_id="workflow.execute",
    name="Execute Workflow",
    description="Execute a workflow with parameters",
    documentation=None,
    input_schema={
        "type": "object",
        "properties": {
            "workflow_name": {"type": "string"},
            "parameters": {"$ref": "#/$defs/WorkflowParams"}
        },
        "required": ["workflow_name", "parameters"],
        "$defs": {
            "WorkflowParams": {
                "type": "object",
                "properties": {
                    "seed": {"type": "integer", "default": 42},
                    "steps": {"type": "integer", "default": 20}
                }
            }
        }
    },
    output_schema={},
    version="2.0.0",
    tags=["workflow"],
    annotations=ModuleAnnotations(
        readonly=False,
        destructive=True,
        idempotent=False,
        requires_approval=True,
        open_world=True
    ),
    examples=[]
)

Edge Case -- Empty Schema (system.ping)

EMPTY_SCHEMA_DESCRIPTOR = ModuleDescriptor(
    module_id="system.ping",
    name="System Ping",
    description="Health check endpoint",
    documentation=None,
    input_schema={},
    output_schema={"type": "object", "properties": {"status": {"type": "string"}}},
    version="1.0.0",
    tags=["system"],
    annotations=None,
    examples=[]
)

Edge Case -- Destructive with Approval (file.delete)

DESTRUCTIVE_DESCRIPTOR = ModuleDescriptor(
    module_id="file.delete",
    name="File Delete",
    description="Permanently delete a file",
    documentation="Deletes the specified file from disk. This action is irreversible.",
    input_schema={
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File path to delete"},
            "force": {"type": "boolean", "default": False}
        },
        "required": ["path"]
    },
    output_schema={"type": "object", "properties": {"deleted": {"type": "boolean"}}},
    version="1.0.0",
    tags=["file", "danger"],
    annotations=ModuleAnnotations(
        readonly=False,
        destructive=True,
        idempotent=True,
        requires_approval=True,
        open_world=False
    ),
    examples=[]
)

Edge Case -- Read-Only (data.query)

READONLY_DESCRIPTOR = ModuleDescriptor(
    module_id="data.query",
    name="Data Query",
    description="Query data from the database",
    documentation=None,
    input_schema={
        "type": "object",
        "properties": {
            "table": {"type": "string"},
            "limit": {"type": "integer", "default": 100}
        },
        "required": ["table"]
    },
    output_schema={},
    version="1.0.0",
    tags=["data", "read"],
    annotations=ModuleAnnotations(
        readonly=True,
        destructive=False,
        idempotent=True,
        requires_approval=False,
        open_world=False
    ),
    examples=[]
)

10.2 Expected MCP Tool Definition (for image.resize)

EXPECTED_MCP_TOOL = types.Tool(
    name="image.resize",
    description="Resize an image to the specified dimensions",
    inputSchema={
        "type": "object",
        "title": "ImageResizeInput",
        "properties": {
            "width": {"type": "integer", "description": "Target width in pixels"},
            "height": {"type": "integer", "description": "Target height in pixels"},
            "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]}
        },
        "required": ["width", "height"]
    },
    annotations=ToolAnnotations(
        readOnlyHint=False,
        destructiveHint=False,
        idempotentHint=True,
        openWorldHint=True
    )
)

10.3 Expected OpenAI Tool Definition (for image.resize)

EXPECTED_OPENAI_TOOL = {
    "type": "function",
    "function": {
        "name": "image__resize",
        "description": "Resize an image to the specified dimensions",
        "parameters": {
            "type": "object",
            "title": "ImageResizeInput",
            "properties": {
                "width": {"type": "integer", "description": "Target width in pixels"},
                "height": {"type": "integer", "description": "Target height in pixels"},
                "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]}
            },
            "required": ["width", "height"]
        }
    }
}
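
The mapping from descriptor fields to this structure can be sketched as follows. Both function names are hypothetical, and the normalization rule (each "." becomes "__") is an assumption inferred from the single `image.resize` example above rather than a documented rule.

```python
def normalize_module_id(module_id: str) -> str:
    # Assumption inferred from the example above: "." is replaced by "__".
    return module_id.replace(".", "__")


def to_openai_tool(module_id: str, description: str, input_schema: dict) -> dict:
    """Wrap a module's metadata in the OpenAI function-tool envelope."""
    return {
        "type": "function",
        "function": {
            "name": normalize_module_id(module_id),
            "description": description,
            "parameters": input_schema,
        },
    }


tool = to_openai_tool(
    "image.resize",
    "Resize an image to the specified dimensions",
    {
        "type": "object",
        "title": "ImageResizeInput",
        "properties": {
            "width": {"type": "integer", "description": "Target width in pixels"},
            "height": {"type": "integer", "description": "Target height in pixels"},
            "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]},
        },
        "required": ["width", "height"],
    },
)
```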

10.4 Expected OpenAI Tool with strict=True (for image.resize)

EXPECTED_OPENAI_STRICT_TOOL = {
    "type": "function",
    "function": {
        "name": "image__resize",
        "description": "Resize an image to the specified dimensions",
        "parameters": {
            "type": "object",
            "properties": {
                "width": {"type": "integer", "description": "Target width in pixels"},
                "height": {"type": "integer", "description": "Target height in pixels"},
                "format": {"type": ["string", "null"], "enum": ["png", "jpg", "webp", None]}
            },
            "required": ["width", "height", "format"],
            "additionalProperties": False
        },
        "strict": True
    }
}
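
The differences between 10.3 and 10.4 (title and defaults dropped, every property required, optional properties made nullable, `additionalProperties` set to `False`) can be expressed as a small transformation. This is a sketch under stated assumptions: `to_strict_parameters` is a hypothetical name, and it rewrites only the top level of the schema, not nested objects.

```python
import copy


def to_strict_parameters(schema: dict) -> dict:
    """Top-level strict-mode rewrite of an object schema (sketch)."""
    strict = copy.deepcopy(schema)
    strict.pop("title", None)
    required = list(strict.get("required", []))
    for name, prop in strict.get("properties", {}).items():
        prop.pop("default", None)  # the strict example above carries no defaults
        if name in required:
            continue
        # Optional properties become required but nullable.
        if isinstance(prop.get("type"), str):
            prop["type"] = [prop["type"], "null"]
        if "enum" in prop and None not in prop["enum"]:
            prop["enum"] = [*prop["enum"], None]
        required.append(name)
    strict["required"] = required
    strict["additionalProperties"] = False
    return strict


SIMPLE_INPUT_SCHEMA = {
    "type": "object",
    "title": "ImageResizeInput",
    "properties": {
        "width": {"type": "integer", "description": "Target width in pixels"},
        "height": {"type": "integer", "description": "Target height in pixels"},
        "format": {"type": "string", "default": "png", "enum": ["png", "jpg", "webp"]},
    },
    "required": ["width", "height"],
}

strict_parameters = to_strict_parameters(SIMPLE_INPUT_SCHEMA)
```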

11. Test Execution Plan

Phase 1 (Week 1): Core Adapters

Day Activity Test Cases
Day 1 SchemaConverter unit tests TC-SCHEMA-001 to TC-SCHEMA-012
Day 2 AnnotationMapper unit tests TC-ANNOT-001 to TC-ANNOT-010
Day 3 ErrorMapper unit tests TC-ERROR-001 to TC-ERROR-011
Day 4 OpenAIConverter unit tests + ModuleIDNormalizer TC-OPENAI-001 to TC-OPENAI-012
Day 5 Review, fix failures, achieve 90% coverage for adapters/converters All Phase 1 tests pass

Phase 1 Exit Criteria: All 45 adapter/converter unit tests pass. >= 90% coverage on adapters/ and converters/.

Phase 2 (Week 2): Server Components + Integration

Day Activity Test Cases
Day 1 ExecutionRouter unit tests TC-EXEC-001 to TC-EXEC-012
Day 2 MCPServerFactory unit tests TC-SERVER-001 to TC-SERVER-010
Day 3 TransportManager + CLI unit tests TC-TRANSPORT-001 to TC-TRANSPORT-009, TC-CLI-001 to TC-CLI-010
Day 4 RegistryListener unit tests TC-DYNAMIC-001 to TC-DYNAMIC-007
Day 5 Integration tests TC-INT-001 to TC-INT-006

Phase 2 Exit Criteria: All unit + integration tests pass. >= 90% coverage on server/. Integration flows verified.

Phase 3 (Week 3): E2E + Performance + Security + Polish

Day Activity Test Cases
Day 1 E2E tests (stdio, HTTP, hot-reload, annotations) TC-E2E-001 to TC-E2E-004
Day 2 E2E tests (health check, resources, zero modules) TC-E2E-005, TC-E2E-006, TC-E2E-008
Day 3 Performance tests TC-PERF-001 to TC-PERF-007
Day 4 Security tests TC-SEC-001 to TC-SEC-006
Day 5 Claude Desktop manual verification, regression, coverage report TC-E2E-007 (manual), full regression

Phase 3 Exit Criteria: All automated tests pass. Performance benchmarks met. Security tests pass. Manual Claude Desktop verification documented. >= 90% total line coverage.


12. Quality Gates

12.1 Definition of Done -- Per Phase

Phase Criteria
Phase 1 All TC-SCHEMA, TC-ANNOT, TC-ERROR, TC-OPENAI tests pass; >= 90% coverage on adapters/converters
Phase 2 All TC-EXEC, TC-SERVER, TC-TRANSPORT, TC-CLI, TC-DYNAMIC, TC-INT tests pass; >= 90% coverage on server/
Phase 3 All TC-E2E, TC-PERF, TC-SEC tests pass; >= 90% total coverage; Claude Desktop verified

12.2 Release Criteria

  1. All P0 test cases pass (100% pass rate).
  2. P1 test cases pass at a rate of >= 95%; any failures are documented with workarounds.
  3. Line coverage >= 90% on src/apcore_mcp/ (measured by pytest-cov).
  4. All performance benchmarks met (TC-PERF-001 through TC-PERF-007).
  5. All security tests pass (TC-SEC-001 through TC-SEC-006).
  6. Claude Desktop integration manually verified (TC-E2E-007).
  7. No known P0 or P1 bugs at release time.
  8. Zero test flakiness -- all tests pass in 3 consecutive runs.

12.3 Regression Strategy

Event Regression Suite
Every commit Unit tests (fast, < 30 seconds)
Every pull request Unit + Integration + Security (< 3 minutes)
Nightly CI Full suite: Unit + Integration + E2E + Perf + Security
Dependency update (apcore) Full suite + manual MCP client check
Dependency update (mcp SDK) Full suite + manual Claude Desktop check
Pre-release Full suite 3x + Claude Desktop manual verification

13. Appendix

13.1 Glossary

Term Definition
TDD Test-Driven Development -- tests written before implementation
MCP Model Context Protocol -- open protocol, introduced by Anthropic, for connecting AI applications to external tools and data
SUT System Under Test
Mock Test double that simulates behavior of a dependency
Fixture Reusable test setup/teardown code managed by pytest
P0/P1/P2 Priority tiers: P0 = must have, P1 = should have, P2 = nice to have
Coverage Percentage of source code lines executed during test runs
Flaky test Test that intermittently passes/fails without code changes
Regression Re-running existing tests to detect newly introduced bugs

13.2 Test Case ID Naming Convention

TC-<COMPONENT>-<NNN>

Components:
  SCHEMA    = Schema Converter (adapters/schema.py)
  ANNOT     = Annotation Mapper (adapters/annotations.py)
  EXEC      = Execution Router (server/router.py)
  ERROR     = Error Mapper (adapters/errors.py)
  SERVER    = MCP Server Factory (server/factory.py)
  OPENAI    = OpenAI Converter (converters/openai.py)
  TRANSPORT = Transport Manager (server/transport.py)
  CLI       = CLI Module (__main__.py)
  DYNAMIC   = Dynamic Registry Listener (server/listener.py)
  INT       = Integration tests
  E2E       = End-to-end tests
  PERF      = Performance tests
  SEC       = Security tests

NNN = Zero-padded sequential number (001, 002, ...)
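
The convention is simple enough to check mechanically, for example in a lint step over test names. A minimal sketch, with `is_valid_test_case_id` as a hypothetical helper name:

```python
import re

COMPONENTS = (
    "SCHEMA", "ANNOT", "EXEC", "ERROR", "SERVER", "OPENAI", "TRANSPORT",
    "CLI", "DYNAMIC", "INT", "E2E", "PERF", "SEC",
)

# TC-<COMPONENT>-<NNN> with exactly three digits.
TEST_CASE_ID = re.compile(r"TC-(?:" + "|".join(COMPONENTS) + r")-\d{3}")


def is_valid_test_case_id(value: str) -> bool:
    return TEST_CASE_ID.fullmatch(value) is not None
```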

13.3 References

Reference Description
docs/prd-apcore-mcp.md Product Requirements Document v1.0
docs/tech-design-apcore-mcp.md Technical Design Document v1.0
IEEE 829 Standard for Software Test Documentation
ISTQB Foundation Level Syllabus International Software Testing Qualifications Board
Google Testing Blog Best practices for test strategy and test pyramid
pytest documentation https://docs.pytest.org/
pytest-asyncio documentation https://pytest-asyncio.readthedocs.io/
MCP Specification https://modelcontextprotocol.io/
OpenAI Function Calling Docs https://platform.openai.com/docs/guides/function-calling

End of Test Plan & Test Cases Document