System Architecture: The WPipe Engine

This document details the internal design and engineering philosophy behind wpipe v2.4.0-LTS. Our goal is to provide a zero-boilerplate orchestration engine that remains robust under industrial loads.

Philosophy: "Code is the Config"

Unlike XML or YAML-heavy orchestrators, WPipe treats your Python code as the source of truth. We provide a thin but powerful layer of resiliency and observability over pure logic.

1. The “Warehouse” Model

The core mental model of WPipe is the Warehouse.

  • Accumulation: Instead of steps passing specific arguments to each other, every step updates a global “Warehouse” dictionary.

  • Context Awareness: Every task has visibility into all previous results, allowing for complex decision-making without complex parameter passing.

  • Atomic Updates: WPipe ensures that context updates are merged safely, even in high-concurrency scenarios.

WPipe Architecture Diagram

2. Core Internal Layers

WPipe is structured into four specialized layers:

🧠 Orchestration Layer

The Pipeline and PipelineAsync engines. They manage the DAG execution, handle step registration, and coordinate the flow between components.

💾 Persistence Layer (WSQLite)

A unified abstraction over SQLite that forces WAL (Write-Ahead Logging) mode. This ensures that tracking logs and metrics never block your primary data tasks.

📊 Observability Layer

Includes the ResourceMonitor (CPU/RAM tracking) and the PipelineTracker. It captures performance data and forensic error details.

🌐 Integration Layer

The APIClient and Dashboard. It allows pipelines to communicate with central command centers and provides real-time visual feedback.

3. Persistence Strategy: WSQLite

One of the most critical components of v2.4.0-LTS is the WSQLite unification.

  • Zero Raw SQL: All internal tracking (logs, steps, metrics) is handled through Pydantic-mapped models.

  • Thread Safety: Connection pooling and locking mechanisms are built-in to prevent Database is locked errors during parallel execution.

  • WAL Mode: Mandatory Write-Ahead Logging allows simultaneous reads (Dashboard) and writes (Pipeline) with zero latency.

4. Resiliency & Reliability

WPipe is “Resilient by Design”. This is achieved through two main features:

4.1 Smart Checkpointing

Checkpoints are not just periodic saves. They are logic-gated milestones. You can define a checkpoint that only triggers if a specific data validation passes, ensuring you never save a “corrupt” state.

4.2 Forensic Error Capture

When a step fails, the LogGestor captures: * The exact state of the Warehouse. * The file path and line number of the exception. * The system metrics (RAM/CPU) at the moment of failure.

5. Concurrency Model

WPipe provides a dual-executor model to bypass the limitations of Python:

Executor

Backend

Best for

IO-Bound

ThreadPoolExecutor

API calls, DB queries, Web scraping.

CPU-Bound

ProcessPoolExecutor

Image processing, AI inference, Heavy math.

Async

asyncio loop

High-concurrency network tasks.

6. Module Hierarchy

wpipe/
├── pipe/               # Core Engine (Sync/Async/DAG)
├── parallel/           # Concurrency Executors
├── sqlite/             # WSQLite Unification
├── tracking/           # Metrics & Alerts
├── resource_monitor/   # CPU/RAM Probing
├── checkpoint/         # Resiliency Logic
└── dashboard/          # Visual Intelligence

Next Steps