System Design

Architecture Notes

A concrete view of the systems behind Buzo: device onboarding, edge inference, agent memory, cloud orchestration and sensor-aware workflows.

Buzo Architecture

Hybrid edge + cloud system for voice, vision, memory and physical-world action.

Device IO, Edge inference, Agent runtime, Cloud reasoning, Integrations

Memory Systems

Local short-term state plus semantic long-term memory for agent continuity.

Session memory, Vector retrieval, Summaries, User preferences, Safety context
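
The split between short-term session state and semantic long-term memory can be illustrated with a minimal sketch. The `MemoryStore` class, its method names and the session limit are hypothetical, and embeddings are assumed to arrive as plain lists of floats from some upstream model; none of this is taken from the Buzo codebase.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class MemoryStore:
    """Hypothetical store: a bounded session buffer plus a vector index."""

    def __init__(self, session_limit=3):
        self.session = []      # short-term: most recent turns only
        self.long_term = []    # long-term: (embedding, text) pairs
        self.session_limit = session_limit

    def add_turn(self, text):
        # Keep only the newest `session_limit` turns.
        self.session.append(text)
        self.session = self.session[-self.session_limit:]

    def remember(self, embedding, text):
        self.long_term.append((embedding, text))

    def recall(self, query_embedding, k=1):
        # Rank stored memories by similarity to the query embedding.
        ranked = sorted(
            self.long_term,
            key=lambda item: cosine(query_embedding, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]
```

A linear scan like this is only reasonable for small indexes; a larger store would presumably use an approximate nearest-neighbour index instead.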

Agent Framework

Tool-aware planning loop that decides when to answer, remember, alert or call an integration.

Intent routing, Tool calling, Reasoning, Planning, Observability
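
The decision step of such a loop (answer, remember, alert or call an integration) might look roughly like the sketch below. The `Action` enum, the keyword rules and the `route` function are illustrative assumptions; a production router would more likely rely on a model than on keyword matching.

```python
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    REMEMBER = "remember"
    ALERT = "alert"
    CALL_INTEGRATION = "call_integration"


# Hypothetical keyword rules, checked in order (stand-in for a learned router).
RULES = [
    (("remember", "note that"), Action.REMEMBER),
    (("alert", "notify", "warn"), Action.ALERT),
    (("turn on", "turn off", "send"), Action.CALL_INTEGRATION),
]


def route(utterance: str) -> Action:
    """Pick one of the four actions; default to answering directly."""
    text = utterance.lower()
    for keywords, action in RULES:
        if any(keyword in text for keyword in keywords):
            return action
    return Action.ANSWER
```

For example, `route("Turn on the hallway light")` selects `Action.CALL_INTEGRATION`, while an unmatched question falls through to `Action.ANSWER`.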

Edge Inference Pipeline

Low-latency local models for wake, transcription, detection and first-pass interpretation, evolved from the earlier pocket_ai R&D pipeline.

Whisper Tiny, YOLO11, ONNX depth, Spatial reasoning, Phi-3, Qwen

Cloud Orchestration

Escalates expensive reasoning, knowledge retrieval and integrations to cloud services.

LLMs, Vision models, Knowledge APIs, Notifications, Audit logs

BLE Onboarding Flow

Phone provisions Wi-Fi, identity and device settings over BLE before the device joins cloud flows.

Discovery, Pairing, Wi-Fi setup, Token exchange, Device registration
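
One way to picture the ordering constraint in this flow is a small state machine that refuses to skip steps. The `Step` and `Onboarding` names are hypothetical, and the sketch models only the ordering, not any actual BLE or Wi-Fi calls.

```python
from enum import Enum


class Step(Enum):
    DISCOVERY = 1
    PAIRING = 2
    WIFI_SETUP = 3
    TOKEN_EXCHANGE = 4
    DEVICE_REGISTRATION = 5


class Onboarding:
    """Hypothetical tracker that enforces the provisioning order."""

    def __init__(self):
        self.completed = []

    @property
    def next_step(self):
        # Enum members iterate in definition order.
        remaining = [step for step in Step if step not in self.completed]
        return remaining[0] if remaining else None

    def complete(self, step):
        if step is not self.next_step:
            raise ValueError(f"expected {self.next_step}, got {step}")
        self.completed.append(step)

    @property
    def ready_for_cloud(self):
        # The device joins cloud flows only after every step has finished.
        return self.next_step is None
```

Attempting `complete(Step.WIFI_SETUP)` before pairing raises `ValueError`, which mirrors the requirement that provisioning finishes before the device joins cloud flows.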

Sensor Stack

A modular stack for camera, microphone and future environmental sensors.

Camera, Microphone, Motion, Presence, Smart-home inputs

Android Assistant Prototype

Native Android agent loop proving mobile assistant primitives before they move into Buzo hardware.

STT, LLM tool calling, TTS, Accessibility control, Memory, Notifications

Qwen Hardware Accelerator Research

Software stack exploration for Qwen3-8B on hardwired analog compute-in-memory silicon.

3-bit weights, 8-bit activations, GQA, LoRA adapters, OpenAI-compatible API
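
To give a feel for what 3-bit weights and 8-bit activations mean numerically, here is a sketch of symmetric uniform quantization. The scheme itself (per-tensor scale, signed integer codes) is an assumption chosen for illustration; the notes do not say which quantization method the research stack uses.

```python
def quantize(values, bits):
    """Symmetric uniform quantization to signed integer codes of the given width."""
    qmax = 2 ** (bits - 1) - 1      # 3 for 3-bit, 127 for 8-bit
    qmin = -(2 ** (bits - 1))       # -4 for 3-bit, -128 for 8-bit
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak else 1.0
    # Round to the nearest code, then clamp into the representable range.
    codes = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate real values."""
    return [code * scale for code in codes]


weights = [0.75, -0.5, 0.0, 0.3]
codes, scale = quantize(weights, bits=3)
restored = dequantize(codes, scale)
```

With only eight levels available, 0.3 is restored as 0.25 in this example, which shows why low-bit weight formats are typically validated against a full-precision reference.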

Sequence Diagrams

Voice request

Wake word -> Whisper Tiny -> intent routing -> local memory -> cloud LLM when needed -> spoken response
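
The "cloud LLM when needed" step can be sketched as a confidence-gated escalation. The `LocalResult` type, the `handle_request` function and the 0.7 threshold are hypothetical; the two answer functions are passed in as callables so the sketch stays independent of any particular model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalResult:
    text: Optional[str]     # None when the edge model has no answer
    confidence: float       # assumed to lie in [0, 1]


def handle_request(transcript, local_answer, cloud_answer, threshold=0.7):
    """Answer on the edge when confident, otherwise escalate to the cloud."""
    result = local_answer(transcript)
    if result.text is not None and result.confidence >= threshold:
        return result.text, "edge"
    return cloud_answer(transcript), "cloud"
```

Returning the route alongside the answer makes it straightforward to log how often requests leave the device.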

Vision alert

Camera frame -> YOLO11 detection -> spatial reasoning -> edge rule -> Telegram alert -> memory event
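
The edge-rule stage of this flow could be as simple as a filter over detections. In the sketch below, the `Detection` fields, the default thresholds and the message format are all assumptions, and the function only builds the alert text rather than calling any messaging API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    confidence: float
    distance_m: float   # assumed output of the depth / spatial reasoning stage


def evaluate_rule(detections, label="person", min_confidence=0.5, max_distance_m=2.0):
    """Return an alert message if a matching object is close enough, else None."""
    hits = [
        d for d in detections
        if d.label == label
        and d.confidence >= min_confidence
        and d.distance_m <= max_distance_m
    ]
    if not hits:
        return None
    nearest = min(hits, key=lambda d: d.distance_m)
    return f"{label} detected {nearest.distance_m:.1f} m away"
```

Keeping the rule on the device means an alert can be raised without a cloud round trip; the resulting message would then be handed to the notification and memory steps.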

BLE setup

Mobile app -> BLE discovery -> credential transfer -> device registration -> cloud sync

Memory write

Conversation event -> summarizer -> embedding -> local index -> cloud backup policy
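
The write path above can be sketched end to end with placeholder stages. The truncating summarizer and the hashed bag-of-words embedding are deliberately crude stand-ins for real models, and the backup policy shown (private events stay on the device) is an assumed example rather than Buzo's documented behaviour.

```python
import zlib


def summarize(event, max_words=8):
    """Placeholder summarizer: keep the first few words."""
    return " ".join(event.split()[:max_words])


def embed(text, dims=16):
    """Placeholder embedding: hashed bag of words (stand-in for a real model)."""
    vector = [0.0] * dims
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        vector[zlib.crc32(word.encode("utf-8")) % dims] += 1.0
    return vector


def write_memory(event, local_index, backup_queue, private=False):
    """Summarize, embed and index an event locally, then apply the backup policy."""
    summary = summarize(event)
    local_index.append((embed(summary), summary))
    # Assumed policy: private events are never queued for cloud backup.
    if not private:
        backup_queue.append(summary)
    return summary
```

The local index is always written first, so the memory remains available on the device regardless of what the backup policy decides.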

Android hands

User request -> LLM tool call -> accessibility read/tap/type/swipe -> result returned to agent loop
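
The dispatch from an LLM tool call to an accessibility action, and the return of its result to the agent loop, can be illustrated in Python for consistency with the other sketches (the prototype itself is described as native Android). `FakeScreen`, `run_tool_call` and the tool-call dictionary shape are hypothetical; the fake records actions instead of performing them.

```python
class FakeScreen:
    """Stand-in for an accessibility service; records actions instead of performing them."""

    def __init__(self, text):
        self.text = text
        self.log = []

    def read(self):
        return self.text

    def tap(self, x, y):
        self.log.append(("tap", x, y))
        return "ok"

    def type(self, text):
        self.log.append(("type", text))
        return "ok"

    def swipe(self, direction):
        self.log.append(("swipe", direction))
        return "ok"


def run_tool_call(screen, call):
    """Execute one tool call and wrap the outcome for the agent loop."""
    handlers = {
        "read": screen.read,
        "tap": screen.tap,
        "type": screen.type,
        "swipe": screen.swipe,
    }
    name = call["name"]
    if name not in handlers:
        return {"ok": False, "error": f"unknown tool: {name}"}
    return {"ok": True, "result": handlers[name](**call.get("arguments", {}))}
```

Unknown tool names come back as an error result rather than an exception, so the agent loop can decide how to recover.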

Hardwired Qwen path

FP16 Qwen3 -> 3-bit ROM quantization -> golden simulator -> LoRA fusion -> OpenAI-compatible serving