Vision & Media Intelligence: Unified Content Orchestration
The Intent: Bridging the Vision-to-Signal Gap
Large-scale video and vision data often remain "dark": unstructured and unsearchable. The intent of this platform was to bridge the gap between raw visual data and actionable business signals. It began as a "scrappy" feasibility study, Eric's Vibe-Coded Prototype, which proved that Gemini could handle one-hour user study videos. I identified this as a strategic multiplier and architected its transition into a production-grade 9-pass agentic pipeline: a sovereign research infrastructure that automates high-fidelity insight extraction while keeping a human in the loop to maintain trust.
The System: High-Fidelity Multi-Modal Pipelines
The platform orchestrates complex multi-modal pipelines that handle:
- NextGen Video Analysis (9-Pass Pipeline): Automating transcription, persona extraction, and strategic synthesis for UXR teams.
- Latent Content Grounding: Deep-linking every insight to exact video timestamps, so stakeholders can jump straight to the moment behind each finding.
- Real-Time Analytics (ScaleCM): Translating computer vision signals from retail environments into live dashboard metrics.
The Build: From Vibe-Coded Prototype to Platform
The evolution of NextGen Video followed a path of increasing architectural complexity:
- Initial Feasibility: Manually verified Gemini’s context window for long-form video extraction.
- LLM-Assisted Prototyping: Leveraged Claude (Opus) as a development partner to iterate on Google’s multimodal API snippets.
- Infrastructure Hardening: Implemented a Google Cloud Storage (GCS) middleman for large file handling and deployed the system as twin Cloud Run services (UI + Backend).
- Prompt Decomposition: Solved “janky” analysis by breaking the single-prompt approach into a multi-step pipeline: Transcription -> Persona ID -> Use Case Extraction -> Walkthrough Analysis -> JSON Structuring.
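The decomposition above can be sketched as a sequential pipeline in which each pass sees the artifacts of earlier passes. This is a minimal illustration only; the pass names, `PipelineState`, and `run_pass` are hypothetical stand-ins, not the production system's identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical pass sequence mirroring the decomposition described above.
PASSES = [
    "transcription",
    "persona_id",
    "use_case_extraction",
    "walkthrough_analysis",
    "json_structuring",
]

@dataclass
class PipelineState:
    video_uri: str
    artifacts: dict = field(default_factory=dict)

def run_pass(name: str, state: PipelineState) -> str:
    """Placeholder for one model call; a real pass would send the video
    plus the accumulated artifacts to the multimodal model."""
    return f"{name} output for {state.video_uri}"

def run_pipeline(video_uri: str) -> PipelineState:
    state = PipelineState(video_uri)
    for name in PASSES:
        # Each pass builds on earlier outputs, which is what keeps
        # single-prompt "jankiness" from compounding across tasks.
        state.artifacts[name] = run_pass(name, state)
    return state
```

The key design point is that no pass is asked to do everything at once; each gets a narrow job and the prior passes' outputs as context.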
Figure 3.1: NextGen Video — Automated Multi-Modal Transcription & Diarization
Technical Deep Dive: The 9-Pass Agentic Architecture
Figure 3.2: Pipeline Orchestration & Forensic Trace View
1. The Prompt Depot (Standardization)
Transitioned from hard-coded strings to a managed directory of Specialized Agents. A DispatchAgent identifies the pass and fetches the corresponding persona-driven prompt:
- The Persona Extractor: Identifies “Prompters,” “Citizen Developers,” and “Agent Scalers.”
- The Walkthrough Architect: Documents prototype interactions with 100% temporal coverage.
- The Strategic Synthesis Agent: Generates prioritized P0-P2 recommendations linked to verbatim evidence.
- The LLM Judge: Performs automated QA, auditing quotes for evidence strength and priority calibration.
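The DispatchAgent pattern can be sketched as a lookup from pass name to prompt file. The depot layout and file names here are illustrative assumptions, not the real prompt directory.

```python
# Hypothetical prompt depot layout: one file per specialized agent.
PROMPT_DEPOT = {
    "persona_extractor": "prompts/persona_extractor.md",
    "walkthrough_architect": "prompts/walkthrough_architect.md",
    "strategic_synthesis": "prompts/strategic_synthesis.md",
    "llm_judge": "prompts/llm_judge.md",
}

class DispatchAgent:
    """Maps a pipeline pass to its persona-driven prompt, replacing
    hard-coded prompt strings with a managed registry."""

    def __init__(self, depot: dict):
        self.depot = depot

    def fetch(self, pass_name: str) -> str:
        try:
            return self.depot[pass_name]
        except KeyError:
            raise ValueError(f"No prompt registered for pass {pass_name!r}")
```

Centralizing prompts this way means a prompt revision is a file change, not a code change, and unknown passes fail loudly instead of silently running with a stale prompt.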
2. Human-in-the-Loop (Trust)
Introduced Pause Points to mitigate “AI Black Box” risks:
- Checkpoint A: Manual review of transcript accuracy and speaker diarization before persona extraction.
- Checkpoint B: Validation of extracted “Aha!” moments before final synthesis.
- UI Interventions: Added “Thumbs Up/Down” and “Edit” buttons to allow researchers to correct hallucinations and nudge timestamps in real-time.
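The gating logic behind the pause points can be sketched as a simple check before each pass: downstream passes are blocked until the relevant checkpoint has human approval. The checkpoint and pass names are illustrative assumptions.

```python
from enum import Enum, auto

class Checkpoint(Enum):
    TRANSCRIPT_REVIEW = auto()   # Checkpoint A: transcript + diarization
    AHA_MOMENT_REVIEW = auto()   # Checkpoint B: extracted "Aha!" moments

# Hypothetical mapping of gated passes to the checkpoint that unblocks them.
GATES = {
    "persona_id": Checkpoint.TRANSCRIPT_REVIEW,
    "strategic_synthesis": Checkpoint.AHA_MOMENT_REVIEW,
}

def may_advance(pass_name: str, approvals: set) -> bool:
    """Return True if the pass is ungated or its checkpoint has been
    approved by a human reviewer; otherwise the pipeline pauses."""
    gate = GATES.get(pass_name)
    return gate is None or gate in approvals
```

The pipeline never deletes the checkpoint; it simply refuses to advance, which is what turns the "AI Black Box" into an auditable sequence of approvals.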
3. Forensic Mitigation Layer
In architecting the NextGen pipeline, I implemented specific guardrails to ensure high-fidelity signals:
- Speaker Hallucinations: Addressed "speaker collapse" during cross-talk, where Gemini would merge distinct personas into a single "Super User."
- The Leading Question Trap: Configured the Insight Agent to distinguish between genuine user delight and “polite” answers to researcher leading questions.
- Timestamp Drift: Implemented HITL “nudge” tools to correct the 5-10 second drift often seen in LLM-extracted timestamps, ensuring insights start exactly when the user begins speaking.
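The timestamp "nudge" reduces to applying a researcher-supplied offset and clamping to the clip bounds. A minimal sketch, with a hypothetical function name:

```python
def nudge_timestamp(extracted_s: float, offset_s: float,
                    clip_len_s: float) -> float:
    """Apply a human correction to an LLM-extracted timestamp.
    Offsets are typically small negatives (pulling the start back
    5-10 seconds to where the user actually begins speaking), and
    the result is clamped so it stays inside the clip."""
    return min(max(extracted_s + offset_s, 0.0), clip_len_s)
```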
Figure 3.3: Automated Insight Extraction & Evidence Grounding
4. Actionable Integration (Velocity)
The pipeline integrates directly with Buganizer: validated pain points are exported with a single click, automatically populating engineering tickets with AI-generated descriptions, verbatim quotes, and direct video deep-links.
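The shape of that export can be sketched as assembling a ticket payload from a validated pain point. Buganizer's actual API and field names are internal and not shown here; the structure below is a generic issue-tracker assumption.

```python
from dataclasses import dataclass

@dataclass
class PainPoint:
    title: str
    summary: str      # AI-generated description
    quote: str        # verbatim user evidence
    video_url: str
    timestamp_s: int

def to_ticket_payload(p: PainPoint) -> dict:
    """Bundle the description, verbatim quote, and a timestamped
    video deep-link into one generic issue payload."""
    deep_link = f"{p.video_url}#t={p.timestamp_s}"
    body = f"{p.summary}\n\n> {p.quote}\n\nEvidence: {deep_link}"
    return {"title": p.title, "description": body}
```

The deep-link is the load-bearing piece: an engineer reading the ticket can jump straight to the moment the user hit the pain point.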
ScaleCM: The Strategic Benchmark Dashboard
The ScaleCM project evolved from a realization that Google’s release-critical benchmarking was trapped in “spreadsheet hell.” In a series of architectural consultations, I identified that the ScaleCM Program—Google’s mandatory usability benchmark for all product releases—was bottlenecked by a massive, multi-tab “Golden Sheet” that was trying to perform project management, vendor tracking, and executive reporting simultaneously.
The Intervention: From Spreadsheet to Sovereign Truth
My approach was to move beyond a “one-off prototype” and build a formal Benchmark Pipeline:
- Consolidated Visibility: Architected a high-fidelity dashboard that provides leadership (VP/Director level) with an instant view of Product Health Scores and CUJ (Core User Journey) success rates.
- Architecture Shift (Source vs. Surface): To minimize process disruption for researchers, I kept the spreadsheet as the initial “Source of Truth” for data entry but built a web-based Control Plane to serve as the immersive “Surface of Truth” for stakeholders.
- The Hub Vision: Centralized scattered resources—study plans, vendor reports, and CUJ definitions—into a single hub, rescuing organizational visibility from fragmented documentation.
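The Source-vs-Surface split can be sketched as a one-way sync: the spreadsheet remains the writable source of truth, and the control plane reads an export of it into render-ready records. The column names below are illustrative, not the real Golden Sheet schema.

```python
import csv
import io

def load_source_of_truth(sheet_csv: str) -> list:
    """Parse a CSV export of the 'Golden Sheet' (source of truth)
    into records the web control plane (surface of truth) renders."""
    return list(csv.DictReader(io.StringIO(sheet_csv)))

def health_score(rows: list) -> float:
    """Average CUJ success rate across rows, scaled to a 0-100 score
    (an assumed metric definition for illustration)."""
    rates = [float(r["cuj_success_rate"]) for r in rows]
    return round(100 * sum(rates) / len(rates), 1)
```

Because researchers keep entering data where they always have, the dashboard gains fidelity without forcing a workflow migration.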
The Evolution: Toward an Agentic Control Plane
The ScaleCM dashboard was designed to scale through three distinct architectural phases:
- V1: The Visibility Layer: Immediate consolidation of statistics, health scores, and Task Completion Rates (TCR) across all product pillars (Gemini, Agents, APIs).
- V2: Actionable Integration: Bidirectional syncing between benchmark failures and Buganizer, ensuring that products cannot “flip the bit” to GA without addressing critical P0/P1 usability bugs.
- V3: Agentic Synthesis: Implementing LLM-driven pipelines to ingest lengthy vendor reports and generate executive summaries, automatically cross-checking CUJ consistency across study iterations.
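The V2 GA gate reduces to a single predicate over the bugs synced from the benchmark. A minimal sketch, assuming a generic bug record shape rather than real Buganizer fields:

```python
def ga_ready(bugs: list) -> bool:
    """A product may 'flip the bit' to GA only when no open
    benchmark-linked P0/P1 usability bugs remain."""
    return not any(
        b["priority"] in ("P0", "P1") and b["open"]
        for b in bugs
    )
```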
Artifact Proof: ScaleCM Retail Dashboard
Figure 3.4: ScaleCM — Main KPI Scoreboard & Trend Analysis
Figure 3.5: Product Pillar Deep-Dive & Health Metrics
Figure 3.6: Resource Directory & Organizational Knowledge Base