Broke up the GPU monolith. Lower latency, lower bill, deploys that don't need a war room.

Hyperstate AI ran an AI-assisted music production platform. Creators uploaded audio and worked with a producer-style assistant (the Louis Bell persona) that kept full context across sessions. The startup ran out of funding after launch.

AI Fullstack Build 2025 Wind Down

01 Overview

02 What's the challenge?

A single GPU-heavy server handled audio processing, lyrics, transcription, MIDI generation, and the producer agent. Deploys were manual. The compute bill grew faster than usage. The architecture carried a demo but couldn’t carry the launch: with no horizontal scaling and no failure isolation, one bad deploy took the whole product offline.

03 What call did we make?

One server doing everything wasn't going to survive launch.

Audio processing, lyrics, transcription, and generation were all stacked on a single GPU box and deployed by hand. It carried the demo and folded under real load. We split it into focused services with clean responsibility boundaries, dockerised every component, and swapped the heaviest in-house libraries for lightweight, scalable hosted alternatives. Same product surface, a fraction of the compute bill, and deploys nobody has to babysit.

04 What We Did

Moved the GPU-heavy integrations out of the monolithic single-server setup into a distributed, microservice-oriented backend with clearly separated responsibilities (agents, audio processing, and generation services), each with its own scaling envelope.

Modernised the deployment stack from manual deploys to a dockerised, orchestrated infrastructure across every component.

Migrated from local, self-managed, compute-heavy audio, lyrics, and transcription libraries to lightweight, scalable alternatives, cutting latency and the compute bill at the same time.

On top of that: Django REST API, thirdweb-linked JWT auth, project and sample management, MIDI generation workflows, Neo4j knowledge graph, PostgreSQL. Same product surface, infrastructure that carries growth instead of fighting it.
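
To make the idea of separated responsibilities concrete, here is a minimal Python sketch of routing tasks to the service that owns them, with a failure in one service contained to that service. It is an illustration only, not the platform's actual code: the `Service` shape, the registry, the replica counts, and the handler functions are all hypothetical, and plain function calls stand in for what would be network calls between containers.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Service:
    """One independently deployed service (hypothetical shape)."""
    name: str
    replicas: int                    # each service gets its own scaling envelope
    handler: Callable[[dict], dict]  # stands in for a network call


def transcribe(payload: dict) -> dict:
    # Placeholder for a call to a transcription backend.
    return {"text": f"transcript of {payload['file']}"}


def generate_midi(payload: dict) -> dict:
    # Simulate an outage in one service.
    raise RuntimeError("generation service unavailable")


# Task kind -> owning service. Names and replica counts are illustrative.
REGISTRY: Dict[str, Service] = {
    "transcription": Service("audio-processing", replicas=2, handler=transcribe),
    "midi": Service("generation", replicas=4, handler=generate_midi),
}


def dispatch(kind: str, payload: dict) -> dict:
    """Route a task to its service and contain failures to that service."""
    service = REGISTRY[kind]
    try:
        result = service.handler(payload)
        return {"ok": True, "service": service.name, "result": result}
    except Exception as exc:
        # The failing service reports an error; other services keep working.
        return {"ok": False, "service": service.name, "error": str(exc)}


if __name__ == "__main__":
    print(dispatch("midi", {"prompt": "four-bar loop"}))
    print(dispatch("transcription", {"file": "take1.wav"}))
```

In this sketch the generation outage produces an error response for MIDI requests only, while transcription still succeeds. In a monolith, the same fault could take every feature down together.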

05 Outcomes

GPU Monolith → Microservices

Latency & Cost Both Down

06 What We Learned

Heavy ML work doesn’t belong in your web request path. The moment audio, lyrics, and transcription each pull a model into the same box, every load spike takes the whole product down with it. The win is boring infrastructure: separate services for separate compute profiles, orchestrated deploys, hosted alternatives for the libraries you have no business owning.
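
One common way to keep heavy work out of the request path is to have the handler enqueue a job and return straight away, leaving a separate worker to do the slow part. The sketch below shows that pattern with Python's standard library only. It assumes nothing about the queueing system the project actually used: `handle_upload`, `heavy_transcribe`, and the in-memory `results` store are hypothetical names, and a production setup would typically use an external broker and separate worker processes rather than a thread.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # pending (job_id, audio_path) pairs
results = {}           # job_id -> transcript; in-memory stand-in for a real store


def heavy_transcribe(audio_path: str) -> str:
    """Placeholder for slow model inference or a hosted API call."""
    return f"transcript of {audio_path}"


def worker() -> None:
    """Drain the queue outside the request path."""
    while True:
        job_id, audio_path = jobs.get()
        try:
            results[job_id] = heavy_transcribe(audio_path)
        finally:
            jobs.task_done()


def handle_upload(audio_path: str) -> str:
    """Request handler: enqueue the job and return an id immediately."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, audio_path))
    return job_id


# One background worker; a daemon thread so it does not block interpreter exit.
threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    job_id = handle_upload("take1.wav")
    jobs.join()  # demo only; a real client would poll for the result
    print(results[job_id])  # prints "transcript of take1.wav"
```

The handler's cost stays constant regardless of how slow the model is, so a spike in transcription load lengthens the queue instead of stalling every web request.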

Tech Stack

Python / Django, OpenAI API, Neo4j, PostgreSQL, Docker
