Description
Summary

We're building production AI agents that reason over data, call tools, coordinate across services, and produce reliable outputs inside real workflows. This is not a chatbot project. We need agents that can handle conditional branching, partial failures, and human-in-the-loop escalation, and that run at volume without ballooning cost.

You'll design and implement agent workflows from the ground up: prompts, tools, state, orchestration, and evals.

What the work looks like:
- Building multi-step agents that call external tools and route conditionally based on what those tools return
- Designing retry and escalation logic so failures degrade gracefully instead of silently
- Structuring LLM outputs so downstream code can reason over them reliably
- Keeping cost and latency under control across high-volume runs, not just single-turn correctness
- Adding observability where possible
- Writing evals to catch regressions
- You should be able to work in a US time zone

Tech stack:
- TypeScript
- Mastra for agent orchestration (or equivalent - explain your alternative)
- Claude or OpenAI API
- PostgreSQL for state and queue management

You're a fit if you:
- Have shipped at least one LLM agent or tool-calling workflow in a real environment
- Think about agent flows as conditional graphs, not fixed prompt chains
- Are strong in TypeScript for backend/API work, not just frontend
- Have opinions on cost management for LLM pipelines at volume
- Can design human-in-the-loop handoffs, not just the happy path
- Communicate clearly about tradeoffs and what you'd do differently

Nice to have: Mastra experience, evals and regression testing for agent behavior, voice or multi-modal integrations

Before you apply:
This post was last updated on June 26. When you read this, write today's actual date and your local time at the very top of your proposal as Read: [date] [time]. The date above is intentionally off - we use this to confirm a real person read it.
To apply, please answer:
- Describe an AI agent or LLM workflow you've built. What tools did it call, and how did the flow change based on what those tools returned?
- Have you used Mastra or a similar orchestration framework? If not, what would you use and why?
- How do you keep LLM API costs under control when a workflow runs on hundreds or thousands of inputs?