Define what your agent knows, which tools it can use, and how it reasons. Pick from 19+ frontier LLMs, attach MCP servers for live data access, and deploy agents that stream answers in real time — no programming required.
Every agent gets its own model assignment. Use a lightweight model for simple FAQ bots and a frontier model for complex reasoning — optimizing cost and quality across your organization.
Anthropic's most capable model for complex analysis, long-form content generation, and nuanced reasoning. Ideal for agents that need to synthesize large documents, write detailed reports, or handle multi-step logic chains where accuracy is paramount.
The best balance of intelligence and speed in the Claude family. Sonnet handles most enterprise tasks — summarization, data extraction, customer support, code review — at a fraction of the cost of Opus, with response times under two seconds.
Anthropic's fastest and most affordable model. Deploy Haiku for high-volume, low-latency use cases like chat widgets, classification tasks, and simple Q&A agents where sub-second response time matters more than deep reasoning.
OpenAI's flagship model offering state-of-the-art performance across coding, math, and creative writing. Excels at tool use and structured output generation. A strong choice for agents that need to produce JSON, call APIs, or write production-quality code.
Optimized for instruction-following and long-context tasks with a 1M-token context window. GPT-4.1 is well-suited for agents that process lengthy documents, codebases, or conversation histories where retaining every detail matters.
OpenAI's reasoning-optimized models that think step by step before answering. Use them for agents that tackle math problems, multi-constraint planning, scientific analysis, or any task where chain-of-thought reasoning dramatically improves accuracy.
Plus GPT-4o, GPT-4o-mini, o3-mini, and more. New models are added within days of release — your agents always have access to the latest capabilities. Learn how to choose the right model →
Each agent in Orckai is defined by four building blocks: a system prompt that sets its personality and instructions, a model selection that determines its intelligence, tools that give it real-world capabilities, and an optional knowledge base for domain-specific answers.
System prompts support {{variable}} syntax for dynamic personalization.

Not every task requires the same level of reasoning. Orckai gives you two distinct runtime engines so you can match the execution mode to the complexity of the job — keeping things fast when they should be fast, and thorough when they need to be thorough.
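The {{variable}} templating mentioned above can be sketched as a simple substitution pass. A minimal illustration, assuming placeholders are word-character names in double braces (the renderer below is hypothetical, not Orckai's implementation):

```python
import re

def render_prompt(template: str, variables: dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value; unknown names stay intact."""
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

prompt = render_prompt(
    "You are the assistant for {{company_name}}. Greet {{user_name}} politely.",
    {"company_name": "Acme", "user_name": "Dana"},
)
```

Leaving unknown placeholders untouched (rather than failing) makes missing variables easy to spot in logs.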
An AI agent without tools is just a chatbot. Orckai lets you attach three categories of tools to any agent, turning it into an autonomous worker that can query databases, call APIs, search the web, send emails, and manipulate files — all governed by the permissions you define.
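The "governed by the permissions you define" part can be sketched as a gate in front of every tool call. The permission table and function names here are illustrative assumptions, not Orckai's API:

```python
# Hypothetical per-agent allowlist of tool names.
PERMISSIONS: dict[str, set[str]] = {
    "support-bot": {"web_search", "send_email"},
}

def can_invoke(agent: str, tool: str) -> bool:
    """Every tool call is checked against the agent's configured allowlist."""
    return tool in PERMISSIONS.get(agent, set())
```

An agent with no entry in the table can invoke nothing — deny by default.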
Watch your agent think in real time. Every response streams token by token via Server-Sent Events. Every conversation is logged. Every token is counted and costed.
Agent responses are delivered via Server-Sent Events (SSE) — the same protocol used by ChatGPT and Claude. Users see tokens appear in real time instead of waiting for a complete response. Streaming works across the web UI, embedded widgets, and the public API, giving every interface a responsive, conversational feel.
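On the wire, SSE is a line-oriented text format: each event carries one or more `data:` lines and is terminated by a blank line. The parser below is a minimal sketch of that format as defined by the SSE standard — the token payloads are invented examples, not Orckai's actual event schema:

```python
def parse_sse(stream: str) -> list[str]:
    """Collect the data payload of each event; a blank line terminates an event."""
    events: list[str] = []
    buffer: list[str] = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            buffer.append(line[len("data:"):].lstrip())
        elif line == "" and buffer:   # end of one event
            events.append("\n".join(buffer))
            buffer = []
    return events

# Three events, each streaming a fragment of the answer.
raw = "data: Hel\n\ndata: lo, wor\n\ndata: ld!\n\n"
tokens = parse_sse(raw)
answer = "".join(tokens)   # client concatenates fragments as they arrive
```

A real client would read the stream incrementally and render each fragment as it lands, which is what produces the token-by-token effect.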
Every agent conversation is stored and resumable. Users can pick up where they left off, and agents maintain full context from prior turns. Conversation history is organization-scoped with row-level isolation, so multi-tenant deployments keep each team's data completely separate. Browse, search, and audit past conversations from the admin panel.
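Row-level isolation boils down to one invariant: every read is filtered by the caller's organization. A toy in-memory sketch (the data and function are hypothetical; a real deployment would enforce this at the database layer):

```python
# Toy conversation store; "org" is the tenant key on every row.
conversations = [
    {"id": 1, "org": "acme",   "title": "Refund flow"},
    {"id": 2, "org": "globex", "title": "Onboarding questions"},
    {"id": 3, "org": "acme",   "title": "API outage triage"},
]

def list_conversations(org: str) -> list[dict]:
    """Reads are always scoped to one organization; other tenants' rows never leak."""
    return [c for c in conversations if c["org"] == org]
```

Resuming a conversation then means loading its prior turns through the same scoped path.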
Orckai tracks input tokens, output tokens, and estimated cost for every agent execution. View per-agent, per-user, and per-organization breakdowns. Set up alerts when usage exceeds thresholds. Detailed metrics help you right-size model selections — switch an over-provisioned agent from Opus to Sonnet and see the cost impact immediately.
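Cost estimation from token counts is simple arithmetic over per-million-token rates. The rates below are hypothetical placeholders (real prices vary by provider and change over time), but the right-sizing comparison works the same way:

```python
# Hypothetical USD rates per million tokens; not actual provider pricing.
RATES = {
    "opus":   {"input": 15.00, "output": 75.00},
    "sonnet": {"input": 3.00,  "output": 15.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input tokens at the input rate + output tokens at the output rate."""
    rate = RATES[model]
    return (input_tokens * rate["input"] + output_tokens * rate["output"]) / 1_000_000

# The same 10k-in / 2k-out workload priced on each model.
opus_cost = estimate_cost("opus", 10_000, 2_000)
sonnet_cost = estimate_cost("sonnet", 10_000, 2_000)
```

Under these placeholder rates the switch cuts the per-request cost fivefold, which is the kind of delta the usage dashboard surfaces.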