# AI Models
TaskPilot supports three AI providers. Select a provider and model from the session toolbar before starting a session.
## Claude (Anthropic) — Default Provider
Claude models excel at coding, analysis, and following complex instructions. Default to Sonnet 4.6 for 80%+ of tasks — it offers the best quality-to-cost ratio.
| Model | Context | Input / Output (per 1M) | Best For |
|---|---|---|---|
| Haiku 4.5 | 200K | $1 / $5 | High-volume simple tasks — classification, extraction, formatting. 12x cheaper than Sonnet. |
| Opus 4.6 | 1M | $15 / $75 | Deep scientific reasoning, complex multi-file refactors, agent teams. 91.3% GPQA Diamond. |
| Sonnet 4.6 (default) | 200K | $3 / $15 | General coding, most tasks — best balance of quality and cost. 79.6% SWE-bench. |
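Per-token prices make per-request costs easy to estimate. The sketch below applies the Claude rates from the table above; the model keys and the example request size are illustrative, not TaskPilot identifiers.

```python
# Estimate the cost of a single request from per-1M-token prices.
# Rates below are the Claude prices from the table (USD per 1M tokens).
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6": (15.00, 75.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 20K-token prompt with a 2K-token reply on Sonnet 4.6.
print(f"${request_cost('sonnet-4.6', 20_000, 2_000):.3f}")  # → $0.090
```

The same request on Opus 4.6 costs five times as much, which is why the guidance above reserves Opus for tasks that need it.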
## OpenAI
OpenAI's GPT-5.4 family delivers strong reasoning and coding performance. GPT-5.4 Mini is the recommended default — it approaches full GPT-5.4 quality at a fraction of the cost.
| Model | Context | Input / Output (per 1M) | Best For |
|---|---|---|---|
| GPT-5.4 | 1M | $2.50 / $15 | Most demanding reasoning and professional tasks. |
| GPT-5.4 Mini (default) | 400K | $0.75 / $4.50 | General coding, high-volume workloads — 2x faster than full GPT-5.4, approaches its performance. |
| GPT-5.4 Nano | 400K | $0.20 / $1.25 | Classification, data extraction, ranking, lightweight sub-agents. Cheapest OpenAI option. |
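To make the tier trade-off concrete, here is a sketch comparing what one request costs on each tier at the prices in the table above. The 8K-input / 1K-output request shape is an illustrative assumption, not a TaskPilot figure.

```python
# Cost of one request on each OpenAI tier, from the per-1M-token prices above.
TIERS = {
    "GPT-5.4":      (2.50, 15.00),
    "GPT-5.4 Mini": (0.75, 4.50),
    "GPT-5.4 Nano": (0.20, 1.25),
}

def tier_cost(in_price: float, out_price: float,
              input_tokens: int = 8_000, output_tokens: int = 1_000) -> float:
    """USD cost of a single request at the given per-1M-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for name, (inp, outp) in TIERS.items():
    print(f"{name}: ${tier_cost(inp, outp):.4f}")
```

Note that Mini prices at 30% of full GPT-5.4 on both input and output, so routing general work to Mini cuts spend by roughly 70% regardless of the input/output split.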
## Groq (Ultra-Fast Inference)
All Groq models run on Groq's custom LPU hardware for extremely fast inference. Choose Groq when speed matters more than frontier intelligence.
| Model | Context | Cost (per 1M) | Best For |
|---|---|---|---|
| GPT-OSS 120B | 131K | ~$1.20 | Highest quality on Groq — OpenAI's open-weight model with built-in search and code execution. |
| Kimi K2 | 262K | ~$1.50 | Agentic tasks, tool use, coding benchmarks. Largest context on Groq. |
| Llama 3.1 8B | 128K | ~$0.06 | Ultra-cheap simple tasks. Fastest and cheapest option. |
| Llama 3.3 70B (default) | 128K | ~$0.60 | General-purpose, proven reliability. Best default for Groq. |
| Llama 4 Scout | 10M | Low | Massive context window (10M tokens), blazing-fast speed (2600 tok/s). |
| Qwen 3 32B | 128K | ~$0.30 | Reasoning and dialogue with thinking/non-thinking modes. |
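When speed is the deciding factor, response time is roughly output length divided by decode rate. The 2,600 tok/s figure for Llama 4 Scout comes from the table above; rates for other models would need to be measured, and this simple division ignores queueing and time-to-first-token.

```python
# Rough time to stream a response at a sustained generation speed.
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Seconds to generate `output_tokens` at a steady decode rate."""
    return output_tokens / tokens_per_second

# A 1,300-token answer on Llama 4 Scout at 2,600 tok/s:
print(generation_seconds(1_300, 2_600))  # → 0.5 seconds
```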
## Choosing a Model
- **Most tasks:** Use Claude Sonnet 4.6 — best quality-to-cost ratio for coding.
- **Deep reasoning:** Use Opus 4.6 for multi-file refactors, scientific analysis, or 1M context.
- **Budget-conscious:** Use Haiku 4.5 or GPT-5.4 Nano for high-volume simple work.
- **OpenAI:** Use GPT-5.4 for demanding tasks or GPT-5.4 Mini for faster, cheaper general work.
- **Speed-critical:** Use Groq — all models benefit from LPU hardware acceleration.
- **Massive context:** Use Llama 4 Scout on Groq (10M tokens) or Opus 4.6 (1M tokens).
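The rules above can be sketched as a simple lookup. The task labels here are invented for illustration — TaskPilot itself exposes no such routing API; this only restates the guidance in code form.

```python
# Hypothetical task-to-model routing table mirroring the guidance above.
RECOMMENDED = {
    "general-coding": "Claude Sonnet 4.6",
    "deep-reasoning": "Claude Opus 4.6",
    "high-volume-simple": "Claude Haiku 4.5",   # or GPT-5.4 Nano
    "speed-critical": "Llama 3.3 70B (Groq)",
    "massive-context": "Llama 4 Scout (Groq)",  # or Opus 4.6 at 1M context
}

def pick_model(task: str) -> str:
    # Fall back to the overall default (Sonnet 4.6) for unlisted task types.
    return RECOMMENDED.get(task, "Claude Sonnet 4.6")

print(pick_model("deep-reasoning"))  # → Claude Opus 4.6
print(pick_model("unknown-task"))    # → Claude Sonnet 4.6
```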
Prices shown are per 1 million tokens. Actual costs depend on task complexity and token usage. Model selection is available in the session toolbar dropdown. Switching providers resets to that provider's default model.