API
Chat APIs
Two compatible endpoint shapes — /v1/chat/completions for OpenAI SDKs and /v1/messages for the Anthropic SDK. Same auth, same router, same bill.
OpenAI-compatible · /v1/chat/completions
Drop-in compatible with the OpenAI Chat Completions schema. Auth header is Authorization: Bearer thk_….
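A stray space or line break in a pasted key is an easy way to end up with invalid_api_key, so it can help to normalise the key before building the header. The sketch below shows one way to do that. build_auth_headers is a hypothetical helper name, not part of any Token Harbor SDK, and the prefix check assumes every key starts with thk_ as in the examples on this page.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Return request headers for /v1/chat/completions (hypothetical helper)."""
    # Remove all whitespace, including line breaks picked up while pasting.
    cleaned = "".join(api_key.split())
    # Assumption: keys start with "thk_", as in the examples on this page.
    if not cleaned.startswith("thk_"):
        raise ValueError("expected an API key starting with 'thk_'")
    return {
        "Authorization": f"Bearer {cleaned}",
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the headers argument of requests.post.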
Non-streaming
Returns the full response in a single JSON body. Best for short completions and synchronous request paths.
import requests
# Send one chat request and wait for the complete JSON response.
response = requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Hi"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])
On Windows PowerShell, use the native cmdlet (Invoke-RestMethod). The aliased curl is actually Invoke-WebRequest, and even curl.exe mangles the JSON quoting. That is the usual cause of a bad range specification error, or of invalid_api_key when a long key picks up a stray space or line break. Always paste your key in full, with no spaces.
Streaming
Pass stream: true and the server emits Server-Sent Events so tokens arrive as they're generated.
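Printed raw, the stream is a series of data: lines rather than plain text. The sketch below pulls the text fragments out of those lines. It assumes the stream follows the OpenAI chunk format (each event is data: followed by JSON with choices[].delta.content, and the stream ends with data: [DONE]); this page states schema compatibility but does not show the payloads, so check against real output. iter_content is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, or other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel in the OpenAI format
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

Feeding it r.iter_lines() from a streaming requests call yields the reply piece by piece, for example: for text in iter_content(r.iter_lines()): print(text, end="", flush=True).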
import requests
with requests.post(
    url="https://tokenharbor.ai/v1/chat/completions",
    headers={
        "Authorization": "Bearer thk_live_PASTE_YOURS_HERE",
        "Content-Type": "application/json",
    },
    json={
        "model": "tokenharbor/qwen3-max",
        "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
        "stream": True,
    },
    stream=True,  # read the body incrementally instead of buffering it
) as r:
    for line in r.iter_lines():
        if line:  # skip the blank lines that separate events
            print(line.decode("utf-8"))
Claude-compatible · /v1/messages
Anthropic Messages API shape. Point the official @anthropic-ai/sdk or anthropic Python client at Token Harbor by setting base_url (or baseURL). Auth accepts either x-api-key (Anthropic default) or Authorization: Bearer — pass your thk_live_ key in either header.
Non-streaming
Returns a single message object with the assistant's reply in content[0].text. max_tokens is required by the Anthropic schema.
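When the endpoint is called over raw HTTP rather than through the SDK, the reply arrives as a plain JSON object, and content is a list of typed blocks rather than a single string. The sketch below joins the text blocks. It assumes the Anthropic block shape ({"type": "text", "text": ...}); message_text is a hypothetical helper name.

```python
def message_text(message: dict) -> str:
    """Join the text blocks of a Messages-style response body."""
    parts = []
    for block in message.get("content", []):
        # Assumption: text blocks are tagged type == "text", as in the
        # Anthropic schema; other block types are skipped.
        if block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "".join(parts)
```

For a reply with a single text block this returns the same string as content[0].text.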
from anthropic import Anthropic
# Point the official client at Token Harbor instead of the default host.
client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)
message = client.messages.create(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,  # required by the Anthropic schema
    messages=[{"role": "user", "content": "Hi"}],
)
print(message.content[0].text)
Streaming
The Anthropic SDK exposes a typed event stream (message_start, content_block_delta, message_stop, etc.). Our adapter emits the same events so SDK handlers keep working.
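For handlers that work on the decoded events themselves, the sketch below shows how the reply text can be rebuilt from them. It assumes each event is a dict parsed from the event's JSON payload, and that text arrives in content_block_delta events whose delta has type text_delta, as in the Anthropic schema. collect_text is a hypothetical helper name.

```python
from typing import Iterable


def collect_text(events: Iterable[dict]) -> str:
    """Rebuild the reply text from Messages-style stream events."""
    parts = []
    for event in events:
        kind = event.get("type")
        if kind == "content_block_delta":
            delta = event.get("delta") or {}
            # Assumption: text deltas are tagged "text_delta".
            if delta.get("type") == "text_delta":
                parts.append(delta.get("text", ""))
        elif kind == "message_stop":
            break  # final event of the message
    return "".join(parts)
```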
from anthropic import Anthropic
client = Anthropic(
    api_key="thk_live_PASTE_YOURS_HERE",
    base_url="https://tokenharbor.ai",
)
# The context manager closes the stream when the block exits.
with client.messages.stream(
    model="tokenharbor/qwen3-max",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)  # print each fragment as it arrives
Prompt handling & steering
How we treat your prompt depends on the model you pick:
- A specific model id (e.g. tokenharbor/qwen3-max, anthropic/claude-*): pure pass-through. Your system and messages are forwarded byte-for-byte; we never add or rewrite anything.
- th-orchestra (smart routing): we add a small steering directive to improve behaviour (e.g. plan → todo → execute for agent tasks, or role-specific guidance). To turn it off and get pure routing with an untouched prompt, send the header X-TH-Steering: off (or body "th_steering": "off").
Every response includes an X-TH-Steering-Applied header so you always know what was applied: none, off, or the role/workflow that was injected.
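The opt-out and the response header can be wired together roughly as follows. This is a sketch under assumptions: it treats "th-orchestra" as the literal model id, both function names are hypothetical, and the "planner" value in the usage note is an invented placeholder for whatever role or workflow name the server reports.

```python
def orchestra_request(messages: list, api_key: str, steering: bool = True) -> tuple[dict, dict]:
    """Build (headers, payload) for a th-orchestra chat request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    if not steering:
        # Ask for pure routing with an untouched prompt.
        headers["X-TH-Steering"] = "off"
    # Assumption: "th-orchestra" is the literal model id for smart routing.
    payload = {"model": "th-orchestra", "messages": messages}
    return headers, payload


def was_steered(response_headers) -> bool:
    """True when X-TH-Steering-Applied names an injected role or workflow."""
    applied = response_headers.get("X-TH-Steering-Applied", "none")
    return applied not in ("none", "off")
```

The pair returned by orchestra_request can be passed to requests.post as headers= and json=, and was_steered(response.headers) then reports whether anything was injected. For example, a header value of "planner" gives True, while "off" or "none" gives False.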