How I Deployed Hermes Agent on Railway with Telegram (And Every Gotcha I Hit Along the Way)
By Tessa Kriesel
11 min read

10 undocumented gotchas for deploying Hermes agent on Railway with Telegram—from the wrong gateway command to zombie MCP processes to the volume path that wipes your bot's memory on every deploy.

TL;DR for the skimmers

  • Hermes is NousResearch's self-improving AI agent framework—persistent memory, skills system, multi-platform messaging, multiple LLM providers
  • You can run it as a persistent Docker service on Railway with Telegram as your interface
  • The docs don't cover Railway/Docker deployment, so I had to figure most of this out through trial, error, and log-reading
  • There are 10 specific gotchas that will wreck your deploy if you don't know about them
  • Full working files at the end of this post

Why I built this

When I found Hermes—NousResearch's self-improving agent framework—I knew I wanted it in the cloud. I didn't want to think about whether my laptop was on. I wanted to message it from my phone at any hour and have it just work.

The problem? There is essentially zero documentation for deploying Hermes to a persistent cloud service like Railway. The install docs assume you're running it locally. The Docker support exists but isn't documented in any deployment context. And when you're dealing with a gateway-based agent that uses volumes, environment variables, MCP subprocess servers, and Telegram webhooks, the failure surface is enormous.

This post is the full deployment story—every gotcha, every fix, every file—so you don't have to spend three days in Railway logs the way I did.


What Hermes actually is

Before I get into the deployment mechanics, let me quickly explain what Hermes is and why it's worth the deployment friction.

Hermes is NousResearch's AI agent framework. What makes it different from a simple chatbot or wrapper:

  • Self-improving skills system: Hermes creates skills from experience and improves them over time. You can also load external skill directories.
  • Persistent memory: MEMORY.md and USER.md build up across sessions. The agent actually remembers you.
  • Multi-platform messaging: Telegram, Discord, Slack, WhatsApp, Signal, email—you pick your interface.
  • Multi-LLM support: Anthropic, OpenRouter, Nous Portal, and more. You can configure fallback providers.
  • MCP support: Connect external tools via Model Context Protocol servers.
  • Cron jobs, kanban, session search, hooks: Full agentic infrastructure baked in.
  • Config lives in ~/.hermes/: Main files are config.yaml, .env, and SOUL.md.

Getting it running on Railway

This section covers only what's Railway-specific. For general Hermes setup—skills, SOUL.md content, memory configuration—refer to the Hermes repo. What follows is the deployment layer on top of that.

Prerequisites

Before you start:

  • Railway account with the CLI installed (npm install -g @railway/cli)
  • Railway CLI logged in: railway login
  • A Telegram bot token (create one via @BotFather)
  • Your Telegram user ID (message @userinfobot)
  • An Anthropic API key (or whichever LLM provider you're using)
  • Docker installed locally if you want to test the build before deploying

Step 1: Create your repo structure

your-agent/
├── Dockerfile
├── entrypoint.sh
├── railway.json
├── .env.example
├── .gitignore
├── AGENTS.md          # Optional — auto-injected context every session
└── hermes/
    ├── cli-config.yaml
    └── SOUL.md

The hermes/ directory is your agent's config source. The entrypoint copies these files into the Railway volume on every startup, so changes you commit and deploy take effect immediately.

Step 2: Create the core files

Create Dockerfile, entrypoint.sh, railway.json, and hermes/cli-config.yaml using the full working files later in this post. Come back to this step once you've read the gotchas—they'll explain some choices in those files that would otherwise seem arbitrary.

The one file you'll want to create now is hermes/SOUL.md. This is your agent's identity—who it is, what it knows, how it behaves. The Hermes repo covers what goes in SOUL.md. Create it before your first deploy so the agent starts with a defined persona rather than a blank slate.

Step 3: Set up the Telegram bot

  1. Message @BotFather on Telegram
  2. Send /newbot and follow the prompts
  3. Copy the token it gives you—that's TELEGRAM_BOT_TOKEN
  4. Message @userinfobot to get your numeric user ID—that's TELEGRAM_ALLOWED_USERS and TELEGRAM_HOME_CHANNEL (they're the same value for a personal DM setup)
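
Both variables need the numeric ID, not your @username, and mixing them up is an easy mistake. Here is a minimal sketch of a sanity check you could run before setting them. is_numeric_id is a hypothetical helper, not part of Hermes or the Railway CLI.

```shell
# Hypothetical helper: succeeds only if the argument is a non-empty string of digits.
is_numeric_id() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

is_numeric_id "123456789" && echo "looks like a user ID"
is_numeric_id "@myhandle" || echo "not numeric: use the ID from @userinfobot"
```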

Step 4: Initialize the Railway project

From your repo root:

# Initialize and link to Railway
railway init

# Create the service with an initial deploy
railway up --detach

# Link to the service by name
railway service "your-service-name"

Step 5: Set environment variables

railway variables set ANTHROPIC_API_KEY=sk-ant-...
railway variables set TELEGRAM_BOT_TOKEN=...
railway variables set TELEGRAM_ALLOWED_USERS=123456789
railway variables set TELEGRAM_HOME_CHANNEL=123456789
railway variables set TELEGRAM_HOME_CHANNEL_NAME="Your Name DM"
railway variables set HERMES_HUMAN_DELAY_MODE=natural
railway variables set HERMES_ACCEPT_HOOKS=1

Add any additional secrets for MCP servers you're connecting:

railway variables set NOTION_API_KEY=...
railway variables set NOUS_API_KEY=...   # if using Nous Portal fallback

Step 6: Add the persistent volume

This is the most important step and the order matters—add the volume before your first real deploy.

railway volume add --mount-path /root/.hermes

The volume must be mounted at /root/.hermes—the entire Hermes home directory, not a subdirectory. See Gotcha 2 for why this matters.

Step 7: Redeploy and verify

railway up --detach

Watch the build logs in the Railway dashboard. A successful startup looks like:

Hermes config ready. Starting gateway...
⚕ Hermes Gateway Starting...
Messaging platforms + cron scheduler

Once you see the gateway banner, send a message to your Telegram bot. It should respond.
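
If you'd rather check for the banner from a terminal than watch the dashboard, something like the sketch below could work. It assumes the banner text stays the same across Hermes versions, and gateway_started is a hypothetical helper name.

```shell
# Reads log lines on stdin; succeeds as soon as the gateway banner appears.
gateway_started() {
  grep -q "Hermes Gateway Starting"
}

# Example with canned log text. Against a live service: railway logs | gateway_started
printf '%s\n' "Hermes config ready. Starting gateway..." "⚕ Hermes Gateway Starting..." \
  | gateway_started && echo "gateway is up"
```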

Step 8: Set up auto-deploy on push

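Railway doesn't automatically redeploy when you push to GitHub unless you connect the repo through their UI. The easiest workaround is a git hook. Git has no post-push hook, so use pre-push, which runs just before the push is sent. That works here because railway up uploads your local working directory rather than pulling from GitHub.

# Create the hook
cat > .git/hooks/pre-push << 'EOF'
#!/bin/bash
echo "Deploying to Railway..."
railway up --detach
EOF

chmod +x .git/hooks/pre-push

Now git push triggers a Railway deploy automatically. Note that a non-zero exit from the hook aborts the push.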


The 10 gotchas

I want to share the gotchas because they are the entire reason this post exists. None of these are documented anywhere. The first six will affect everyone doing this setup. The rest depend on your configuration choices—but if you're adding MCP servers or fallback providers, you'll hit those too.


Gotcha 1: gateway start vs gateway run

The Hermes docs (and some examples floating around) reference hermes gateway start. Don't use that in Docker. It's for host machine services.

Inside a Docker container, you will get this error:

Service start is not applicable inside a Docker container.
Or run the gateway directly: hermes gateway run

Fix: In your entrypoint.sh, use exec hermes gateway run—not start.

exec hermes gateway run

The exec matters too—it replaces the shell process so tini can properly manage the process tree.
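
To see what exec changes, here is a small sketch that is independent of Hermes. It compares the PID a command gets when a shell runs it as a child with the PID it gets when the shell execs it. In the entrypoint, the same effect means Hermes takes over the entrypoint's PID and becomes tini's direct child.

```shell
# Without exec: the inner shell is a child process, so it reports a different PID.
# (The trailing "true" stops the shell from optimizing the last command into an exec.)
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; true')

# With exec: the inner shell replaces the outer one, so both lines show the same PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

echo "without exec:" $without_exec
echo "with exec:   " $with_exec
```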


Gotcha 2: Volume mount path was wrong

This one cost me the most time. I initially mounted the Railway Volume at /root/.hermes/memories—just the memories subdirectory—thinking that was the only thing I needed to persist.

Wrong. Hermes keeps much more than memories under ~/.hermes, and all of the following lives outside that subdirectory:

  • Gateway state
  • Session history (state.db)
  • Cron jobs
  • The home channel setup state

Every deploy wiped all of that, so the bot would come back up and immediately ask me to "set your home channel" as if it had never been configured. Completely stateless on every restart.

Fix: Mount the volume at /root/.hermes—the entire Hermes home directory.

railway volume add --mount-path /root/.hermes
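
One way to catch this class of problem early is to have the entrypoint say whether it found existing state. The sketch below uses a marker file. The .volume-marker name and the check_volume helper are hypothetical: Hermes does not create or read that file.

```shell
# Hypothetical marker file: Hermes does not create or read it.
check_volume() {
  local marker="$1/.volume-marker"
  if [ -f "$marker" ]; then
    echo "volume persisted since $(cat "$marker")"
  else
    date -u +%Y-%m-%dT%H:%M:%SZ > "$marker"
    echo "fresh volume"
  fi
}

# Example against a throwaway directory; in entrypoint.sh you would pass "$HERMES_HOME".
demo_home=$(mktemp -d)
check_volume "$demo_home"   # first start
check_volume "$demo_home"   # later start, same volume
```

If a redeploy logs "fresh volume" again, the mount path is not covering the directory you think it is.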

Gotcha 3: Wrong MCP config key

I burned an embarrassing amount of time on this one. My cli-config.yaml had:

mcp:
  servers:
    my-tool:
      url: "https://..."

Hermes does not recognize this structure. The correct top-level key is mcp_servers:—flat, not nested under mcp:.

Fix:

mcp_servers:
  my-tool:
    url: "https://..."

No error is thrown for the wrong key—Hermes just silently ignores the MCP servers entirely. You only notice because your tools don't show up.
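
Because the wrong key fails silently, a pre-flight check in the entrypoint can turn it into a loud failure. This is a minimal sketch. check_mcp_key is a hypothetical helper, and it only looks for a top-level mcp: line rather than parsing the YAML.

```shell
# Fails if the config uses the unrecognized top-level "mcp:" key.
check_mcp_key() {
  if grep -q '^mcp:' "$1"; then
    echo "ERROR: $1 uses top-level 'mcp:'; Hermes expects 'mcp_servers:'" >&2
    return 1
  fi
}

# Example with a throwaway file; in entrypoint.sh you would pass /app/hermes/cli-config.yaml.
demo_cfg=$(mktemp)
printf 'mcp_servers:\n  my-tool:\n    url: "https://example.invalid/mcp"\n' > "$demo_cfg"
check_mcp_key "$demo_cfg" && echo "MCP key looks right"
```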


Gotcha 4: config.yaml vs cli-config.yaml

The sample file in the Hermes repo is named cli-config.yaml.example. Naturally, I named my file cli-config.yaml. That works for CLI usage.

But the gateway reads config.yaml at runtime. The CLI and the gateway use different config loaders, and they look for different filenames.

Fix: Write your config to both files in entrypoint.sh:

cp /app/hermes/cli-config.yaml "$HERMES_HOME/cli-config.yaml"
cp /app/hermes/cli-config.yaml "$HERMES_HOME/config.yaml"

Maintain one source file in your repo and copy it to both locations on startup. Config changes propagate correctly regardless of which loader is running.


Gotcha 5: MCP secrets not reaching Hermes

I had all my MCP secrets set as Railway environment variables. They were there in the Railway dashboard, visibly set, verified. Hermes couldn't see them.

The issue: Hermes reads secrets from ~/.hermes/.env, not directly from the system environment. My entrypoint.sh was only writing a handful of variables to that file—not the MCP secrets.

Fix: Write ALL secrets to ~/.hermes/.env in entrypoint.sh:

cat > "$HERMES_HOME/.env" << EOF
ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}
TELEGRAM_ALLOWED_USERS=${TELEGRAM_ALLOWED_USERS:-}
TELEGRAM_HOME_CHANNEL=${TELEGRAM_HOME_CHANNEL:-}
TELEGRAM_HOME_CHANNEL_NAME=${TELEGRAM_HOME_CHANNEL_NAME:-}
MY_MCP_SECRET=${MY_MCP_SECRET:-}
NOTION_API_KEY=${NOTION_API_KEY:-}
HERMES_HUMAN_DELAY_MODE=${HERMES_HUMAN_DELAY_MODE:-natural}
HERMES_ACCEPT_HOOKS=${HERMES_ACCEPT_HOOKS:-1}
NOUS_API_KEY=${NOUS_API_KEY:-}
EOF

The ${VAR:-} pattern expands to the variable's value if it is set and to an empty string if not. That keeps the file clean when some optional secrets aren't configured yet, and it stops set -u from aborting the script on an unset variable.
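
The heredoc works, but each new secret has to be added to it by hand, and forgetting that step is exactly how this gotcha happens. One possible alternative is to loop over a list of names, sketched below. write_env_file and the DEMO_ variables are hypothetical, and the helper reads values with printenv, so it only sees exported variables (which Railway's are).

```shell
# Write NAME=value lines for each listed variable; unset variables become empty.
write_env_file() {
  local out="$1" name
  shift
  : > "$out"
  chmod 600 "$out"
  for name in "$@"; do
    printf '%s=%s\n' "$name" "$(printenv "$name" || true)" >> "$out"
  done
}

# Example with a throwaway file and a made-up value.
export DEMO_SECRET="abc123"
demo_env=$(mktemp)
write_env_file "$demo_env" DEMO_SECRET DEMO_UNSET_VAR
cat "$demo_env"
```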


Gotcha 6: Agent is completely silent after deploy

After fixing the gateway run command, the bot would start, logs looked clean, no errors—and then complete silence. Send a Telegram message, nothing comes back. No errors, no timeouts, no indication anything was wrong.

The cause: TELEGRAM_HOME_CHANNEL was not set. When Hermes doesn't know its home channel, it enters a waiting-for-setup state and ignores incoming messages entirely. It's not an error state—it's just waiting.

Fix: Set TELEGRAM_HOME_CHANNEL to your Telegram user ID as a Railway environment variable. For a personal DM bot, this is the same as your user ID. Find it by messaging @userinfobot on Telegram.

railway variables set TELEGRAM_HOME_CHANNEL=123456789
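
Since the symptom is silence, it may be worth failing the container start instead when a required variable is missing. This is a minimal sketch of that idea. require_vars is a hypothetical helper, and which variables count as required is your call.

```shell
# Fail fast if any listed variable is unset or empty.
require_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name" || true)" ]; then
      echo "ERROR: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In entrypoint.sh this could run before the gateway starts, for example:
#   require_vars TELEGRAM_BOT_TOKEN TELEGRAM_ALLOWED_USERS TELEGRAM_HOME_CHANNEL
export DEMO_TOKEN="x"
require_vars DEMO_TOKEN && echo "all required variables present"
```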

Gotcha 7: No tini = zombie MCP processes

Hermes runs MCP stdio servers as subprocesses. The Notion MCP, for example, runs as npx @notionhq/notion-mcp-server. Without a proper init system as PID 1, these subprocesses become zombies when they exit: nothing reaps them, so each one keeps its process-table entry. They accumulate over time and can eventually exhaust the container's available PIDs.

The upstream Hermes Dockerfile uses tini explicitly for exactly this reason. I missed that detail initially.

Fix: Install tini in your Dockerfile and use it as the entrypoint. Critically, do NOT set a startCommand in railway.json—Railway's startCommand overrides the Docker ENTRYPOINT, which means tini never runs. Leave startCommand out and let Railway use the ENTRYPOINT directly.

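In the Dockerfile:

RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
ENTRYPOINT ["/usr/bin/tini", "-g", "--", "/entrypoint.sh"]

In railway.json, with no startCommand:

{
  "build": { "builder": "DOCKERFILE" },
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10
  }
}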

The -g flag means tini will send signals to the entire process group, which matters for subprocess cleanup.
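
To confirm tini is doing its job, you can count zombies from a shell inside the running container. The sketch below reads /proc directly instead of calling ps, which slim base images may not include. count_zombies is a hypothetical helper and assumes a Linux /proc filesystem.

```shell
# Count processes whose state in /proc/<pid>/status is Z (zombie).
count_zombies() {
  local n=0 f state
  for f in /proc/[0-9]*/status; do
    # A process can exit mid-scan; skip files that vanish.
    state=$(awk '/^State:/ {print $2}' "$f" 2>/dev/null) || continue
    [ "$state" = "Z" ] && n=$((n + 1))
  done
  echo "$n"
}

echo "zombie processes: $(count_zombies)"
```

A count that keeps growing between checks suggests something other than tini is running as PID 1.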


Gotcha 8: stdio MCP servers fail to start in the container

If you're using stdio-based MCP servers (like the Notion MCP via npx), the package has to be downloaded on first invocation. In a Railway container, this download can fail entirely—leaving the MCP server in an infinite retry loop with no useful error message. You won't see a crash. You'll just see your MCP tools never appear and Notion never responds.

Fix: Pre-install the package in the Dockerfile so it's already there when the container starts:

RUN npm install -g @notionhq/notion-mcp-server

After this, npx finds the package locally and starts immediately. Do this for every stdio MCP server you configure.


Gotcha 9: api_max_retries too high with fallback providers

I had api_max_retries: 3 in my config. What this means in practice: if Anthropic hits an error or rate limit, Hermes retries the same provider 3 times before trying the fallback. That's potentially 3 failed requests, each followed by an exponential-backoff wait, before you fail over.

Fix: Set api_max_retries: 1 when using fallback providers:

agent:
  api_max_retries: 1

One failure on the primary, immediately tries the fallback.
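
To get a feel for the difference, here is a rough calculation. It assumes the first wait is 2 seconds and that the wait doubles after each failure. Those numbers are purely illustrative, not Hermes's actual backoff schedule, and worst_case_wait is a hypothetical helper.

```shell
# Sum of waits for N failed attempts when the delay doubles each time.
# The starting delay is an assumed value, not a Hermes default.
worst_case_wait() {
  local attempts="$1" delay="$2" total=0 i=0
  while [ "$i" -lt "$attempts" ]; do
    total=$((total + delay))
    delay=$((delay * 2))
    i=$((i + 1))
  done
  echo "$total"
}

echo "api_max_retries 3: $(worst_case_wait 3 2)s of waiting"   # 2 + 4 + 8
echo "api_max_retries 1: $(worst_case_wait 1 2)s of waiting"
```

Under those assumptions, three retries cost 14 seconds of waiting before the fallback gets a turn, versus 2 seconds with one.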


Gotcha 10: Will the volume hide the Hermes install?

When running as root on Linux (Docker default), the Hermes install script puts code at /usr/local/lib/hermes-agent—not at ~/.hermes. So mounting a volume at /root/.hermes does NOT hide or overwrite the installed Hermes code. It only covers user config and data.

I mention this because I spent time wondering whether my volume mount was somehow replacing the install. You don't need to reinstall Hermes on every container start. The install is baked into the image. The volume handles your data only.
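
The rule of thumb is simple: a volume only shadows paths at or below its mount point. A tiny sketch of that check follows, using the two paths from this section. is_under_mount is a hypothetical helper and compares strings only, without resolving symlinks.

```shell
# Succeeds if the path is the mount point itself or lives underneath it.
is_under_mount() {
  case "$1" in
    "$2"|"$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

is_under_mount /usr/local/lib/hermes-agent /root/.hermes || echo "install is outside the volume"
is_under_mount /root/.hermes/state.db /root/.hermes && echo "state.db is on the volume"
```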


The full working setup

Dockerfile

FROM python:3.13-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    curl bash git nodejs npm build-essential tini \
    && rm -rf /var/lib/apt/lists/*

# Pre-install any stdio MCP servers to avoid cold-start downloads
RUN npm install -g @notionhq/notion-mcp-server

# Install Hermes — pinned to a specific commit for reproducibility
# Update this SHA when you want to upgrade Hermes
ARG HERMES_COMMIT=c23a87bc163b188abc7e40fbdccf07a9739231c3
RUN curl -fsSL "https://raw.githubusercontent.com/NousResearch/hermes-agent/${HERMES_COMMIT}/scripts/install.sh" \
    | bash -s -- --skip-setup

ENV PATH="/root/.local/bin:${PATH}"

WORKDIR /app
COPY . .

# Sanity check — verify install succeeded
RUN hermes --version

COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

EXPOSE 8443

# tini as PID 1 to handle zombie subprocess cleanup
ENTRYPOINT ["/usr/bin/tini", "-g", "--", "/entrypoint.sh"]

entrypoint.sh

#!/bin/bash
set -euo pipefail

HERMES_HOME="${HERMES_HOME:-$HOME/.hermes}"

# Create all directories Hermes expects (volume may be empty on first deploy)
mkdir -p "$HERMES_HOME/memories" \
         "$HERMES_HOME/skills" \
         "$HERMES_HOME/sessions" \
         "$HERMES_HOME/cron" \
         "$HERMES_HOME/cron/output" \
         "$HERMES_HOME/hooks" \
         "$HERMES_HOME/logs"

# Write config to BOTH filenames — CLI uses cli-config.yaml, gateway uses config.yaml
cp /app/hermes/cli-config.yaml "$HERMES_HOME/cli-config.yaml"
cp /app/hermes/cli-config.yaml "$HERMES_HOME/config.yaml"

# Write SOUL.md — agent identity/persona
cp /app/hermes/SOUL.md "$HERMES_HOME/SOUL.md"

# Write ALL secrets to ~/.hermes/.env — Hermes reads from here, not system env
cat > "$HERMES_HOME/.env" << EOF
ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN:-}
TELEGRAM_ALLOWED_USERS=${TELEGRAM_ALLOWED_USERS:-}
TELEGRAM_HOME_CHANNEL=${TELEGRAM_HOME_CHANNEL:-}
TELEGRAM_HOME_CHANNEL_NAME=${TELEGRAM_HOME_CHANNEL_NAME:-}
NOTION_API_KEY=${NOTION_API_KEY:-}
HERMES_HUMAN_DELAY_MODE=${HERMES_HUMAN_DELAY_MODE:-natural}
HERMES_ACCEPT_HOOKS=${HERMES_ACCEPT_HOOKS:-1}
NOUS_API_KEY=${NOUS_API_KEY:-}
EOF
chmod 600 "$HERMES_HOME/.env"

echo "Hermes config ready. Starting gateway..."
exec hermes gateway run

hermes/cli-config.yaml

model:
  default: "anthropic/claude-sonnet-4-6"
  provider: "anthropic"

fallback_providers:
  - provider: "nous-api"
    model: "hermes-4-405B"
  - provider: "nous-api"
    model: "hermes-4-70B"

# Note: model names vary by Nous Portal plan. Verify yours at portal.nousresearch.com
# before deploying — a wrong model name causes silent fallback failures.

terminal:
  backend: "local"
  timeout: 180

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 4000
  user_char_limit: 2000
  nudge_interval: 10
  flush_min_turns: 6

session_reset:
  mode: both
  idle_minutes: 1440
  at_hour: 4

group_sessions_per_user: true

agent:
  max_turns: 60
  reasoning_effort: "medium"
  api_max_retries: 1

tool_loop_guardrails:
  warnings_enabled: true
  hard_stop_enabled: true

platform_toolsets:
  cli: [hermes-cli]
  telegram: [hermes-telegram, session_search]

# IMPORTANT: top-level key is mcp_servers, NOT mcp.servers
mcp_servers:
  notion:
    command: "npx"
    args: ["-y", "@notionhq/notion-mcp-server"]
    env:
      NOTION_API_KEY: "${NOTION_API_KEY}"
  # Add HTTP MCP servers the same way — secrets go in Railway vars and .env
  # my-http-tool:
  #   url: "https://your-mcp-endpoint.com/mcp"
  #   headers:
  #     Authorization: "Bearer ${MY_MCP_SECRET}"

railway.json

{
  "$schema": "https://railway.app/railway.schema.json",
  "build": {
    "builder": "DOCKERFILE"
  },
  "deploy": {
    "restartPolicyType": "ON_FAILURE",
    "restartPolicyMaxRetries": 10
  }
}

No startCommand here intentionally. Railway's startCommand overrides Docker's ENTRYPOINT—which means tini never runs if you add one. Leave it out and Railway uses the ENTRYPOINT from the Dockerfile.
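
Since a stray startCommand undoes the tini setup without any warning, a quick guard before deploying can help. This is a minimal sketch. check_no_start_command is a hypothetical helper that does a plain text search rather than parsing the JSON.

```shell
# Fails if the given railway.json defines a startCommand.
check_no_start_command() {
  if grep -q '"startCommand"' "$1"; then
    echo "ERROR: $1 sets startCommand, which would bypass tini" >&2
    return 1
  fi
}

# Example with a throwaway file; in your repo you would pass railway.json.
demo_json=$(mktemp)
printf '{ "build": { "builder": "DOCKERFILE" } }\n' > "$demo_json"
check_no_start_command "$demo_json" && echo "no startCommand: tini stays PID 1"
```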


Railway setup walkthrough

# 1. Initialize Railway project
railway init

# 2. Initial deploy to create the service
railway up --detach

# 3. Link to the service
railway service "your-service-name"

# 4. Set environment variables
railway variables set ANTHROPIC_API_KEY=sk-ant-...
railway variables set TELEGRAM_BOT_TOKEN=...
railway variables set TELEGRAM_ALLOWED_USERS=123456789
railway variables set TELEGRAM_HOME_CHANNEL=123456789
railway variables set TELEGRAM_HOME_CHANNEL_NAME="Your Name DM"
railway variables set HERMES_HUMAN_DELAY_MODE=natural
railway variables set HERMES_ACCEPT_HOOKS=1
railway variables set NOUS_API_KEY=...
railway variables set NOTION_API_KEY=...

# Optional: webhook mode is more efficient than polling for hosted deployments
# railway variables set TELEGRAM_WEBHOOK_URL=https://your-service.up.railway.app/telegram
# railway variables set TELEGRAM_WEBHOOK_SECRET=$(openssl rand -hex 32)

# 5. Add persistent volume — MUST be /root/.hermes, not a subdirectory
railway volume add --mount-path /root/.hermes

# 6. Redeploy to pick up the volume
railway up --detach

Getting your Telegram IDs:

  • Create a bot via @BotFather → TELEGRAM_BOT_TOKEN
  • Message @userinfobot → your user ID for TELEGRAM_ALLOWED_USERS and TELEGRAM_HOME_CHANNEL
  • For a personal DM setup, both values are the same

Auto-deploy on push (git has no post-push hook, so this uses pre-push):

# .git/hooks/pre-push
#!/bin/bash
railway up --detach

# Then make the hook executable
chmod +x .git/hooks/pre-push

What I'd do differently

Start with the volume at the right path from day one. The /root/.hermes vs /root/.hermes/memories mistake wasted the most time because the symptom—bot asking to set home channel—looked like a configuration problem, not a persistence problem.

Verify MCP tools are loading before assuming they work. Ask Hermes directly: "What MCP tools do you have available?" If it lists nothing, you've got a config key problem or a secrets problem. Don't assume silence means they're loading quietly.

Set TELEGRAM_HOME_CHANNEL before your first deploy. The silent-bot symptom with no logs is deeply confusing. Just have this set from the start.


Quick reference: the 10 gotchas

#   Problem                                Fix
1   gateway start fails in Docker          Use hermes gateway run
2   Bot resets on every deploy             Mount volume at /root/.hermes, not a subdirectory
3   MCP servers silently ignored           Config key is mcp_servers:, not mcp.servers:
4   Gateway ignores your config            Copy config to both cli-config.yaml AND config.yaml
5   MCP secrets not available              Write all secrets to ~/.hermes/.env in entrypoint
6   Bot starts but never responds          Set TELEGRAM_HOME_CHANNEL env var
7   Zombie subprocess accumulation         Install and use tini as PID 1
8   MCP stdio server cold-start delay      Pre-install packages in Dockerfile
9   Slow LLM failover                      Set api_max_retries: 1 when using fallback providers
10  Will volume hide the Hermes install?   No; install is at system path, volume is for data only

Written by Tessa's agent.
