The Model Context Protocol (MCP) is having its moment. It promises to be the “USB-C for LLMs” — a standard way to connect AI models to our local files, databases, and tools.
But as I moved from building toy prototypes to an enterprise-grade Context Server in Python at FireCompass, I hit a wall.
The standard implementation of MCP relies on JSON-RPC over Stdio (standard input/output) or SSE (Server-Sent Events). While this is fantastic for rapid plugin development, it left me nervous about two critical things: Security and Type Safety.
At FireCompass, where we build autonomous AI agents that perform offensive security testing against live enterprise environments, those two concerns aren’t academic. A loose protocol contract between an LLM and a tool layer is exactly the kind of weakness our own platform is designed to find in other people’s systems. We couldn’t ship one ourselves.
The Great Protocol Swap: Why I Ditched JSON-RPC for gRPC in My Model Context Server
If you are building infrastructure for Large Language Models, you already know the golden rule: context is everything. But how you deliver that context is just as critical.
Recently, while architecting a custom Model Context Protocol (MCP) server as part of the FireCompass Research Initiative, I hit a wall with standard practices. I realized that if I wanted to build a robust, secure context server for the kind of agentic workflows we run in production, I didn’t just need a data pipe; I needed a strict contract.
So, I ripped out the JSON-RPC layer and replaced it with gRPC.
Here is why I did it, and why — if you are building for production — you might want to consider it too.
The Tale of the Tape: JSON-RPC vs. gRPC
Before we dive further, it is important to understand exactly how these two protocols handle data, because that is where the security gap lies.
[Figure: a comparative view of the two protocols]
In a nutshell: JSON-RPC trusts the developer to validate the data. gRPC trusts the schema to validate the data.
The Devil’s Advocate: Why Does Everyone Still Use JSON-RPC?
If gRPC is safer, why do tools like VS Code, Claude Desktop, and Zed default to JSON-RPC over Stdio?
It isn’t because they don’t value security. It’s because they value ubiquity and simplicity.
Lowest Common Denominator: To use JSON-RPC, you don’t need a protocol compiler or generated code. You just print a string to stdout. Every language, from Bash to Python, can do this.
The LSP Heritage: MCP is the spiritual successor to the Language Server Protocol (LSP), which powers your IDE’s “Go to Definition” features. LSP was built on JSON-RPC to lower the barrier for plugin developers.
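To make the "just print a string" point concrete, here is a minimal sketch of a newline-delimited JSON-RPC responder. The `ping` method, the function names, and the in-memory demo streams are assumptions for illustration; they are not part of MCP or of any particular client.

```python
import io
import json

def handle_line(line):
    """Turn one JSON-RPC request line into one JSON response string."""
    request = json.loads(line)
    if request.get("method") == "ping":  # hypothetical method name
        body = {"jsonrpc": "2.0", "id": request.get("id"), "result": "pong"}
    else:
        body = {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            # -32601 is the JSON-RPC 2.0 code for "Method not found"
            "error": {"code": -32601, "message": "Method not found"},
        }
    return json.dumps(body)

def serve(stream_in, stream_out):
    """Answer each non-empty input line with one JSON line on the output stream."""
    for raw in stream_in:
        if raw.strip():
            stream_out.write(handle_line(raw) + "\n")
            stream_out.flush()

# Demo with in-memory streams; a real server would pass sys.stdin and sys.stdout.
demo_out = io.StringIO()
serve(io.StringIO('{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n'), demo_out)
print(demo_out.getvalue(), end="")
```

Nothing here needs a compiler or generated code, which is the appeal. Nothing here checks the shape of `params` either, which is the problem the rest of this post is about.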
For desktop tools such as IDEs or chatbots that run locally, JSON-RPC is fine. But at FireCompass, I am building a server that needs to stand up to complex, potentially malicious inputs and to orchestrate AI agents that themselves probe systems for weaknesses. In that environment, the “easiness” of printing text wasn’t worth the risk.
The Bottleneck: The “Type Confusion” Vulnerability
The breaking point for me was realizing how susceptible standard Python JSON handlers are to Injection Attacks via type confusion.
In a standard implementation, the server receives a JSON payload. Python’s json.loads() converts this into a dictionary. Because JSON is dynamic, an attacker (or a hallucinating LLM) can send a dictionary instead of an integer. Let us try to understand the situation through the following scenario.
Scenario: Imagine an MCP tool that looks up user details. You expect an Integer ID.
python
# VULNERABLE PYTHON HANDLER
def get_user_data(params):
    # DANGER: We assume user_id is an int, but JSON allows it to be anything.
    # If we pass this directly to a NoSQL DB (like Mongo):
    return db.users.find_one({"id": params["user_id"]})
The Attack: An attacker sends this valid JSON-RPC payload:
json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "get_user_data",
  "params": { "user_id": { "$ne": null } }
}
The Result: Instead of looking for User 105, the database receives {"id": {"$ne": null}}. This translates to “Find the first user where ID is not null.” The attacker retrieves whichever record comes first, which is often the admin’s account, bypassing the ID check entirely.
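The scenario can be reproduced without a database. The sketch below swaps in a tiny in-memory stand-in for the collection; `USERS`, `matches`, and `find_one` are hypothetical helpers that emulate only equality and the `$ne` operator, not real MongoDB behaviour in general.

```python
import json

# Hypothetical in-memory stand-in for a users collection.
USERS = [
    {"id": 1, "name": "admin"},
    {"id": 105, "name": "alice"},
]

def matches(value, condition):
    """Emulate just enough of a Mongo-style filter: equality and $ne."""
    if isinstance(condition, dict) and "$ne" in condition:
        return value != condition["$ne"]
    return value == condition

def find_one(query):
    """Return the first user whose fields satisfy every condition in the query."""
    for user in USERS:
        if all(matches(user.get(key), cond) for key, cond in query.items()):
            return user
    return None

def get_user_data(params):
    # Same flaw as the vulnerable handler: user_id is passed through unchecked.
    return find_one({"id": params["user_id"]})

honest = json.loads('{"user_id": 105}')
hostile = json.loads('{"user_id": {"$ne": null}}')

print(get_user_data(honest))   # the record whose id is 105
print(get_user_data(hostile))  # the first record in the collection
```

Both payloads are valid JSON, and `json.loads()` decodes both without complaint. The only difference is the Python type that ends up under `user_id`.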
To fix this in JSON-RPC, you have to write defensive code (Pydantic models, manual type checks) for every single field. That kind of “guardrails sprinkled everywhere” approach is exactly what FireCompass’s own offensive security platform exploits in customer environments every day. I didn’t want to ship the same pattern in our own infrastructure.
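As a rough illustration of what that per-field defensive code looks like, here is a hand-written check for a single integer field. The helper name `require_int` is an assumption for this sketch; a Pydantic model would express the same rule declaratively.

```python
def require_int(params, field):
    """Return params[field] if it is a real integer, otherwise raise ValueError."""
    value = params.get(field)
    # bool is a subclass of int in Python, so it has to be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{field} must be an integer, got {type(value).__name__}")
    return value

print(require_int({"user_id": 105}, "user_id"))

try:
    require_int({"user_id": {"$ne": None}}, "user_id")
except ValueError as exc:
    print(exc)
```

It works, but only for the fields someone remembered to wrap, and every new tool argument needs the same treatment.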
The Solution: Schema-First Security with gRPC
To solve these issues, I realized I needed a fundamental shift in approach.
With JSON-RPC, you are constantly building barricades and guardrails after you imagine what could go wrong. You are perpetually reacting to endless possibilities of bad data.
I wanted to be proactive. Instead of reacting to issues, it is far better to implement a strict contract before communicating. This ensures the communication is unambiguous from byte one, and both parties — the AI client and the Python MCP server — are crystal clear on exactly what data format is being exchanged.
To achieve this, I turned to gRPC, the open-source RPC framework originally developed at Google. By defining my MCP service in a .proto file, I instantly gained that strict contract.
The Protobuf Definition:
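protobuf
syntax = "proto3";

service MCPService {
  rpc CallTool (ToolRequest) returns (ToolResponse);
}

message ToolRequest {
  string tool_name = 1;
  // We define strictly typed arguments
  int32 user_id = 2;
}

// Placeholder so the file compiles; response fields are omitted in this example.
message ToolResponse {}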
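Why This is Secure: If an attacker tries to send that malicious JSON object ({ "$ne": null }) as the user_id:
- Rejection at the Gate: A generated client stub refuses to build the message at all, because a dictionary cannot be assigned to an int32 field. A hand-crafted payload fares no better: on the wire, int32 is a varint, and a nested object does not match that wire type.
- No Type Confusion: Depending on the Protobuf runtime, the mismatched field either fails to parse or is set aside as an unknown field. Either way, it is never decoded into user_id.
- Safety: By the time my handler runs, user_id is guaranteed to be an integer, so an operator object can never reach the database query.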
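For readers who have not looked at the Protobuf wire format, the sketch below hand-decodes the simplified ToolRequest to show where the wire type lives. Every field starts with a tag holding the field number and a wire type; int32 uses the varint wire type (0), while strings and nested messages use the length-delimited type (2). All helper names are hypothetical, only non-negative integers are handled, and rejecting a mismatch outright is this sketch's choice: real Protobuf runtimes may instead keep the mismatched data as an unknown field.

```python
VARINT, LEN = 0, 2  # two of the Protobuf wire types

def encode_varint(n):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data, pos):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def int_field(field_number, value):
    return encode_varint((field_number << 3) | VARINT) + encode_varint(value)

def string_field(field_number, text):
    raw = text.encode("utf-8")
    return encode_varint((field_number << 3) | LEN) + encode_varint(len(raw)) + raw

def read_user_id(data):
    """Read field 2 (user_id), insisting that it uses the varint wire type."""
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == VARINT:
            value, pos = decode_varint(data, pos)
        elif wire_type == LEN:
            length, pos = decode_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        if field_number == 2:
            if wire_type != VARINT:
                raise ValueError("user_id must use the varint wire type")
            return value
    return 0  # proto3 default when the field is absent

good = string_field(1, "get_user_data") + int_field(2, 105)
bad = string_field(1, "get_user_data") + string_field(2, '{"$ne": null}')

print(read_user_id(good))
try:
    read_user_id(bad)
except ValueError as exc:
    print(exc)
```

The point is not that anyone should parse Protobuf by hand, only that the type of `user_id` is settled by the schema and the encoding rather than by whatever the sender chose to put in the payload.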
In this design:
- The LLM can request actions
- It never executes them directly
- All tools are accessed through a controlled service layer
- Tool execution is isolated from the model, reducing the risk of prompt injection, unsafe commands, or accidental privilege escalation
This separation — LLM as planner, deterministic layer as executor — mirrors the broader architectural principle FireCompass enforces across its agentic security platform: non-deterministic AI output is never allowed to act on the world without passing through a deterministic, contract-enforced gateway.
Why Not REST or GraphQL?
It wasn’t just a coin toss between JSON-RPC and gRPC. I evaluated the usual suspects, and here is why they didn’t make the cut.
Why not REST? REST is resource-oriented (“Give me the User resource”). MCP is action-oriented (“Run this tool”). Trying to shoehorn command execution into REST verbs feels clunky. Furthermore, LLM interactions are increasingly streaming-heavy, and gRPC supports bidirectional streaming natively, which plain request/response REST does not.
Why not GraphQL? GraphQL is excellent for preventing over-fetching, but it brings the overhead of a full query engine. I didn’t need a query language to traverse a graph; I needed a fast, secure, simple pipe to execute functions.
The Implementation
I created a custom MCP server built on gRPC that supports Hot-Reloading. You can drop a new Python script into a folder, and the server hot-swaps it into the running process instantly.
The Architecture
The system consists of three main components:
- The Protocol: A strict Protobuf definition ensuring the AI client knows exactly what inputs/outputs to expect.
- The Watchdog: A background thread monitoring the file system for changes.
- The Registry: A dynamic module loader that bypasses Python’s standard import cache to reload code on the fly.
The Data Flow
- Developer creates my_tool.py.
- watchdog reports the new file and calls the handler’s on_created method.
- Server uses importlib to load the module into memory.
- Server pushes a notification to connected clients over the WatchTools stream.
- Clients refresh their tool definitions immediately.
The Protocol (the .proto file)
We define a service that supports both standard calls and a streaming endpoint for updates.
protobuf
syntax = "proto3";

package switchblade;

service SwitchbladeService {
  // Discover available tools
  rpc ListTools (Empty) returns (ListToolsResponse);
  // Execute a specific tool
  rpc CallTool (CallToolRequest) returns (CallToolResponse);
  // Stream updates (Server pushes a message when files change)
  rpc WatchTools (Empty) returns (stream ToolsNotification);
}

message Empty {}

message Tool {
  string name = 1;
  string description = 2;
  string input_schema_json = 3;
  string output_schema_json = 4;
}

message ListToolsResponse {
  repeated Tool tools = 1;
}

message CallToolRequest {
  string tool_name = 1;
  string arguments_json = 2;
}

message CallToolResponse {
  string content_json = 1;
  bool is_error = 2;
  string error_message = 3;
}

message ToolsNotification {
  string event_type = 1;
  string message = 2;
}
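Because CallToolRequest carries tool arguments as a JSON string, Protobuf types the envelope but not the arguments inside it, so per-tool types presumably still need checking against each tool's input schema on the server. Below is a minimal sketch of such a check, assuming a hypothetical `parse_arguments` helper and covering only a small subset of JSON Schema (top-level `properties`, `required`, and simple `type` keywords). A full JSON Schema validator would be the sturdier choice in practice.

```python
import json

# Maps the JSON Schema "type" keyword to Python types (small subset only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def parse_arguments(arguments_json, input_schema):
    """Decode arguments_json and check top-level properties against input_schema."""
    args = json.loads(arguments_json or "{}")
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    properties = input_schema.get("properties", {})
    for name in input_schema.get("required", []):
        if name not in args:
            raise ValueError(f"missing required argument: {name}")
    for name, value in args.items():
        # Design choice of this sketch: unknown arguments are rejected outright.
        if name not in properties:
            raise ValueError(f"unexpected argument: {name}")
        expected = properties[name].get("type")
        python_type = JSON_TYPES.get(expected) if isinstance(expected, str) else None
        if python_type is None:
            continue  # type keyword not covered by this sketch
        # bool is a subclass of int, so reject it for integer and number fields.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, python_type):
            raise ValueError(f"{name} must be of type {expected}")
    return args

schema = {
    "type": "object",
    "properties": {"user_id": {"type": "integer"}},
    "required": ["user_id"],
}

print(parse_arguments('{"user_id": 105}', schema))
try:
    parse_arguments('{"user_id": {"$ne": null}}', schema)
except ValueError as exc:
    print(exc)
```

Running a check like this once, at the service boundary and before any tool function is called, keeps the validation in a single place instead of scattering it across handlers.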
The “Secret Sauce”: Hot-Reloading Logic
The core challenge is reloading Python code without restarting the interpreter. We achieve this using importlib.util and the watchdog library.
The Logic: Instead of import tool, we manually create a module specification from the file path. This allows us to overwrite the existing module in sys.modules whenever a file modification event occurs.
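The reload mechanism can be tried in isolation. This sketch writes a throwaway tool file, loads it, rewrites it, and loads it again; `load_from_path` and `demo_tool.py` are hypothetical names used only for the demonstration.

```python
import importlib.util
import os
import sys
import tempfile

def load_from_path(module_name, filepath):
    """Execute the file at filepath as a fresh module and register it in sys.modules."""
    spec = importlib.util.spec_from_file_location(module_name, filepath)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {filepath}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = module  # replace any previously loaded version
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tools_dir:
    path = os.path.join(tools_dir, "demo_tool.py")  # hypothetical tool file

    with open(path, "w") as f:
        f.write("def run():\n    return 'v1'\n")
    first = load_from_path("demo_tool", path).run()

    # The two versions deliberately differ in size: cached bytecode is validated
    # against the source's modification time and size, so a same-size edit made
    # within the same second could otherwise be served from the stale cache.
    with open(path, "w") as f:
        f.write("def run():\n    return 'v2-reloaded'\n")
    second = load_from_path("demo_tool", path).run()

print(first, second)
```

One thing to keep in mind with this approach: code that already holds a reference to the old function object keeps calling the old version, which is why a registry that looks tools up by name on every call fits well here.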
The Server Implementation (server.py)
python
import os
import grpc
import json
import importlib.util
import sys
import queue
import threading
import inspect
from concurrent import futures

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

import switchblade_pb2
import switchblade_pb2_grpc

TOOLS_DIR = "./tools"


class ToolRegistry:
    def __init__(self):
        self.tools = {}
        self.subscribers = []
        self.lock = threading.Lock()

    def load_tool_file(self, filepath):
        """Dynamically loads a python module and scans for @tool decorated functions."""
        module_name = os.path.splitext(os.path.basename(filepath))[0]
        # Create a spec from the file location directly
        spec = importlib.util.spec_from_file_location(module_name, filepath)
        if spec and spec.loader:
            try:
                module = importlib.util.module_from_spec(spec)
                sys.modules[module_name] = module  # Overwrite system cache
                spec.loader.exec_module(module)
                # Scan for functions decorated with our SDK
                for name, obj in inspect.getmembers(module):
                    if inspect.isfunction(obj) and getattr(obj, "_is_switchblade_tool", False):
                        meta = obj._tool_metadata
                        with self.lock:
                            self.tools[meta["name"]] = obj
                        # Notify outside the lock: notify_subscribers takes it again
                        print(f"✅ Hot-Loaded: {meta['name']}")
                        self.notify_subscribers(f"Tool {meta['name']} updated")
            except Exception as e:
                print(f"❌ Load Error: {e}")

    def notify_subscribers(self, message):
        """Notify all connected clients via gRPC stream"""
        # Copy the list under the lock so subscribers can come and go while we send
        with self.lock:
            subscribers = list(self.subscribers)
        for q in subscribers:
            try:
                q.put(switchblade_pb2.ToolsNotification(event_type="UPDATED", message=message))
            except Exception:
                pass  # One failing subscriber must not block the others


class ToolFileHandler(FileSystemEventHandler):
    """Watches the /tools directory for changes"""

    def __init__(self, registry):
        self.registry = registry

    def on_modified(self, event):
        if event.src_path.endswith(".py"):
            self.registry.load_tool_file(event.src_path)

    def on_created(self, event):
        if event.src_path.endswith(".py"):
            self.registry.load_tool_file(event.src_path)

# … (Standard gRPC Service Implementation Omitted for Brevity) …
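The service implementation itself is omitted above, so the following is only a guess at how a streaming endpoint such as WatchTools could be fed from per-client queues. It uses plain strings and standard-library queues rather than the generated Protobuf classes; `NotificationHub` and `watch` are hypothetical names. In a gRPC servicer, the handler could return `watch(hub, context.is_active)` so that the loop ends when the client disconnects.

```python
import queue
import threading

class NotificationHub:
    """Hypothetical helper: fans each message out to one queue per watcher."""

    def __init__(self):
        self._subscribers = []
        self._lock = threading.Lock()

    def subscribe(self):
        q = queue.Queue()
        with self._lock:
            self._subscribers.append(q)
        return q

    def unsubscribe(self, q):
        with self._lock:
            if q in self._subscribers:
                self._subscribers.remove(q)

    def subscriber_count(self):
        with self._lock:
            return len(self._subscribers)

    def publish(self, message):
        with self._lock:
            targets = list(self._subscribers)
        for q in targets:
            q.put(message)

def watch(hub, is_active, poll_seconds=0.5):
    """Subscribe immediately and return an iterator of messages for one client."""
    q = hub.subscribe()  # eager, so nothing published before the first read is lost

    def stream():
        try:
            while is_active():
                try:
                    yield q.get(timeout=poll_seconds)
                except queue.Empty:
                    continue  # wake up periodically to re-check is_active()
        finally:
            hub.unsubscribe(q)

    return stream()

hub = NotificationHub()
updates = watch(hub, is_active=lambda: True)
hub.publish("Tool get_system_stats updated")
first_message = next(updates)
updates.close()  # triggers the finally block, which unsubscribes the queue
print(first_message)
```

The timeout on `q.get` is what lets the loop notice a disconnected client; without it, a watcher that never receives another message would hold its thread indefinitely.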
A Simple Decorator
To make writing tools easy, we create a small SDK that marks functions for the registry to pick up.
python
def tool(name, description, input_schema, output_schema=None):
    def decorator(func):
        func._is_switchblade_tool = True
        func._tool_metadata = {
            "name": name,
            "description": description,
            "input_schema": input_schema,
            "output_schema": output_schema or {}
        }
        return func
    return decorator
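To see how the marker attributes are picked up, the sketch below repeats the decorator, attaches one decorated and one plain function to a throwaway module object, and scans it the same way the registry does. The `demo_tools` module and the `echo` and `helper` functions are made up for the example.

```python
import inspect
import types

def tool(name, description, input_schema, output_schema=None):
    def decorator(func):
        func._is_switchblade_tool = True
        func._tool_metadata = {
            "name": name,
            "description": description,
            "input_schema": input_schema,
            "output_schema": output_schema or {},
        }
        return func
    return decorator

@tool(name="echo", description="Returns its input.", input_schema={"type": "object"})
def echo(args):
    return args

def helper(args):  # not decorated, so the scan should skip it
    return None

# Stand-in for a module loaded from the tools directory.
demo = types.ModuleType("demo_tools")
demo.echo = echo
demo.helper = helper

found = {
    obj._tool_metadata["name"]: obj
    for _, obj in inspect.getmembers(demo, inspect.isfunction)
    if getattr(obj, "_is_switchblade_tool", False)
}

print(sorted(found))
print(found["echo"]({"x": 1}))
```

Because the decorator returns the original function unchanged, a tool stays directly callable in unit tests without going through the server.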
Putting it Together: An Example Tool
Because we decoupled the server from the logic, adding a tool is as simple as dropping this file into the tools/ folder.
python
import platform
import psutil
from switchblade import tool

@tool(
    name="get_system_stats",
    description="Returns CPU and RAM usage of the host server.",
    input_schema={"type": "object", "properties": {}},  # No input needed
    output_schema={
        "type": "object",
        "properties": {
            "cpu_percent": {"type": "number"},
            "ram_percent": {"type": "number"},
            "os": {"type": "string"}
        }
    }
)
def get_system_stats(args):
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),
        "ram_percent": psutil.virtual_memory().percent,
        "os": platform.system()
    }
I named the project Switchblade.
Conclusion
Standard MCP is a brilliant innovation for getting the ecosystem started. But as we move from “chatting with files” to “agents executing business logic” — which is exactly the kind of work we do at FireCompass, where AI agents conduct autonomous penetration testing against enterprise targets — our infrastructure needs to mature.
The original Model Context Protocol (MCP) introduced a useful idea: let LLMs interact with external tools in a structured way. But in practice, it often blurred boundaries between reasoning, orchestration, and execution — creating safety, scalability, and maintainability issues.
This redesign adheres to one simple principle:
Treat the LLM as a planner, not an executor. Reasoning, orchestration, and tool execution should be clearly separated and connected only through strict, typed interfaces.
That principle isn’t just academic for us at FireCompass. It’s the same architectural discipline that lets our platform safely run autonomous offensive security agents inside customer environments — where every action the AI takes is gated, typed, and auditable before it ever touches a real system.
This architecture is LLM-agnostic by design. The client can use whichever LLM suits its requirements. In fact, the client in my Switchblade project uses a locally hosted model.
I traded the ease of print() for the rigor of Protobufs, and in exchange, I got a system that is harder to break, easier to maintain, and significantly more secure.
If you are building an MCP server that touches sensitive data, it might be time to stop parsing strings and start defining schemas.
Acknowledgments
A massive thank you to Arnab Chatterjee for the invaluable support and rigorous inputs provided during the testing phases of this MCP server, and to the broader FireCompass engineering and research teams for the architectural conversations that shaped this work.
I would love to hear how others are handling context delivery and agent architecture in production and what your thoughts are on the above approach. Have you hit the limitations of standard JSON-RPC? Please share your thoughts and comments below!
