Chapter 14: Advanced Features
Beyond the Basics
Chapters 4–6 covered MCP’s three primitives: tools, resources, and prompts. Those are the headline features. But MCP has several advanced capabilities that become essential as you build more sophisticated integrations.
This chapter covers sampling, elicitation, roots, completion, logging, progress reporting, and tasks—the features that turn a simple tool server into a sophisticated AI integration.
Sampling: When Servers Need an LLM
Here’s a scenario: you’re building an MCP server that summarizes web pages. The server can fetch the web page, but it needs an LLM to actually summarize it. Does the server need its own LLM API key?
No. MCP has a feature called sampling that lets servers request LLM completions from the client. The server says “I have this text, please summarize it,” and the client routes the request to whatever LLM it’s using.
How Sampling Works
Server                 Client                 LLM
  │                      │                     │
  │── sampling/          │                     │
  │   createMessage ────→│                     │
  │                      │── API call ────────→│
  │                      │←── Completion ──────│
  │←── Result ───────────│                     │
The server sends a sampling/createMessage request to the client:
{
  "jsonrpc": "2.0",
  "id": "s1",
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Summarize this article in 2-3 sentences:\n\n[article text here]"
        }
      }
    ],
    "maxTokens": 200,
    "modelPreferences": {
      "hints": [
        { "name": "claude-sonnet-4-5-20250929" }
      ],
      "speedPriority": 0.5,
      "costPriority": 0.8,
      "intelligencePriority": 0.3
    },
    "systemPrompt": "You are a concise summarizer."
  }
}
The client:
- Receives the request
- May show it to the user for approval (sampling gives servers indirect LLM access—this should be gated)
- Sends it to the LLM
- Returns the result
{
  "jsonrpc": "2.0",
  "id": "s1",
  "result": {
    "role": "assistant",
    "content": {
      "type": "text",
      "text": "The article discusses the recent breakthrough in quantum computing..."
    },
    "model": "claude-sonnet-4-5-20250929",
    "stopReason": "endTurn"
  }
}
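On the server side, the Python SDK exposes sampling through the request context. Here's a minimal sketch of the summarizer described above, assuming the SDK's ctx.session.create_message helper; fetch_page is a hypothetical function that returns the page text:

from mcp.server.fastmcp import Context, FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("summarizer")

@mcp.tool()
async def summarize_url(url: str, ctx: Context) -> str:
    """Fetch a web page and ask the client's LLM to summarize it."""
    article = await fetch_page(url)  # hypothetical helper returning page text

    # Ask the client to run a completion on our behalf; no API key needed.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Summarize this article in 2-3 sentences:\n\n{article}",
                ),
            )
        ],
        max_tokens=200,
        system_prompt="You are a concise summarizer.",
    )
    if isinstance(result.content, TextContent):
        return result.content.text
    return "Unexpected content type from sampling"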
Model Preferences
The server can hint at what kind of model it wants:
{
  "modelPreferences": {
    "hints": [
      { "name": "claude-sonnet-4-5-20250929" },
      { "name": "claude-haiku-4-5-20251001" }
    ],
    "speedPriority": 0.8,
    "costPriority": 0.9,
    "intelligencePriority": 0.3
  }
}
The hints are suggestions, not demands. The client chooses the actual model. The priority fields (0.0 to 1.0) express trade-offs: this request prioritizes speed and cost over intelligence, so a smaller, faster model would be appropriate.
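In Python, these preferences map to typed objects (a sketch assuming the SDK's mcp.types.ModelPreferences and ModelHint, which can be passed to create_message via its model_preferences parameter):

from mcp.types import ModelHint, ModelPreferences

# Favor a fast, cheap model; the client still makes the final choice.
prefs = ModelPreferences(
    hints=[
        ModelHint(name="claude-sonnet-4-5-20250929"),
        ModelHint(name="claude-haiku-4-5-20251001"),
    ],
    speedPriority=0.8,
    costPriority=0.9,
    intelligencePriority=0.3,
)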
When to Use Sampling
- Content transformation — Summarize, translate, reformat
- Intelligent processing — Extract entities, classify data, generate descriptions
- Agentic delegation — Let the server compose multi-step operations where some steps need LLM reasoning
Security Considerations
Sampling gives servers indirect access to the LLM. A malicious server could:
- Generate harmful content through the LLM
- Use the LLM to process stolen data
- Rack up LLM API costs
Hosts SHOULD:
- Require user approval for sampling requests
- Show the server’s prompt to the user
- Set limits on token usage
- Log all sampling requests
Elicitation: Asking the User
Sometimes a server needs information from the user—not the LLM, the actual human. Maybe it needs to confirm a destructive action, choose between options, or provide credentials.
Elicitation lets servers send questions to the user through the client. The client presents the question in its UI, collects the answer, and returns it to the server.
How Elicitation Works
Server sends an elicitation request:
{
  "jsonrpc": "2.0",
  "id": "e1",
  "method": "elicitation/create",
  "params": {
    "message": "Which database environment should I connect to?",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "environment": {
          "type": "string",
          "enum": ["development", "staging", "production"],
          "description": "Target environment"
        }
      },
      "required": ["environment"]
    }
  }
}
The client shows this to the user (perhaps as a dropdown or radio buttons), and returns their choice:
{
  "jsonrpc": "2.0",
  "id": "e1",
  "result": {
    "action": "accept",
    "content": {
      "environment": "staging"
    }
  }
}
If the user declines:
{
  "jsonrpc": "2.0",
  "id": "e1",
  "result": {
    "action": "decline"
  }
}
Use Cases
- Configuration choices — “Which project should I work with?”
- Confirmation — “This will delete 500 records. Continue?”
- Input collection — “What’s your API key for this service?”
- Disambiguation — “I found 3 matching records. Which one?”
Schema Support
The requestedSchema uses JSON Schema to define what input the server expects. This lets clients render appropriate UI:
- String → text field
- String with enum → dropdown
- Boolean → checkbox
- Number → number input
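In the Python SDK, FastMCP wraps this exchange in a ctx.elicit helper that takes a Pydantic model as the schema. A minimal sketch of the environment question above, assuming that helper (mcp and Context as in the other examples in this chapter):

from typing import Literal

from pydantic import BaseModel, Field

class EnvironmentChoice(BaseModel):
    environment: Literal["development", "staging", "production"] = Field(
        description="Target environment"
    )

@mcp.tool()
async def run_migration(ctx: Context) -> str:
    """Run a migration, asking the user which environment to target."""
    result = await ctx.elicit(
        message="Which database environment should I connect to?",
        schema=EnvironmentChoice,
    )
    if result.action == "accept":
        return f"Migrating {result.data.environment}..."
    return "Migration cancelled by user"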
Roots: Workspace Context
Roots tell servers about the client’s workspace—what directories or repositories are relevant to the current session. This helps servers scope their operations to the right context.
How Roots Work
When a server wants to know about the workspace, it sends a roots/list request:
{
  "jsonrpc": "2.0",
  "id": "r1",
  "method": "roots/list"
}
The client responds:
{
  "jsonrpc": "2.0",
  "id": "r1",
  "result": {
    "roots": [
      {
        "uri": "file:///home/user/projects/my-app",
        "name": "My App"
      },
      {
        "uri": "file:///home/user/projects/shared-lib",
        "name": "Shared Library"
      }
    ]
  }
}
If the workspace changes (user opens a different folder, adds a workspace), the client sends a notification:
{
  "jsonrpc": "2.0",
  "method": "notifications/roots/list_changed"
}
Why Roots Matter
Without roots, a filesystem server has to guess where to look. With roots, it knows:
- What directories are relevant
- What the user is working on
- Where to scope searches and operations
A Git server uses roots to know which repositories to show. A lint server uses roots to know which files to check. A build server uses roots to know what to compile.
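On the server side, fetching roots is a single session call. A minimal sketch, assuming the SDK's ctx.session.list_roots helper; scan_directory is a hypothetical function:

from urllib.parse import urlparse

@mcp.tool()
async def find_todos(ctx: Context) -> str:
    """Scan the user's workspace for TODO comments."""
    result = await ctx.session.list_roots()

    findings = []
    for root in result.roots:
        path = urlparse(str(root.uri)).path  # roots are URIs, typically file://
        findings.extend(scan_directory(path, marker="TODO"))  # hypothetical helper
    return f"Found {len(findings)} TODOs across {len(result.roots)} roots"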
Completion: Autocomplete for Arguments
MCP supports autocompletion for prompt arguments and resource URI template parameters. When a user is filling in a prompt’s arguments, the client can request completions:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "completion/complete",
  "params": {
    "ref": {
      "type": "ref/prompt",
      "name": "query_table"
    },
    "argument": {
      "name": "table_name",
      "value": "us"
    }
  }
}
Response:
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "completion": {
      "values": ["users", "user_sessions", "user_preferences"],
      "hasMore": false,
      "total": 3
    }
  }
}
This works for resource template parameters too:
{
  "params": {
    "ref": {
      "type": "ref/resource",
      "uri": "postgres://localhost/mydb/tables/{table}/schema"
    },
    "argument": {
      "name": "table",
      "value": "ord"
    }
  }
}
Implementation
Servers implement completion by providing context-aware suggestions. In the Python SDK, FastMCP exposes a completion() decorator whose handler receives the reference being completed, the argument typed so far, and optional context. A sketch; get_connection is an assumed helper returning a SQLite connection:

from mcp.types import (
    Completion,
    CompletionArgument,
    CompletionContext,
    PromptReference,
    ResourceTemplateReference,
)

@mcp.completion()
async def complete_table_name(
    ref: PromptReference | ResourceTemplateReference,
    argument: CompletionArgument,
    context: CompletionContext | None,
) -> Completion | None:
    if isinstance(ref, PromptReference) and ref.name == "query_table" and argument.name == "table_name":
        # Query actual database tables that match the typed prefix
        conn = get_connection()  # assumed helper
        tables = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE ?",
            (f"{argument.value}%",),
        ).fetchall()
        return Completion(values=[row[0] for row in tables], hasMore=False)
    return None
Logging: Server Diagnostics
Servers can send structured log messages to the client. This isn’t just “print debugging”—it’s a proper logging channel that clients can filter, display, and record.
Sending Log Messages
{
  "jsonrpc": "2.0",
  "method": "notifications/message",
  "params": {
    "level": "info",
    "logger": "weather-api",
    "data": "Fetching weather for London (cache miss)"
  }
}
Log Levels
MCP uses syslog severity levels:
| Level | Severity | Use For |
|---|---|---|
| debug | Lowest | Detailed diagnostic information |
| info | Low | Normal operation events |
| notice | Medium | Normal but noteworthy events |
| warning | Medium-High | Something unexpected, but handled |
| error | High | Something failed |
| critical | Very High | System component failure |
| alert | Very High | Immediate action needed |
| emergency | Highest | System is unusable |
Setting Log Level
Clients can control verbosity:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "logging/setLevel",
  "params": {
    "level": "warning"
  }
}
After this, the server should only send warning and above. This prevents verbose debug logging from overwhelming the client.
Implementation
from mcp.server.fastmcp import FastMCP, Context

mcp = FastMCP("my-server")

@mcp.tool()
async def complex_operation(data: str, ctx: Context) -> str:
    """Perform a complex operation."""
    await ctx.debug("Starting operation")
    await ctx.info(f"Processing {len(data)} bytes")
    try:
        result = await process(data)  # process() is an assumed helper
        await ctx.info(f"Operation complete: {result.summary}")
        return str(result)
    except TimeoutError:
        await ctx.warning("Operation timed out, retrying...")
        result = await process(data, timeout=60)
        return str(result)
    except Exception as e:
        await ctx.error(f"Operation failed: {e}")
        raise
Progress Reporting: Keeping Everyone Informed
For long-running operations, servers can report progress. This was covered briefly in Chapter 3, but it’s worth a closer look at implementation.
The Flow
1. Client includes a progress token in the request:
{
  "method": "tools/call",
  "params": {
    "name": "bulk_import",
    "arguments": { "file": "data.csv" },
    "_meta": { "progressToken": "import-1" }
  }
}
2. Server sends progress notifications:
{
  "method": "notifications/progress",
  "params": {
    "progressToken": "import-1",
    "progress": 250,
    "total": 1000,
    "message": "Importing row 250 of 1000..."
  }
}
3. When done, the server returns the normal tool result.
Implementation
@mcp.tool()
async def import_data(file_path: str, ctx: Context) -> str:
    """Import a large data file."""
    rows = load_csv(file_path)  # load_csv / insert_row are assumed helpers
    total = len(rows)

    for i, row in enumerate(rows):
        await insert_row(row)
        # Report progress every 100 rows
        if i % 100 == 0:
            await ctx.report_progress(i, total, f"Importing row {i} of {total}")

    await ctx.report_progress(total, total, "Import complete")
    return f"Successfully imported {total} rows"
Indeterminate Progress
When you don’t know the total, omit it:
{
  "method": "notifications/progress",
  "params": {
    "progressToken": "crawl-1",
    "progress": 42,
    "message": "Crawled 42 pages so far..."
  }
}
The client can show a spinner or counter instead of a progress bar.
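With the Context helper used in the earlier examples, indeterminate progress just means omitting the total; a sketch, where crawl is a hypothetical async page generator:

@mcp.tool()
async def crawl_site(start_url: str, ctx: Context) -> str:
    """Crawl a site without knowing how many pages it has."""
    pages_crawled = 0
    async for _page in crawl(start_url):  # hypothetical async crawler
        pages_crawled += 1
        # No total: the client falls back to a spinner or counter.
        await ctx.report_progress(pages_crawled, message=f"Crawled {pages_crawled} pages so far...")
    return f"Crawled {pages_crawled} pages"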
Combining Advanced Features
The real power comes from combining these features. Here's a sketch of a server that uses roots, elicitation, sampling, and progress reporting together. The session and context calls follow the Python SDK sketches above; find_in_project, get_surrounding_code, and apply_replacement are assumed project helpers:

from pydantic import BaseModel, Field
from mcp.types import SamplingMessage, TextContent

class ConfirmRefactor(BaseModel):
    proceed: bool = Field(description="Apply the replacements?")

@mcp.tool()
async def smart_refactor(pattern: str, replacement: str, ctx: Context) -> str:
    """Find and replace across the project, with AI-powered review of each change."""
    # Use roots to know where to search
    roots = await ctx.session.list_roots()
    matches = find_in_project(pattern, roots.roots)
    if not matches:
        return f"No matches found for '{pattern}'"

    # Elicit confirmation from the user before touching any files
    answer = await ctx.elicit(
        message=f"Found {len(matches)} matches for '{pattern}'. Review and apply?",
        schema=ConfirmRefactor,
    )
    if answer.action != "accept" or not answer.data.proceed:
        return "Refactor cancelled by user"

    total = len(matches)
    applied = 0
    for i, match in enumerate(matches):
        await ctx.report_progress(i, total, f"Reviewing {match.file}:{match.line}")
        # Use sampling to have the client's LLM review each change in context
        code = get_surrounding_code(match.file, match.line, radius=5)
        review = await ctx.session.create_message(
            messages=[
                SamplingMessage(
                    role="user",
                    content=TextContent(
                        type="text",
                        text=f"Replacing '{pattern}' with '{replacement}' here:\n\n{code}\n\nSafe? Answer YES or NO.",
                    ),
                )
            ],
            max_tokens=5,
        )
        if isinstance(review.content, TextContent) and "YES" in review.content.text:
            apply_replacement(match, replacement)
            applied += 1

    return f"Applied {applied} of {total} replacements"
Tasks: Long-Running Operations (Experimental)
The 2025-11-25 spec revision introduced tasks, an experimental feature for operations that take too long for a simple request-response cycle.
The Problem Tasks Solve
Some tool calls finish in milliseconds (calculator, string manipulation). Others take minutes or hours (data pipeline, ML training, complex analysis). Without tasks, the client has to hold an open connection the entire time, and if it drops, the work is lost.
How Tasks Work
Tasks are durable state machines. Instead of waiting for a tool to finish, the server immediately returns a task ID. The client can then poll for status or wait for completion.
Step 1: Client makes a task-augmented request
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "train_model",
    "arguments": { "dataset": "training_data.csv" },
    "task": {
      "ttl": 3600
    }
  }
}
The task field with a ttl (time-to-live in seconds) tells the server to run this as a task.
Step 2: Server immediately returns a task reference
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "taskId": "task-abc123",
    "status": "working",
    "pollInterval": 5000,
    "message": "Model training started"
  }
}
The client doesn’t wait for the tool to finish. It gets a task ID and a suggested polling interval.
Step 3: Client polls for status
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tasks/get",
  "params": { "taskId": "task-abc123" }
}
Step 4: When the task completes, fetch the result
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tasks/result",
  "params": { "taskId": "task-abc123" }
}
Task Status Lifecycle
working → completed
working → failed
working → cancelled
working → input_required → working → ...
Tasks can also enter an input_required state, indicating the server needs more information (via elicitation) before proceeding.
Task Operations
| Method | Purpose |
|---|---|
| tasks/get | Poll task status |
| tasks/result | Get the final result (blocks until terminal state) |
| tasks/list | List all tasks (paginated) |
| tasks/cancel | Cancel a running task |
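Because tasks are experimental, SDK helpers are still settling, but the polling loop is simple enough to sketch against the wire protocol directly. Here send_request is a hypothetical helper that issues a JSON-RPC request over an existing MCP connection and returns its result object; the server's suggested pollInterval can seed poll_interval:

import asyncio

TERMINAL_STATES = {"completed", "failed", "cancelled"}

async def wait_for_task(send_request, task_id: str, poll_interval: float = 5.0) -> dict:
    """Poll tasks/get until the task reaches a terminal state, then fetch its result."""
    while True:
        status = await send_request("tasks/get", {"taskId": task_id})
        if status["status"] in TERMINAL_STATES:
            break
        # input_required means the server is waiting on elicitation; keep polling.
        await asyncio.sleep(poll_interval)
    return await send_request("tasks/result", {"taskId": task_id})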
Tool-Level Task Support
Tools declare their task support in their definition:
{
  "name": "train_model",
  "execution": {
    "taskSupport": "required"
  }
}
Options: "required" (must use tasks), "optional" (tasks available but not required), "forbidden" (tasks not supported).
Tasks are still experimental and may change, but they address a real gap in the protocol for long-running operations.
Elicitation URL Mode
The 2025-11-25 spec also added URL mode to elicitation. The original form mode (described above) is great for simple questions, but it can’t handle sensitive data like passwords or OAuth flows.
URL mode lets the server direct the user to a URL for out-of-band interaction:
{
  "jsonrpc": "2.0",
  "id": "e2",
  "method": "elicitation/create",
  "params": {
    "message": "Please authenticate with your identity provider",
    "requestedSchema": {
      "type": "url",
      "url": "https://auth.example.com/login?session=abc123"
    }
  }
}
The client opens the URL in the user’s browser. The server is notified when the user completes the flow via a notifications/elicitation/complete notification. The sensitive data (credentials, tokens) never passes through the MCP client—it goes directly from the user’s browser to the authentication server.
This is critical for security: form mode MUST NOT be used for passwords, tokens, or other sensitive data.
Summary
MCP’s advanced features transform servers from simple tool wrappers into sophisticated AI integrations:
- Sampling — Servers can request LLM completions without their own API keys
- Elicitation — Servers can ask the human for input and confirmation (form mode for simple data, URL mode for sensitive flows)
- Roots — Servers can discover the client’s workspace context
- Completion — Servers can provide autocomplete for arguments
- Logging — Servers can send structured diagnostic messages
- Progress — Servers can report progress on long operations
- Tasks — (Experimental) Durable state machines for long-running operations that survive connection drops
These features are all optional—a simple tool server doesn’t need any of them. But as your servers grow in sophistication, they become increasingly valuable.
Next: making sure it all works.