vLLM crash troubleshooting

Root cause

While handling a request, vLLM calls the get_structured_output_key function. When it processed the structured_outputs parameter, none of the expected cases matched, so the function raised a "No valid structured output parameter found" ValueError. EngineCore does not catch this exception, so the engine crashed; the APIServer then noticed the engine was dead and exited its own process as well.
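The failure is easy to reproduce in isolation. Below is a minimal sketch, assuming releases/v0.11.0 and the import paths implied by the source excerpts later in this note: a structured_outputs object with none of its fields set falls through every branch of get_structured_output_key.

# Reproduction sketch (assumes vLLM releases/v0.11.0; the exact import
# paths follow the source excerpts below and should be treated as an
# assumption).
from vllm.sampling_params import SamplingParams, StructuredOutputsParams
from vllm.v1.structured_output.request import get_structured_output_key

# An "empty" StructuredOutputsParams matches none of the if/elif branches.
sp = SamplingParams(structured_outputs=StructuredOutputsParams())
get_structured_output_key(sp)
# ValueError: No valid structured output parameter found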

Monitoring HTTP requests with the packet-capture tool tshark

Run the following commands from a suitable directory. tshark needs root privileges, so execute them as the root user.

Install tshark (already done on this host):
sudo apt-get install tshark 
Create a shell script:
vim hook_vllm_gpt-oss-120b.sh

The script contents:

#!/bin/bash

# ========================================
# Monitor HTTP requests/responses on the local v1/chat/completions endpoint
# Keep only the most recent ~20 entries (including streaming responses such as SSE)
# Uses tshark + TCP stream following
# ========================================

# Configuration
INTERFACE="lo"                    # local loopback interface
PORT="8077"
ENDPOINT="/v1/chat/completions" 
LOG_FILE="/hook/chat_completions.log"
TEMP_LOG="/hook/chat_completions.tmp"
PID_FILE="/hook/hook_vllm_gpt-oss-120b.pid"

# Exit if another instance is already running
if [ -f "$PID_FILE" ]; then
    if ps -p $(cat "$PID_FILE") > /dev/null 2>&1; then
        echo "【错误】监控脚本已在运行 (PID: $(cat $PID_FILE))"
        exit 1
    else
        rm -f "$PID_FILE"
    fi
fi

# Record our PID
echo $$ > "$PID_FILE"

# Truncate the log file
> "$LOG_FILE"

echo "✅ 开始监控 http://127.0.0.1:$PORT$ENDPOINT"
echo "📝 日志将保存到: $LOG_FILE"
echo "⏳ 仅保留最近 20 条,按 Ctrl+C 停止"

# Signal handling: clean up the PID file
trap 'rm -f "$PID_FILE"; echo "⏹️ Monitoring stopped"; exit 0' SIGINT SIGTERM

# Follow TCP streams with tshark
sudo tshark \
  -i "$INTERFACE" \
  -f "tcp port $PORT and host 127.0.0.1" \
  -q \
  -z "follow,tcp,ascii" \
  2>/dev/null | \
stdbuf -oL awk -v endpoint="$ENDPOINT" -v logfile="$LOG_FILE" -v tempfile="$TEMP_LOG" '
BEGIN {
    RS = "\n\n"
    ORS = ""
    in_request = 0
    buffer = ""
    count = 0
}

# Each record is one followed TCP stream; process only those that hit our endpoint
{
    if (match($0, /GET|POST|PUT|DELETE|HTTP/) && index($0, endpoint)) {
        # Use the tshark stream header (Following ... on port N) as a pseudo-timestamp
        if (match($0, /Following.*on port [0-9]+$/)) {
            ts = substr($0, RSTART, RLENGTH)
            gsub(/.*on/, "on", ts)
        } else {
            ts = "unknown time"
        }

        # Extract the request line and selected headers
        split($0, lines, /\n/)
        for (i=1; i<=length(lines); i++) {
            if (lines[i] ~ /(GET|POST|PUT|DELETE)/ && index(lines[i], endpoint)) {
                request_line = lines[i]
            }
            if (lines[i] ~ /Content-Type:/ || lines[i] ~ /Authorization:/ || lines[i] ~ /User-Agent:/) {
                headers = headers "\n  " lines[i]
            }
        }

        # Extract the request body (it usually follows a blank line)
        body = ""
        in_body = 0
        for (i=1; i<=length(lines); i++) {
            if (lines[i] == "" || lines[i] ~ /HTTP\/[0-9.]/) {
                in_body = 1
                continue
            }
            if (in_body && lines[i] !~ /(No response found|Following)/) {
                body = body lines[i] "\n"
            }
        }

        # Extract the response part (lines starting with HTTP/)
        response = ""
        for (i=1; i<=length(lines); i++) {
            if (lines[i] ~ /^HTTP\// && i > 1) {
                for (j=i; j<=length(lines); j++) {
                    if (lines[j] !~ /Following/) {
                        response = response lines[j] "\n"
                    }
                }
                break
            }
        }

        # Build the log entry
        entry = "========================================\n"
        entry = entry "🕒 " ts "\n"
        entry = entry "📤 请求: " request_line "\n"
        if (headers != "") {
            entry = entry "📎 头部:\n" headers "\n"
        }
        if (body != "") {
            entry = entry "📦 请求体:\n" body "\n"
        }
        if (response != "") {
            entry = entry "📥 响应:\n" response "\n"
        }
        entry = entry "========================================\n\n"

        # Append the entry, then keep only the most recent ~20 entries
        # (tail -n 200 keeps roughly 20 entries at about 10 lines each).
        # Printing from awk avoids the shell-quoting breakage the previous
        # echo -e approach had with quotes inside JSON bodies.
        print entry >> logfile
        close(logfile)
        cmd = "tail -n 200 " logfile " > " tempfile " && mv " tempfile " " logfile
        system(cmd)

        # Reset per-record state
        headers = ""
        body = ""
        response = ""
    }
}
'

# Clean up on normal exit
rm -f "$PID_FILE"
Make the script executable:
chmod +x hook_vllm_gpt-oss-120b.sh
Run it in the background:
nohup /hook/hook_vllm_gpt-oss-120b.sh > /dev/null 2>&1 &
Watch the request log:
tail -f /hook/chat_completions.log
Stop the script:
pkill -f hook_vllm_gpt-oss-120b.sh
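
To confirm the capture works end to end, send a test request through the monitored port and then tail the log. A minimal sketch using the third-party requests package; the model name here is a placeholder, substitute whatever this vLLM instance actually serves:

# Test traffic sketch: one chat completion request through the monitored port.
# Assumes the requests package is installed; "gpt-oss-120b" is a placeholder.
import requests

resp = requests.post(
    "http://127.0.0.1:8077/v1/chat/completions",
    json={
        "model": "gpt-oss-120b",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=60,
)
print(resp.status_code, resp.json())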

Implementation of get_structured_output_key

Branch: releases/v0.11.0

https://github.com/vllm-project/vllm/blob/releases/v0.11.0/vllm/v1/structured_output/request.py

def get_structured_output_key(sampling_params: SamplingParams) -> StructuredOutputKey:
    params = sampling_params.structured_outputs
    assert params is not None, "params can't be None."
    if params.json is not None:
        if not isinstance(params.json, str):
            json_str = json.dumps(params.json)
        else:
            json_str = params.json
        return (StructuredOutputOptions.JSON, json_str)
    elif params.json_object:
        return (StructuredOutputOptions.JSON_OBJECT, "")
    elif params.regex is not None:
        return (StructuredOutputOptions.REGEX, params.regex)
    elif params.choice is not None:
        if not isinstance(params.choice, str):
            json_str = json.dumps(params.choice)
        else:
            json_str = params.choice
        return (StructuredOutputOptions.CHOICE, json_str)
    elif params.grammar is not None:
        return (StructuredOutputOptions.GRAMMAR, params.grammar)
    elif params.structural_tag is not None:
        return (StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag)
    else:
        raise ValueError("No valid structured output parameter found")
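
For reference, here is a sketch of how well-formed parameters map to keys under this logic (assuming the same import paths as above; these examples are ours, not from the upstream test suite):

# Key-mapping sketch for releases/v0.11.0 (import paths are an assumption).
from vllm.sampling_params import SamplingParams, StructuredOutputsParams
from vllm.v1.structured_output.request import (StructuredOutputOptions,
                                               get_structured_output_key)

sp = SamplingParams(structured_outputs=StructuredOutputsParams(regex=r"\d+"))
assert get_structured_output_key(sp) == (StructuredOutputOptions.REGEX, r"\d+")

sp = SamplingParams(structured_outputs=StructuredOutputsParams(json_object=True))
assert get_structured_output_key(sp) == (StructuredOutputOptions.JSON_OBJECT, "")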

Branch: releases/v0.10.2

https://github.com/vllm-project/vllm/blob/releases/v0.10.2/vllm/v1/structured_output/request.py

def get_structured_output_key(
        sampling_params: SamplingParams) -> StructuredOutputKey:    
    params = sampling_params.guided_decoding
    assert params is not None, "params can't be None."
    if params.json is not None:
        if not isinstance(params.json, str):
            json_str = json.dumps(params.json)
        else:
            json_str = params.json
        return (StructuredOutputOptions.JSON, json_str)
    elif params.json_object:
        return (StructuredOutputOptions.JSON_OBJECT, "")
    elif params.regex is not None:
        return (StructuredOutputOptions.REGEX, params.regex)
    elif params.choice is not None:
        if not isinstance(params.choice, str):
            json_str = json.dumps(params.choice)
        else:
            json_str = params.choice
        return (StructuredOutputOptions.CHOICE, json_str)
    elif params.grammar is not None:
        return (StructuredOutputOptions.GRAMMAR, params.grammar)
    elif params.structural_tag is not None:
        return (StructuredOutputOptions.STRUCTURAL_TAG, params.structural_tag)
    else:
        raise ValueError("No valid structured output parameter found")

The only difference between the two versions is which SamplingParams attribute is read (structured_outputs in v0.11.0, guided_decoding in v0.10.2); the branching logic is otherwise identical. A possible request-side guard is sketched below.
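
Because the ValueError escapes EngineCore and takes down the whole engine, one possible stop-gap is to validate the parameters before the request ever reaches the scheduler. This is our own hypothetical guard, not upstream vLLM code:

# Hypothetical request-side guard (not upstream vLLM code): reject a request
# whose structured output params object is present but has no usable field,
# instead of letting it reach get_structured_output_key and kill the engine.
_FIELDS = ("json", "json_object", "regex", "choice", "grammar", "structural_tag")

def structured_output_params_usable(params) -> bool:
    """True if params is absent, or at least one supported field is set."""
    return params is None or any(getattr(params, f, None) for f in _FIELDS)

With a check like this at the API layer, a malformed request would be rejected with a 400 instead of crashing EngineCore.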

SamplingParams

https://github.com/vllm-project/vllm/blob/releases/v0.11.0/vllm/sampling_params.py

# Other parameters omitted
class SamplingParams:
    # 💡 The initialization logic is fine: both fields default to None
    structured_outputs: Optional[StructuredOutputsParams] = None
    guided_decoding: Optional[GuidedDecodingParams] = None

    @staticmethod
    def from_optional(
        # 💡 These defaults are also None (other parameters omitted)
        structured_outputs: Optional[StructuredOutputsParams] = None,
        guided_decoding: Optional[GuidedDecodingParams] = None,
    ) -> "SamplingParams":
        if guided_decoding is not None:
            warnings.warn(
                # 💡 Upstream deprecates guided_decoding in favor of
                # structured_outputs from v0.12.0; v0.11.0 keeps this
                # compatibility shim
                "guided_decoding is deprecated. This will be removed in "
                "v0.12.0 or v1.0.0, which ever is soonest. Please use "
                "structured_outputs instead.",
                DeprecationWarning,
                stacklevel=2)
            structured_outputs = guided_decoding
            guided_decoding = None
        return SamplingParams(
            structured_outputs=structured_outputs,
        )
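
The shim can be observed directly. A sketch, assuming releases/v0.11.0 and that the deprecated keyword still accepts the same params object:

# Deprecation-shim sketch (assumptions: releases/v0.11.0, and that
# from_optional accepts the params object via the deprecated keyword).
import warnings
from vllm.sampling_params import SamplingParams, StructuredOutputsParams

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    sp = SamplingParams.from_optional(
        guided_decoding=StructuredOutputsParams(json_object=True))

assert sp.structured_outputs is not None        # value moved over
assert sp.guided_decoding is None               # old field cleared
assert any(issubclass(w.category, DeprecationWarning) for w in caught)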

StructuredOutputOptions

class StructuredOutputOptions(enum.Enum):
    JSON = enum.auto()
    JSON_OBJECT = enum.auto()
    REGEX = enum.auto()
    GRAMMAR = enum.auto()
    CHOICE = enum.auto()
    STRUCTURAL_TAG = enum.auto()

💡 Only these six types are supported; each one corresponds to a different field under structured_outputs.
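
Written out as an explicit mapping (illustrative only; it mirrors the if/elif chain in get_structured_output_key above):

# Illustrative field-to-option mapping, mirroring get_structured_output_key.
FIELD_TO_OPTION = {
    "json": StructuredOutputOptions.JSON,
    "json_object": StructuredOutputOptions.JSON_OBJECT,
    "regex": StructuredOutputOptions.REGEX,
    "choice": StructuredOutputOptions.CHOICE,
    "grammar": StructuredOutputOptions.GRAMMAR,
    "structural_tag": StructuredOutputOptions.STRUCTURAL_TAG,
}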

Crash log excerpt

(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] Traceback (most recent call last):
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     self._process_engine_step()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 283, in step
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     scheduler_output = self.scheduler.schedule()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/core/sched/scheduler.py", line 359, in schedule
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     if structured_output_req and structured_output_req.grammar:
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 45, in grammar
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     completed = self._check_grammar_completion()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 33, in _check_grammar_completion
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     self._grammar = self._grammar.result(timeout=0.0001)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 458, in result
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     return self.__get_result()
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     raise self._exception
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/__init__.py", line 128, in _async_create_grammar
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     key = request.structured_output_request.structured_output_key  # type: ignore[union-attr]
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/functools.py", line 981, in __get__
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     val = self.func(instance)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 58, in structured_output_key
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     return get_structured_output_key(self.sampling_params)
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 86, in get_structured_output_key
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710]     raise ValueError("No valid structured output parameter found")
(EngineCore_DP0 pid=2738693) ERROR 10-10 10:43:10 [core.py:710] ValueError: No valid structured output parameter found
(EngineCore_DP0 pid=2738693) Process EngineCore_DP0:
(EngineCore_DP0 pid=2738693) Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] AsyncLLM output_handler failed.
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480]     outputs = await engine_core.get_output_async()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480]     raise self._format_exception(outputs) from None
(APIServer pid=2738423) ERROR 10-10 10:43:10 [async_llm.py:480] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=2738693)     self.run()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=2738693)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=2738693)     raise e
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 701, in run_engine_core
(EngineCore_DP0 pid=2738693)     engine_core.run_busy_loop()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 728, in run_busy_loop
(EngineCore_DP0 pid=2738693)     self._process_engine_step()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 754, in _process_engine_step
(EngineCore_DP0 pid=2738693)     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 283, in step
(EngineCore_DP0 pid=2738693)     scheduler_output = self.scheduler.schedule()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/core/sched/scheduler.py", line 359, in schedule
(EngineCore_DP0 pid=2738693)     if structured_output_req and structured_output_req.grammar:
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 45, in grammar
(EngineCore_DP0 pid=2738693)     completed = self._check_grammar_completion()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 33, in _check_grammar_completion
(EngineCore_DP0 pid=2738693)     self._grammar = self._grammar.result(timeout=0.0001)
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 458, in result
(EngineCore_DP0 pid=2738693)     return self.__get_result()
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
(EngineCore_DP0 pid=2738693)     raise self._exception
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
(EngineCore_DP0 pid=2738693)     result = self.fn(*self.args, **self.kwargs)
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/__init__.py", line 128, in _async_create_grammar
(EngineCore_DP0 pid=2738693)     key = request.structured_output_request.structured_output_key  # type: ignore[union-attr]
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/functools.py", line 981, in __get__
(EngineCore_DP0 pid=2738693)     val = self.func(instance)
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 58, in structured_output_key
(EngineCore_DP0 pid=2738693)     return get_structured_output_key(self.sampling_params)
(EngineCore_DP0 pid=2738693)   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/structured_output/request.py", line 86, in get_structured_output_key
(EngineCore_DP0 pid=2738693)     raise ValueError("No valid structured output parameter found")
(EngineCore_DP0 pid=2738693) ValueError: No valid structured output parameter found
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] Error in chat completion stream generator.
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] Traceback (most recent call last):
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/entrypoints/openai/serving_chat.py", line 574, in chat_completion_stream_generator
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]     async for res in result_generator:
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 387, in generate
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]     out = q.get_nowait() or await q.get()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/output_processor.py", line 59, in get
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]     raise output
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 439, in output_handler
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]     outputs = await engine_core.get_output_async()
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]   File "/aisoft/conda/env/vllm2/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 846, in get_output_async
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145]     raise self._format_exception(outputs) from None
(APIServer pid=2738423) ERROR 10-10 10:43:10 [serving_chat.py:1145] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
[rank0]:[W1010 10:43:10.666166970 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=2738423) INFO:     Shutting down
(APIServer pid=2738423) INFO:     Waiting for application shutdown.
(APIServer pid=2738423) INFO:     Application shutdown complete.
(APIServer pid=2738423) INFO:     Finished server process [2738423]