Qwen3 in Practice: A Complete Guide to vLLM Deployment and Function Calling

携梦问道

Published 2025-07-23 14:18:29

📌 Abstract

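This article starts from scratch and shows how to use the Qwen3-8B large language model: deploying it for high-performance serving with vLLM, and using function calling to connect the model to external tools. It covers the deployment command, how to call the model, code examples, and practical use cases, so that you can quickly build applications on top of Qwen3.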

1. Qwen3 Overview and Deployment Environment

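Qwen3 is the latest generation of large language models in the Tongyi Qianwen (Qwen) series. It offers strong natural language understanding and generation, and performs particularly well at function calling, tool use, and reasoning. For efficient deployment we recommend the vLLM framework, which supports multi-GPU parallel inference and can noticeably improve serving response time.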

1.1 Deploying Qwen3-8B with vLLM

The following is a complete deployment script for a multi-GPU environment (for example, four A100s):

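#!/bin/bash

# Select the visible GPU devices (adjust to your hardware)
export CUDA_VISIBLE_DEVICES=0,1,2,3

# Start the vLLM server.
# --enable-auto-tool-choice and --tool-call-parser are required for the
# function-calling requests in Section 3; "hermes" is the parser commonly
# used with Qwen models (check the vLLM docs for your version).
vllm serve /home/model_weight/Qwen/Qwen3-8B \
    --chat-template ./qwen3_nonthinking.jinja \
    --max_model_len 40960 \
    --served-model-name Qwen3-8B \
    --gpu_memory_utilization 0.90 \
    --max_num_seqs 1024 \
    --tensor-parallel-size 4 \
    --api_key xiyunmu \
    --host 192.168.1.1 \
    --port 9015 \
    --trust_remote_code \
    --device cuda \
    --enable-auto-tool-choice \
    --tool-call-parser hermes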

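✅ Parameter notes:

--chat-template: the chat template file, which controls how the conversation is rendered into the model's prompt (here, a non-thinking template).

--max_model_len: the maximum context length in tokens (prompt plus generated output), which enables long-context use.

--tensor-parallel-size: the number of GPUs the model is sharded across with tensor parallelism (4 here, matching the four visible devices).

--api_key: the key that clients must present to authenticate.

--host and --port: the address and port the server listens on.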
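
Once the server is running, a quick way to confirm that `--host`, `--port`, and `--api_key` took effect is to query the model list. The sketch below uses only the Python standard library and assumes the server exposes the OpenAI-style `/v1/models` route with Bearer-token authentication. The helper names `build_models_request` and `list_served_models` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_models_request(host: str, port: int, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /v1/models route."""
    url = f"http://{host}:{port}/v1/models"
    # The key passed to --api_key is sent as a Bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_served_models(host: str, port: int, api_key: str, timeout: float = 5.0) -> list[str]:
    """Return the model IDs reported by the server (needs a running server)."""
    request = build_models_request(host, port, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload["data"]]
```

With the deployment script from this section, `list_served_models("192.168.1.1", 9015, "xiyunmu")` should include `Qwen3-8B`, the name set by `--served-model-name`. A 401 response usually points to a mismatched API key.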

2. Calling Qwen3 Through the OpenAI-Compatible API

Once deployment is complete, you can interact with Qwen3 through the OpenAI-compatible API. The following Python example calls the model with the openai client:

2.1 Basic Call Example

from openai import OpenAI

# Initialize the client; api_key and base_url must match the server settings
client = OpenAI(
    api_key="xiyunmu",
    base_url="http://192.168.1.1:9015/v1"
)

# Send the request
response = client.chat.completions.create(
    model="Qwen3-8B",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"}
    ],
    temperature=0.7,
    top_p=0.8,
    presence_penalty=1.5,
    # Parameters outside the OpenAI schema are passed to vLLM via extra_body
    extra_body={
        "top_k": 20,
        "chat_template_kwargs": {"enable_thinking": False},
    }
)

# Print the result
print(response.choices[0].message.content)
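
The request above mixes standard OpenAI parameters (`temperature`, `top_p`, `presence_penalty`) with vLLM-specific ones passed through `extra_body`. A small helper can keep the two groups apart so that the same settings are reused across calls. This is only a sketch: `build_chat_kwargs` is a hypothetical helper name, and its defaults simply mirror the values used in the example.

```python
from typing import Any


def build_chat_kwargs(
    prompt: str,
    model: str = "Qwen3-8B",
    system: str = "You are a helpful assistant.",
    enable_thinking: bool = False,
) -> dict[str, Any]:
    """Assemble keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        # Standard OpenAI sampling parameters.
        "temperature": 0.7,
        "top_p": 0.8,
        "presence_penalty": 1.5,
        # Parameters outside the OpenAI schema go through extra_body.
        "extra_body": {
            "top_k": 20,
            "chat_template_kwargs": {"enable_thinking": enable_thinking},
        },
    }


kwargs = build_chat_kwargs("What is deep learning?")
print(kwargs["extra_body"])
```

A call then becomes `client.chat.completions.create(**build_chat_kwargs("..."))`. Whether `enable_thinking` has any effect depends on the chat template the server was started with.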

3. Function Calling in Practice: Building an Intelligent Assistant

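Function calling is one of Qwen3's highlights. Based on the user's request, the model decides when to call an external tool (such as a weather lookup or database access) and with which arguments; the application then executes those calls, which makes more complex tasks possible.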

3.1 Tool Definition Example

def get_current_temperature(location: str, unit: str = "celsius"):
    """Stub that returns a fixed reading; replace with a real weather API call."""
    return {
        "temperature": 26.1,
        "location": location,
        "unit": unit,
    }

def get_temperature_date(location: str, date: str, unit: str = "celsius"):
    """Stub that returns a fixed reading for the given date."""
    return {
        "temperature": 25.9,
        "location": location,
        "date": date,
        "unit": unit,
    }

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_current_temperature",
            "description": "Get the current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_temperature_date",
            "description": "Get the temperature for a given date",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "date": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
                },
                "required": ["location", "date"]
            }
        }
    }
]
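
Writing each schema entry by hand is easy to get out of sync with the function it describes. As a rough sketch, the skeleton of an entry can be derived from the function signature with `inspect`. The helper name `function_to_tool` is hypothetical, the type mapping covers only a few basic annotations, and constraints such as the `enum` on `unit` still have to be added manually.

```python
import inspect
from typing import Any, Callable

# Minimal mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def function_to_tool(func: Callable[..., Any], description: str) -> dict[str, Any]:
    """Build an OpenAI-style tool entry from a function's signature."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unknown types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        # Parameters without a default value are required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def get_current_temperature(location: str, unit: str = "celsius"):
    return {"temperature": 26.1, "location": location, "unit": unit}


tool = function_to_tool(get_current_temperature, "Get the current temperature for a city")
print(tool["function"]["parameters"]["required"])  # ['location']
```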

3.2 Calling Functions Through vLLM

import json
from openai import OpenAI

# Initialize the client
client = OpenAI(
    api_key="xiyunmu",
    base_url="http://192.168.1.1:9015/v1"
)

messages = [
    {"role": "user", "content": "What's the temperature in San Francisco now? How about tomorrow? Current Date: 2024-09-30."}
]

response = client.chat.completions.create(
    model="Qwen3-8B",
    messages=messages,
    tools=TOOLS,
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
    extra_body={
        "repetition_penalty": 1.05,
        "chat_template_kwargs": {"enable_thinking": False}
    }
)

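# Map tool names to the local functions from 3.1; avoids eval() on model output
TOOL_FUNCTIONS = {
    "get_current_temperature": get_current_temperature,
    "get_temperature_date": get_temperature_date,
}

# Parse the tool calls; tool_calls is None when the model answers directly
tool_calls = response.choices[0].message.tool_calls or []
for call in tool_calls:
    func_name = call.function.name
    args = json.loads(call.function.arguments)
    result = TOOL_FUNCTIONS[func_name](**args)
    print(f"Called {func_name} with arguments {args}; result: {result}")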
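
The loop above only prints the results. To get a final natural-language answer, the results are normally sent back to the model as `role="tool"` messages, each linked to its request through `tool_call_id`. The sketch below shows that step in isolation. `run_tool_calls` is a hypothetical helper, and a `SimpleNamespace` object stands in for a real tool call so the example runs without a server.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable


def run_tool_calls(tool_calls, registry: dict[str, Callable[..., Any]]) -> list[dict[str, str]]:
    """Execute each requested tool and return the matching role="tool" messages."""
    tool_messages = []
    for call in tool_calls or []:
        func = registry.get(call.function.name)
        if func is None:
            # Report unknown tools back to the model instead of raising.
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            result = func(**json.loads(call.function.arguments))
        tool_messages.append(
            {
                "role": "tool",
                "tool_call_id": call.id,  # links the result to the request
                "content": json.dumps(result, ensure_ascii=False),
            }
        )
    return tool_messages


def get_current_temperature(location: str, unit: str = "celsius"):
    return {"temperature": 26.1, "location": location, "unit": unit}


# Stand-in for one entry of response.choices[0].message.tool_calls.
fake_call = SimpleNamespace(
    id="call_0",
    function=SimpleNamespace(
        name="get_current_temperature",
        arguments='{"location": "San Francisco"}',
    ),
)
registry = {"get_current_temperature": get_current_temperature}
print(run_tool_calls([fake_call], registry))
```

In the real flow, append the assistant message that contains the `tool_calls` to `messages`, extend `messages` with the returned tool messages, and call `client.chat.completions.create` again with the same `tools` so the model can compose its answer from the results.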

4. Building Advanced Applications with LangChain

LangChain is a popular framework for building LLM applications. The following shows how to call Qwen3 with it:

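# ChatOpenAI lives in the langchain-openai package (pip install langchain-openai);
# the older import path langchain.chat_models is deprecated
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    base_url="http://192.168.1.1:9015/v1",
    api_key="xiyunmu",
    model="Qwen3-8B",
    temperature=0.7
)

# /no_think is Qwen3's soft switch for skipping the thinking phase
response = model.invoke("/no_think What will the weather be like tomorrow?")
# invoke() returns an AIMessage; the generated text is in .content
print(response.content)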
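
The `/no_think` prefix in the prompt above is a soft switch that Qwen3 reads from the message text, with `/think` as its counterpart. Since the switch is plain text, a tiny helper can apply it consistently before a prompt is handed to `model.invoke`. This is a minimal sketch: `with_thinking_switch` is a hypothetical name, and whether the switch changes anything depends on the chat template the server uses.

```python
def with_thinking_switch(prompt: str, think: bool = False) -> str:
    """Prefix a prompt with Qwen3's /think or /no_think soft switch."""
    stripped = prompt.lstrip()
    # Leave prompts that already carry a switch untouched.
    if stripped.startswith(("/think", "/no_think")):
        return stripped
    switch = "/think" if think else "/no_think"
    return f"{switch} {stripped}"


print(with_thinking_switch("What will the weather be like tomorrow?"))
# /no_think What will the weather be like tomorrow?
```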