[Python] Edge TTS (text-to-speech library): Microsoft's next-generation text-to-speech engine

浩水飞鸽(晴雨)

Published 2025-06-04 18:38:08

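Edge TTS: Microsoft's next-generation text-to-speech engine (note: Edge TTS requires an internet connection)

Edge TTS here refers to edge-tts, a free, open-source Python library that gives you access to the online neural speech synthesis service behind Microsoft Edge's built-in Read Aloud feature. It is a community project rather than an official Microsoft SDK, but the voices it exposes are Microsoft's neural voices.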

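Key strengths

Top-tier voice quality:
- Deep neural networks produce speech close to a human voice
- Natural-sounding expression and intonation
- Quality comparable to commercial TTS services
Wide choice of voices:
- 100+ voices across 40+ languages
- Several Chinese dialects and speaking styles
- Voices of different ages, genders and styles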

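Fine-grained parameter control (each value is a string with an explicit sign and unit, e.g. "+0%"):

rate="+20%"   # speaking rate, range: -50% ~ +100%
pitch="+10Hz" # pitch, range: -100Hz ~ +100Hz
volume="+5%"  # volume, range: -100% ~ +100%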
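Because these values are passed as strings rather than numbers, a small helper that formats integers with an explicit sign can prevent malformed values such as "20%". This is a sketch of my own, not part of edge-tts; the regular expressions only mirror the "+N%" / "+NHz" shapes shown above and do not enforce the numeric ranges.

```python
import re

# Patterns mirroring the signed "+N%" (rate, volume) and "+NHz" (pitch) string formats
_PERCENT = re.compile(r"^[+-]\d+%$")
_HERTZ = re.compile(r"^[+-]\d+Hz$")


def signed_percent(value: int) -> str:
    """Format an integer as a signed percentage, e.g. 10 -> '+10%', -20 -> '-20%'."""
    return f"{value:+d}%"


def signed_hz(value: int) -> str:
    """Format an integer as a signed pitch offset, e.g. 10 -> '+10Hz'."""
    return f"{value:+d}Hz"


def check_prosody(rate: str, volume: str, pitch: str) -> None:
    """Raise ValueError if any value lacks its explicit sign or unit."""
    for name, value, pattern in (
        ("rate", rate, _PERCENT),
        ("volume", volume, _PERCENT),
        ("pitch", pitch, _HERTZ),
    ):
        if not pattern.match(value):
            raise ValueError(f"{name}={value!r} must match {pattern.pattern}")
```

For example, `Communicate(text, voice, rate=signed_percent(20), pitch=signed_hz(10))` would pass "+20%" and "+10Hz".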

Installation and setup

# Install the core library
pip install edge-tts

# Optional: install a playback dependency
pip install playsound  # cross-platform audio playback

Complete usage guide

Basic usage: text to MP3

import asyncio
from edge_tts import Communicate

async def tts_conversion():
    communicate = Communicate(
        text="微软Edge TTS提供卓越的语音合成体验",  # "Microsoft Edge TTS delivers an excellent speech synthesis experience"
        voice="zh-CN-XiaoxiaoNeural",  # choose a voice
        rate="+10%",  # speak 10% faster
    )
    
    await communicate.save("output.mp3")  # save as MP3

asyncio.run(tts_conversion())

Streaming playback

from edge_tts import Communicate
import asyncio

async def live_tts():
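    communicate = Communicate(
        text="正在实时播放合成语音",  # "Playing synthesized speech in real time"
        voice="zh-CN-YunxiNeural"
    )
    
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            # chunk["data"] is MP3-encoded bytes; hand it to a player or decoder here
            pass
        elif chunk["type"] in ("WordBoundary", "SentenceBoundary"):
            # depending on the edge-tts version, boundaries are word- or sentence-level;
            # offset is in 100-nanosecond ticks, so divide by 10,000 for milliseconds
            print(f"Boundary: {chunk['offset'] / 10_000:.0f} ms, text: {chunk['text']}")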

asyncio.run(live_tts())
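The boundary chunks from `stream()` carry `offset` and `duration` fields, which to my understanding are expressed in 100-nanosecond ticks (10,000 ticks per millisecond). Under that assumption, the sketch below turns collected boundary events into SRT subtitle cues. `boundaries_to_srt` is a hypothetical helper of my own; edge-tts also ships a `SubMaker` class for subtitle generation.

```python
TICKS_PER_MS = 10_000  # assumption: offsets/durations are 100-nanosecond ticks


def srt_timestamp(ticks: int) -> str:
    """Format a tick count as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = ticks // TICKS_PER_MS
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def boundaries_to_srt(events) -> str:
    """Turn boundary events (dicts with offset, duration, text) into numbered SRT cues."""
    cues = []
    for n, event in enumerate(events, start=1):
        start = event["offset"]
        end = event["offset"] + event["duration"]
        cues.append(f"{n}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{event['text']}\n")
    # A blank line separates consecutive cues
    return "\n".join(cues)
```

Inside the streaming loop you would append each boundary chunk to a list, then write `boundaries_to_srt(events)` to a `.srt` file next to the MP3.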

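Advanced: SSML-style prosody control

Note: current edge-tts releases do not accept custom SSML. Microsoft's service rejects SSML that Edge itself would not generate, so `Communicate` has no `ssml=` parameter, and any tags placed in `text` are escaped and treated as plain text. A practical substitute is to synthesize each segment with its own rate, pitch and volume and append the audio in order (there is no direct equivalent of `<break time="500ms"/>`):

import asyncio
from edge_tts import Communicate

# Each (text, params) pair stands in for one part of an SSML document;
# the parameter values are illustrative approximations of "fast", "high" and "loud"
segments = [
    ("普通语音", {}),  # plain speech
    ("加速的高音调语音", {"rate": "+30%", "pitch": "+20Hz"}),  # like <prosody rate="fast" pitch="high">
    ("强调的重要内容!", {"volume": "+50%"}),  # like <prosody volume="loud">
]

async def segmented_tts():
    with open("ssml_output.mp3", "wb") as f:
        for text, params in segments:
            communicate = Communicate(text, "zh-CN-XiaoyiNeural", **params)
            async for chunk in communicate.stream():
                if chunk["type"] == "audio":
                    f.write(chunk["data"])  # append MP3 frames segment after segment

asyncio.run(segmented_tts())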

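Recommended Chinese voices

Voice ID	Characteristics	Best for
`zh-CN-XiaoxiaoNeural`	Clear female voice, natural and fluent	General narration
`zh-CN-YunxiNeural`	Warm male voice with a sense of humor	Content creation
`zh-CN-YunyangNeural`	Professional broadcaster style	News broadcasts
`zh-CN-XiaoyiNeural`	Young, lively female voice	Children's content
`zh-CN-liaoning-XiaobeiNeural`	Northeastern (Liaoning) accent	Dialect content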
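To pick voices programmatically instead of from the table, you can filter the list returned by `edge_tts.list_voices()`. This sketch assumes each entry is a dict with `ShortName`, `Locale` and `Gender` keys, as in the versions I have seen; `filter_voices` is a hypothetical helper, and the sample entries in the test are hand-written, not live data.

```python
def filter_voices(voices, locale_prefix="zh-", gender=None):
    """Return sorted ShortNames whose Locale starts with locale_prefix, optionally by gender."""
    return sorted(
        v["ShortName"]
        for v in voices
        if v["Locale"].startswith(locale_prefix)
        and (gender is None or v["Gender"] == gender)
    )

# Usage against the live service (requires network):
#     import asyncio, edge_tts
#     voices = asyncio.run(edge_tts.list_voices())
#     print(filter_voices(voices, "zh-CN", gender="Female"))
```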

Command-line tips

List all Chinese voices (voice names start with their locale, e.g. zh-CN, zh-TW, zh-HK):

edge-tts --list-voices | grep "zh-"

Generate an audio file directly:

edge-tts --voice zh-CN-XiaoxiaoNeural --text "命令行直接生成" --write-media output.mp3

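Convert a text file (pass --voice, otherwise the default English voice is used):

edge-tts --voice zh-CN-XiaoxiaoNeural --file input.txt --write-media output.mp3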

Advanced use cases

Audiobook generation:

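import asyncio
from edge_tts import Communicate

async def make_audiobook(path="novel.txt"):
    # Read as UTF-8 explicitly; the platform default (e.g. GBK on Chinese Windows) may fail
    with open(path, "r", encoding="utf-8") as f:
        chapters = f.read().split("CHAPTER_SEPARATOR")
    for i, chapter in enumerate(chapters):
        communicate = Communicate(text=chapter, voice="zh-CN-YunyangNeural")
        await communicate.save(f"chapter_{i+1}.mp3")  # await is only valid inside an async function

asyncio.run(make_audiobook())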
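A separator at the very start or end of the file produces empty chapters, which leave nothing to synthesize and cause gaps in the numbering. Filtering them out first keeps the files consecutive. `split_chapters` is a hypothetical helper that keeps the same `CHAPTER_SEPARATOR` marker; each returned pair can then be passed to `Communicate(text=...).save(filename)`.

```python
def split_chapters(raw: str, separator: str = "CHAPTER_SEPARATOR"):
    """Split raw text on separator and return (filename, text) pairs, skipping empty chapters."""
    chapters = [part.strip() for part in raw.split(separator)]
    non_empty = [c for c in chapters if c]
    # Number only the chapters that will actually be synthesized
    return [(f"chapter_{n}.mp3", c) for n, c in enumerate(non_empty, start=1)]
```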

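Dynamic voice assistant:

from edge_tts import Communicate

async def voice_response(query):
    # generate_answer() and audio_device are placeholders for your own answer generator and audio output
    # Pick a voice based on the query ("女性" means "female")
    voice = "zh-CN-XiaoxiaoNeural" if "女性" in query else "zh-CN-YunxiNeural"
    communicate = Communicate(text=generate_answer(query), voice=voice)
    
    # Stream to the audio device; chunk["data"] is MP3, so the device must be able to decode MP3
    async for chunk in communicate.stream():
        if chunk["type"] == "audio":
            audio_device.play(chunk["data"])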
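The single `if` above generalizes to a keyword table that maps query keywords to voices from the recommended list. The keywords and the `pick_voice` helper are hypothetical examples; adjust them to the queries your assistant actually receives.

```python
# Hypothetical keyword -> voice rules, checked in order; the first match wins
VOICE_RULES = [
    ("女性", "zh-CN-XiaoxiaoNeural"),  # "female"
    ("新闻", "zh-CN-YunyangNeural"),   # "news"
    ("儿童", "zh-CN-XiaoyiNeural"),    # "children"
]


def pick_voice(query: str, rules=VOICE_RULES, default: str = "zh-CN-YunxiNeural") -> str:
    """Return the voice of the first rule whose keyword occurs in the query, else the default."""
    for keyword, voice in rules:
        if keyword in query:
            return voice
    return default
```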

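Performance tips

Parallel processing:

import asyncio
from edge_tts import Communicate

async def batch_tts(texts, voice="zh-CN-XiaoxiaoNeural"):
    # Without an explicit voice, edge-tts falls back to its default English voice
    tasks = [Communicate(text=t, voice=voice).save(f"output_{i}.mp3")
             for i, t in enumerate(texts)]
    await asyncio.gather(*tasks)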
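`asyncio.gather` starts every request at once; with hundreds of texts that means hundreds of simultaneous connections to a free online service, which may get throttled or rejected. A common remedy, sketched here as my own helper rather than an edge-tts feature, is to cap the number of requests in flight with an `asyncio.Semaphore`. The limit of 4 in the usage comment is an arbitrary choice.

```python
import asyncio


async def bounded_gather(jobs, limit=4):
    """Run coroutine factories with at most `limit` in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def run(job):
        async with sem:  # wait for a free slot before starting the request
            return await job()

    return await asyncio.gather(*(run(job) for job in jobs))

# Usage with edge-tts (requires network); default arguments bind each loop value:
#     jobs = [lambda t=t, i=i: Communicate(t, "zh-CN-XiaoxiaoNeural").save(f"output_{i}.mp3")
#             for i, t in enumerate(texts)]
#     await bounded_gather(jobs, limit=4)
```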

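Caching:

import hashlib
from diskcache import Cache
from edge_tts import Communicate

cache = Cache("tts_cache")

async def cached_tts(text, voice):
    # Built-in hash() of a str is randomized per process, so use SHA-256 for keys that survive restarts
    key = f"{voice}-{hashlib.sha256(text.encode('utf-8')).hexdigest()}"
    if key not in cache:
        # Collect the MP3 chunks yielded by stream() into one bytes object
        audio = bytearray()
        async for chunk in Communicate(text=text, voice=voice).stream():
            if chunk["type"] == "audio":
                audio.extend(chunk["data"])
        cache[key] = bytes(audio)
    return cache[key]  # MP3 bytes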

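Troubleshooting

Q: Getting "RuntimeError: Event loop is closed", or "asyncio.run() cannot be called from a running event loop"?

# In Jupyter/IPython a loop is already running: `await tts_conversion()` in the cell, or patch it
import nest_asyncio
nest_asyncio.apply()
# "Event loop is closed" at exit on Windows: a commonly used workaround (Windows only)
import asyncio
asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())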

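Q: How do I connect through a proxy?

Communicate(
    text="需要代理访问的内容",  # "content that needs proxy access"
    voice="zh-CN-XiaoxiaoNeural",
    proxy="http://your-proxy:port"  # placeholder proxy URL
)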
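Because every request goes over the network (and possibly through a proxy), transient failures are worth retrying. The helper below is a generic sketch with exponential backoff, not an edge-tts API. It takes a function that builds a fresh coroutine on each attempt, so a new `Communicate` object is created per try; I believe recent edge-tts versions refuse to stream the same `Communicate` instance twice.

```python
import asyncio


async def with_retries(make_coro, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Await make_coro(), retrying on retry_on with delays of base_delay, 2*base_delay, ..."""
    for attempt in range(1, attempts + 1):
        try:
            return await make_coro()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))

# Usage (requires network; the proxy URL is a placeholder):
#     await with_retries(lambda: Communicate("你好", "zh-CN-XiaoxiaoNeural",
#                                            proxy="http://your-proxy:port").save("out.mp3"))
```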

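Q: Timeouts when processing long text?

# Raise the receive timeout (default 60 seconds); connect_timeout (default 10) covers connection setup
communicate = Communicate(text=long_text, voice="zh-CN-XiaoxiaoNeural", receive_timeout=180)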
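As far as I know, recent edge-tts versions already split long input into smaller requests internally, but splitting the text yourself lets you save one file per chunk and retry only the chunks that fail. This sketch splits after Chinese and Western sentence-ending punctuation and newlines, then packs sentences into chunks of at most `max_chars` characters; `chunk_text` and its 500-character default are my own arbitrary choices.

```python
import re


def chunk_text(text: str, max_chars: int = 500):
    """Pack sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation or newlines, keeping the delimiter with its sentence
    sentences = [s for s in re.split(r"(?<=[。！？!?\n])", text) if s.strip()]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split a single sentence that is longer than max_chars on its own
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```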

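Integration with other tools

Playback through PyAudio (edge-tts delivers MP3, so it must be decoded to PCM first):

import io
import pyaudio
from pydub import AudioSegment  # decoding MP3 requires ffmpeg on the system
from edge_tts import Communicate

async def play_with_pyaudio(text, voice="zh-CN-XiaoxiaoNeural"):
    mp3 = bytearray()  # edge-tts yields MP3 data, not raw PCM, so decode before playback
    async for chunk in Communicate(text, voice).stream():
        if chunk["type"] == "audio":
            mp3.extend(chunk["data"])
    seg = AudioSegment.from_file(io.BytesIO(bytes(mp3)), format="mp3")
    p = pyaudio.PyAudio()
    stream = p.open(format=p.get_format_from_width(seg.sample_width), channels=seg.channels, rate=seg.frame_rate, output=True)
    stream.write(seg.raw_data)  # plays only after the whole clip has been decoded
    stream.close()
    p.terminate()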

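Integrating into a web app (stream the audio instead of writing a shared temp file):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from edge_tts import Communicate

app = FastAPI()

@app.get("/tts")
async def tts_endpoint(text: str):
    communicate = Communicate(text=text, voice="zh-CN-XiaoxiaoNeural")

    async def audio_chunks():
        # Forward only audio bytes; concurrent requests would otherwise overwrite one shared temp.mp3
        async for chunk in communicate.stream():
            if chunk["type"] == "audio":
                yield chunk["data"]

    return StreamingResponse(audio_chunks(), media_type="audio/mpeg")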
