1. Testing via the command line

python examples/cmd/run.py "Your text 1." "Your text 2."
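
All of the flags are optional; by default the script samples a random speaker and loads the checkpoints from the local asset directory. To load a model from a custom path instead, the docstring in the source below suggests an invocation along these lines (the model path is illustrative):

python -m examples.cmd.run --source custom --custom_path ../../models/2Noise/ChatTTS "Your text 1." "Your text 2."

A sample run with the defaults: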

(chattts) duyicheng@duyicheng-computer:~/gitee/ChatTTS$ python examples/cmd/run.py "中华人民共和国" "美利坚合众国"
[+0800 20241206 16:31:53] [INFO] Command | run | Starting ChatTTS commandline demo...
[+0800 20241206 16:31:53] [INFO] Command | run | Namespace(spk=None, stream=False, source='local', custom_path='', texts=['中华人民共和国', '美利坚合众国'])
[+0800 20241206 16:31:53] [INFO] Command | run | Text input: ['中华人民共和国', '美利坚合众国']
[+0800 20241206 16:31:54] [INFO] Command | run | Initializing ChatTTS...
[+0800 20241206 16:31:54] [WARN] Command | run | Package nemo_text_processing not found!
[+0800 20241206 16:31:54] [WARN] Command | run | Run: conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing
[+0800 20241206 16:31:54] [WARN] Command | run | Package WeTextProcessing not found!
[+0800 20241206 16:31:54] [WARN] Command | run | Run: conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing
[+0800 20241206 16:31:54] [INFO] ChatTTS | dl | checking assets...
[+0800 20241206 16:31:59] [INFO] ChatTTS | dl | all assets are already latest.
[+0800 20241206 16:31:59] [INFO] ChatTTS | core | use device cuda:0
[+0800 20241206 16:32:00] [INFO] ChatTTS | core | vocos loaded.
[+0800 20241206 16:32:00] [INFO] ChatTTS | core | dvae loaded.
[+0800 20241206 16:32:01] [INFO] ChatTTS | core | embed loaded.
[+0800 20241206 16:32:01] [INFO] ChatTTS | core | gpt loaded.
[+0800 20241206 16:32:01] [INFO] ChatTTS | core | speaker loaded.
[+0800 20241206 16:32:02] [INFO] ChatTTS | core | decoder loaded.
[+0800 20241206 16:32:02] [INFO] ChatTTS | core | tokenizer loaded.
[+0800 20241206 16:32:02] [INFO] Command | run | Models loaded successfully.
[+0800 20241206 16:32:02] [INFO] Command | run | Use speaker:
蘁淰敝欀椋槀帗澀疜獝宔媸敿樒诙抟溫砤拢亹嫃柘箛旦缌绂苭笢甬伊硍晈沣癿夀斉撻羇觌勻焢謺冓氿蟄卷粭臩凂艣別媉櫅瘍虌匱漕晟涪吓淗澋澹列賋荍篫谭橷耕涕晒峨廧愮堝怽桶糅趌孜脷朢跀珃測讉孨咦紊唘烴葤莈凣蒅浑洎秘應艢槧嚁贘怡厠嚫樵誈簚絝凋详旽荳襻溤榆擒旊筛処葃譂糊潉唈糫嬬杫剓戝瞵磟寑葺嗿娫碻緆槿碵赍痂嘫筥湫楊楮磢搗焆嚔艱幑澕襚廓畀聞丷源垪疽坢囓攦糝愉襼腍毱咪糂殶奤擕冖人湘爌聠貵戔芓覃倘反翵毈嗌呦硿燢暄伎柎篵肋腋孋炸攂三儠剽甘糧螽胺薸孤笷扱簨獬囗姕癪摨湛袄现琹瑠汘婅巆栗蒣恜椾喚旡琥奫哌婱絋舵嗔洔政祚啹尭嫠烎椪抦舽涽湎師藎北虐羊垥儕宯瀡悉坹勖忒脏柿夻稴脮赹悋坸俷啪讁枢裱猵衡翉盞僂褌捑噖劻璯袜帨竬桧莚凮卋覣殃甅箰蚋籇豎继詤瘬螐傁蕂熨汗玠勭羺詥劗柜汆峦箏芉诗廵一纺禴漖肖珃嗯砣弋沗礻藃绤謀盖觠憵枯椝慝梤暐傦煦瘁咐瑞秇峝蠹讌攬俅扔襒労氌狞苁螭浰访莵漣賵狐謂煮姊莳歇袆擽瞬业該瑍烗磔赹旦氶甌刏萪皸嫨敮徍嗱诗知槶觭蔾纣誵秺玎澘级煙崅乩誡怄楽氦炄満肬竒艕檴椯煱蝟罟巨痲浱独庆漕袻佸曓秃梣胵柘援爿螊染榄灞刐碞腱北篍哗柿啥曋獙秩蝬蓱滉跅睼坤偍唕叨縀拋搼犧塵融琹脌晪厍跽簆紛埝動夠臜奮劝栫繻貂穾緵沜暶笶怽依豆襬瑵岆舸參宓嘤挷暸审冚柚烋营娮惠毱汰租絯紈覔觻丿畍笨勎乺異卄箳旹縀㴅
[+0800 20241206 16:32:02] [INFO] Command | run | Start inference.
text:   0%|▍                                                                                                                                                                                     | 1/384(max) [00:00,  2.22it/s]We detected that you are passing `past_key_values` as a tuple of tuples. This is deprecated and will be removed in v4.47. Please convert your cache or use an appropriate `Cache` class (https://huggingface.co/docs/transformers/kv_cache#legacy-cache-format)
text:   3%|█████▏                                                                                                                                                                               | 11/384(max) [00:00, 12.91it/s]
code:   3%|█████▋                                                                                                                                                                              | 64/2048(max) [00:02, 27.00it/s]
[+0800 20241206 16:32:08] [INFO] Command | run | Inference completed.
[+0800 20241206 16:32:09] [INFO] Command | run | Audio saved to output_audio_0.mp3
[+0800 20241206 16:32:09] [INFO] Command | run | Audio saved to output_audio_1.mp3
[+0800 20241206 16:32:09] [INFO] Command | run | Audio generation successful.
[+0800 20241206 16:32:09] [INFO] Command | run | ChatTTS process finished.
(chattts) duyicheng@duyicheng-computer:~/gitee/ChatTTS$
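
The two WARN lines only mean that the optional text normalizers are not installed; as the log shows, inference still completes without them. They can be added with the commands printed in the warnings:

conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing
conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing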

Source code (examples/cmd/run.py)

import os, sys

if sys.platform == "darwin":
    os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

now_dir = os.getcwd()
sys.path.append(now_dir)

from typing import Optional, List
import argparse

import numpy as np

import ChatTTS

from tools.logger import get_logger
from tools.audio import pcm_arr_to_mp3_view
from tools.normalizer.en import normalizer_en_nemo_text
from tools.normalizer.zh import normalizer_zh_tn

logger = get_logger("Command")


def save_mp3_file(wav, index):
    data = pcm_arr_to_mp3_view(wav)
    mp3_filename = f"output_audio_{index}.mp3"
    with open(mp3_filename, "wb") as f:
        f.write(data)
    logger.info(f"Audio saved to {mp3_filename}")


def load_normalizer(chat: ChatTTS.Chat):
    # try to load normalizer
    try:
        chat.normalizer.register("en", normalizer_en_nemo_text())
    except ValueError as e:
        logger.error(e)
    except BaseException:
        logger.warning("Package nemo_text_processing not found!")
        logger.warning(
            "Run: conda install -c conda-forge pynini=2.1.5 && pip install nemo_text_processing",
        )
    try:
        chat.normalizer.register("zh", normalizer_zh_tn())
    except ValueError as e:
        logger.error(e)
    except BaseException:
        logger.warning("Package WeTextProcessing not found!")
        logger.warning(
            "Run: conda install -c conda-forge pynini=2.1.5 && pip install WeTextProcessing",
        )


def main(
    texts: List[str],
    spk: Optional[str] = None,
    stream: bool = False,
    source: str = "local",
    custom_path: str = "",
):
    logger.info("Text input: %s", str(texts))

    chat = ChatTTS.Chat(get_logger("ChatTTS"))
    logger.info("Initializing ChatTTS...")
    load_normalizer(chat)

    is_load = False
    if os.path.isdir(custom_path) and source == "custom":
        is_load = chat.load(source="custom", custom_path=custom_path)
    else:
        is_load = chat.load(source=source)

    if is_load:
        logger.info("Models loaded successfully.")
    else:
        logger.error("Models load failed.")
        sys.exit(1)

    if spk is None:
        spk = chat.sample_random_speaker()
    logger.info("Use speaker:")
    print(spk)

    logger.info("Start inference.")
    wavs = chat.infer(
        texts,
        stream,
        params_infer_code=ChatTTS.Chat.InferCodeParams(
            spk_emb=spk,
        ),
    )
    logger.info("Inference completed.")
    # Save each generated waveform as a local mp3 file.
    # In stream mode, infer() yields chunks: every chunk is saved to its own file first,
    # then the chunks are concatenated per text and saved again as complete audio files.
    if stream:
        wavs_list = []
    for index, wav in enumerate(wavs):
        if stream:
            for i, w in enumerate(wav):
                save_mp3_file(w, (i + 1) * 1000 + index)
            wavs_list.append(wav)
        else:
            save_mp3_file(wav, index)
    if stream:
        for index, wav in enumerate(np.concatenate(wavs_list, axis=1)):
            save_mp3_file(wav, index)
    logger.info("Audio generation successful.")


if __name__ == "__main__":
    r"""
    python -m examples.cmd.run \
        --source custom --custom_path ../../models/2Noise/ChatTTS 你好喲 ":)"
    """
    logger.info("Starting ChatTTS commandline demo...")
    parser = argparse.ArgumentParser(
        description="ChatTTS Command",
        usage='[--spk xxx] [--stream] [--source ***] [--custom_path XXX] "Your text 1." " Your text 2."',
    )
    parser.add_argument(
        "--spk",
        help="Speaker (empty to sample a random one)",
        type=Optional[str],
        default=None,
    )
    parser.add_argument(
        "--stream",
        help="Use stream mode",
        action="store_true",
    )
    parser.add_argument(
        "--source",
        help="source form [ huggingface(hf download), local(ckpt save to asset dir), custom(define) ]",
        type=str,
        default="local",
    )
    parser.add_argument(
        "--custom_path",
        help="custom defined model path(include asset ckpt dir)",
        type=str,
        default="",
    )
    parser.add_argument(
        "texts",
        help="Original text",
        default=["YOUR TEXT HERE"],
        nargs=argparse.REMAINDER,
    )
    args = parser.parse_args()
    logger.info(args)
    main(args.texts, args.spk, args.stream, args.source, args.custom_path)
    logger.info("ChatTTS process finished.")
