Preface

LLaVA is an open-source, end-to-end trained large multimodal model developed by researchers at the University of Wisconsin-Madison, Microsoft Research, and Columbia University. It combines a vision encoder with a language model for general-purpose visual and language understanding, and can perform a range of tasks including image captioning, image-grounded queries, and visual question answering.


1. Model Introduction

1.1 Architecture

LLaVA consists of three main components: a vision encoder, a language model, and a connector, as shown in the figure below:

1.1.1 Vision encoder

A pretrained CLIP ViT-L/14 model is typically used (for example, clip-vit-large-patch14-336). Its job is to encode the input image into a sequence of meaningful visual features (visual tokens). This step is responsible for "seeing" the image.
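To make this concrete, the following is a minimal sketch (not LLaVA's own code) of what the vision encoder produces, using the CLIP classes from Hugging Face transformers; the image path is only a placeholder:

import torch
from PIL import Image
from transformers import CLIPVisionModel, CLIPImageProcessor

# Load the CLIP ViT-L/14 (336px) vision tower and its image processor.
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

image = Image.open("example.jpg").convert("RGB")   # placeholder image path
pixel_values = processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    outputs = vision_tower(pixel_values)

# One 1024-dim feature per 14×14 patch: 576 patches for a 336×336 input, plus a CLS token.
print(outputs.last_hidden_state.shape)   # torch.Size([1, 577, 1024])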

1.1.2 Large language model

This is the core "brain", responsible for language understanding, generation, and reasoning. LLaVA was originally built on Vicuna (itself a dialogue model fine-tuned from LLaMA); later versions also support other LLMs such as LLaMA and Mistral. Starting with LLaVA-1.5, the LLM also takes on the role of "visual reasoning".

1.1.3 Connector

This is one of LLaVA's key innovations. It "translates" the high-dimensional visual features produced by the vision encoder into feature vectors in the language model's embedding space (analogous to text tokens). The earliest connector was a simple linear (fully connected) layer (v1); later versions explored more complex structures such as an MLP (v1.5 and v1.6) and gated cross-attention. The connector is trained from scratch to bridge the gap between the vision and language modalities.
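For illustration only (a sketch, not LLaVA's actual module), a 2-layer MLP connector of the kind used since v1.5 can be written as follows; the dimensions assume CLIP ViT-L features (1024) and a 7B LLM with a 4096-dim embedding space:

import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Maps visual patch features into the LLM's token embedding space."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, vision_dim) -> (batch, num_patches, llm_dim)
        return self.proj(visual_features)

# The projected features are concatenated with the text token embeddings
# and fed to the LLM as if they were ordinary tokens.
projector = MLPProjector()
patches = torch.randn(1, 576, 1024)
print(projector(patches).shape)   # torch.Size([1, 576, 4096])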

The overall workflow is illustrated in the figure below:


1.2 Versions

Model version | Input image resolution | Vision encoder | Language model | Connector | Key improvements
LLaVA | 224×224 | CLIP-ViT-L/14 (trained at 224px) | Vicuna-7B | single fully connected (FC) layer | first-generation multimodal alignment framework
LLaVA-v1.5 | 336×336 | CLIP-ViT-L-336px (fine-tuned at 336px) | Vicuna-1.5-7B / Vicuna-1.5-13B | 2-layer MLP (stronger feature projection) | higher-resolution input, greater data diversity
LLaVA-v1.6 | 336×336 (base tile; dynamic resolution) | CLIP-ViT-L-336px (fine-tuned at 336px) | Vicuna-1.5-7B / Vicuna-1.5-13B / Mistral-7B / Nous-Hermes-2-Yi-34B | 2-layer MLP (refined parameters, better cross-modal alignment) | dynamic-resolution support, wider choice of LLMs, extra data for specialized scenarios (OCR, charts, document understanding)

2. Model Deployment

This article uses LLaVA-v1.6-Mistral-7B as the example for deployment, inference, and fine-tuning.

2.1 Deployment

2.1.1 Downloading the LLaVA code

First, download the LLaVA code from GitHub. Since the paper authors' original repository does not include fine-tuning code for v1.6, we use the ready-made project by arielnlee instead:

Option 1:

git clone https://github.com/arielnlee/LLaVA-1.6-ft.git
cd LLaVA-1.6-ft

Option 2:

Go to the GitHub page directly and download the code to your machine:
https://github.com/arielnlee/LLaVA-1.6-ft.git

2.1.2 Setting up the environment

On a Linux system (Ubuntu 22.04 in this example), run the following commands one by one to create a virtual environment and install the dependencies:

conda create -n llava python=3.10 -y
conda activate llava
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

At this point the system may have installed the CPU-only build of PyTorch. If that is the case, run the following commands to install the matching GPU build (CUDA 12.1 is recommended):

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia   # install PyTorch; this also pulls in part of the CUDA runtime
conda install -c "nvidia/label/cuda-12.1.0" cuda-toolkit   # install the remaining CUDA components
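After installation, a quick check (not part of the LLaVA code, just a sanity check) confirms whether the current PyTorch build can see the GPU:

import torch

print(torch.__version__)           # expected: 2.1.2
print(torch.version.cuda)          # prints 12.1 for the GPU build, None for the CPU-only build
print(torch.cuda.is_available())   # should print True once the GPU build is installed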

If you also want to train the model, install the following additional dependencies:

pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Finally, run the following command to make sure the downloaded LLaVA code is at the latest version:

git pull

2.2 Inference

Before running inference, two sets of pretrained weights need to be downloaded:

(1) Download the LLaVA-v1.6-Mistral-7B pretrained weights from Hugging Face: https://huggingface.co/liuhaotian/llava-v1.6-mistral-7b/tree/main

(2) Download the weights of the vision encoder clip-vit-large-patch14-336 from Hugging Face (https://huggingface.co/openai/clip-vit-large-patch14-336/tree/main), then set "mm_vision_tower" in the config.json inside the LLaVA-v1.6-Mistral-7B weight folder to the path of the clip-vit-large-patch14-336 weight folder.
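If you prefer to script these two steps, the sketch below is one way to do it with huggingface_hub; the local target directories are only examples and should be adjusted to your own layout:

import json
import os
from huggingface_hub import snapshot_download

# Download both weight folders (example local directories).
llava_dir = snapshot_download("liuhaotian/llava-v1.6-mistral-7b",
                              local_dir="./llava-v1.6-mistral-7b")
clip_dir = snapshot_download("openai/clip-vit-large-patch14-336",
                             local_dir="./clip-vit-large-patch14-336")

# Point "mm_vision_tower" in the LLaVA config.json at the local CLIP weight folder.
config_path = os.path.join(llava_dir, "config.json")
with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)
config["mm_vision_tower"] = clip_dir
with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)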

Now we can run inference. You can either use the eval scripts that ship with the model or write your own; the script below, which I wrote, illustrates the overall inference flow:

import torch

from llava.model import LlavaMistralForCausalLM
from llava.mm_utils import tokenizer_image_token
from llava.mm_utils import process_images
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates, SeparatorStyle

from transformers import AutoTokenizer
from PIL import Image

DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

tokenizer = None
model = None
image_processor = None

# Load the model
def load_model(load_8bit=False,  device_map="auto", device="cuda", use_flash_attn=False):
    global tokenizer, model, image_processor
    
    kwargs = {"device_map": device_map}
    
    if device != "cuda":
        kwargs['device_map'] = {"": device}
    
    if load_8bit:
        kwargs['load_in_8bit'] = True
    else:
        kwargs['torch_dtype'] = torch.float16
    
    if use_flash_attn:
        kwargs['attn_implementation'] = 'flash_attention_2'
    
    # Path to the pretrained weights
    base_model_path = "/home/mubei/桌面/llava/llava-v1.6-mistral-7b-hf2"   # folder holding the downloaded LLaVA-v1.6-Mistral-7B pretrained weights
    
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    
    # Load the model
    model = LlavaMistralForCausalLM.from_pretrained(
        base_model_path,
        low_cpu_mem_usage=True,
        **kwargs
    )

    # Load the image processor
    image_processor = None
    mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
    mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
    if mm_use_im_patch_token:
        tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
    if mm_use_im_start_end:
        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))

    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model(device_map=device_map)
    if device_map != 'auto':
        vision_tower.to(device=device_map, dtype=torch.float16)
    image_processor = vision_tower.image_processor
    
    return model, tokenizer, image_processor

def main():

    # Load the model
    model, tokenizer, image_processor = load_model(load_8bit=False)
    
    # Set the image and the prompt
    image = Image.open("/home/mubei/桌面/llava/LLaVA-1.6-ft-main/mllm_demo_data/3.jpg").convert('RGB')
    prompt = "请描述这张图片"

    # Create the conversation template
    conv = conv_templates["mistral_direct"].copy()
    
    # Build the prompt: prepend the image token
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
    
    # Add the prompt to the conversation
    conv.append_message(conv.roles[0], prompt)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    
    # Preprocess the image
    image_tensor = process_images([image], image_processor, model.config).to(model.device, dtype=torch.float16)
    
    # Tokenize the text
    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).cuda()

    # Build the attention mask (pad_token_id is passed to generate below)
    attention_mask = input_ids.ne(tokenizer.pad_token_id).int().cuda()

    # Prepare the stop string
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2

    # Run inference
    with torch.inference_mode():
        output_ids = model.generate(
            
            input_ids,
            attention_mask=attention_mask,
            images=image_tensor,
            image_sizes=[image.size],
            do_sample=False,
            temperature=1e-5,
            top_p=None,
            num_beams=1,
            max_new_tokens=512,
            use_cache=True,
            pad_token_id=tokenizer.pad_token_id 
        )

    # Decode the output
    outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
    outputs = outputs.split(stop_str)[0]
    
    print(f"输入: {prompt}")
    print(f"输出: {outputs}")

if __name__ == '__main__':
    main()

Running the code above produces the following output:

As can be seen, the model's description is largely accurate, demonstrating the strong visual question answering ability of LLaVA-v1.6-Mistral-7B.


2.3 LoRA Fine-tuning

2.3.1 Dataset preparation

The training data for LLaVA-v1.6-Mistral-7B must follow the format below:

{
  "conversations": [
    {"from": "human", "value": "<image>\nWhat is in this image?"},
    {"from": "gpt", "value": "The image shows..."}
  ],
  "images": ["image.jpg"]
}

As shown above, "conversations" holds "from" and "value" fields, while "images" lists the corresponding image files. We therefore rewrite the mllm_demo.json demo dataset into this format. The full dataset is as follows:

[
  {
    "conversations": [
      {
        "value": "<image>Who are they?",
        "from": "human"
      },
      {
        "value": "They're Kane and Gretzka from Bayern Munich.",
        "from": "gpt"
      },
      {
        "value": "What are they doing?<image>",
        "from": "human"
      },
      {
        "value": "They are celebrating on the soccer field.",
        "from": "gpt"
      }
    ],
    "images": [
      "1.jpg",
      "1.jpg"
    ]
  },
  {
    "conversations": [
      {
        "value": "<image>Who is he?",
        "from": "human"
      },
      {
        "value": "He's Thomas Muller from Bayern Munich.",
        "from": "gpt"
      },
      {
        "value": "Why is he on the ground?",
        "from": "human"
      },
      {
        "value": "Because he's sliding on his knees to celebrate.",
        "from": "gpt"
      }
    ],
    "images": [
      "2.jpg"
    ]
  },
  {
    "conversations": [
      {
        "value": "<image>Please describe this image",
        "from": "human"
      },
      {
        "value": "Chinese astronaut Gui Haichao is giving a speech.",
        "from": "gpt"
      },
      {
        "value": "What has he accomplished?",
        "from": "human"
      },
      {
        "value": "He was appointed to be a payload specialist on Shenzhou 16 mission in June 2022, thus becoming the first Chinese civilian of Group 3 in space on 30 May 2023. He is responsible for the on-orbit operation of space science experimental payloads.",
        "from": "gpt"
      }
    ],
    "images": [
      "3.jpg"
    ]
  },
  {
    "conversations": [
      {
        "value": "<image>他们是谁?",
        "from": "human"
      },
      {
        "value": "他们是拜仁慕尼黑的凯恩和格雷茨卡。",
        "from": "gpt"
      },
      {
        "value": "他们在做什么?<image>",
        "from": "human"
      },
      {
        "value": "他们在足球场上庆祝。",
        "from": "gpt"
      }
    ],
    "images": [
      "1.jpg",
      "1.jpg"
    ]
  },
  {
    "conversations": [
      {
        "value": "<image>他是谁?",
        "from": "human"
      },
      {
        "value": "他是来自拜仁慕尼黑的托马斯·穆勒。",
        "from": "gpt"
      },
      {
        "value": "他为什么在地上?",
        "from": "human"
      },
      {
        "value": "因为他正在双膝跪地滑行庆祝。",
        "from": "gpt"
      }
    ],
    "images": [
      "2.jpg"
    ]
  },
  {
    "conversations": [
      {
        "value": "<image>请描述这张图片",
        "from": "human"
      },
      {
        "value": "中国宇航员桂海潮正在讲话。",
        "from": "gpt"
      },
      {
        "value": "他取得过哪些成就?",
        "from": "human"
      },
      {
        "value": "他于2022年6月被任命为神舟十六号任务的有效载荷专家,从而成为2023年5月30日进入太空的首位平民宇航员。他负责在轨操作空间科学实验有效载荷。",
        "from": "gpt"
      }
    ],
    "images": [
      "3.jpg"
    ]
  }
]
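Before starting fine-tuning, it is worth sanity-checking the dataset: each sample should contain exactly as many <image> placeholders in its conversations as entries in its "images" list, and every referenced image should exist on disk. A small helper sketch (the file and folder names are examples):

import json
import os

data_path = "mllm_demo.json"       # example path to the dataset above
image_folder = "mllm_demo_data"    # example folder holding 1.jpg, 2.jpg, 3.jpg

with open(data_path, "r", encoding="utf-8") as f:
    samples = json.load(f)

for i, sample in enumerate(samples):
    # Count <image> placeholders across all turns of the conversation.
    n_tags = sum(turn["value"].count("<image>") for turn in sample["conversations"])
    n_images = len(sample.get("images", []))
    assert n_tags == n_images, f"sample {i}: {n_tags} <image> tags but {n_images} images"
    # Make sure every referenced image actually exists.
    for name in sample["images"]:
        assert os.path.exists(os.path.join(image_folder, name)), f"missing image: {name}"

print(f"{len(samples)} samples look consistent.")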

2.3.2 Fine-tuning

Before fine-tuning, open finetune_lora_llava_mistral.sh and update the path to the LLaVA-v1.6-Mistral-7B pretrained weights, the path to the vision encoder weights, the dataset path, and the corresponding image folder path (in LLaVA's training scripts these typically correspond to the --model_name_or_path, --vision_tower, --data_path, and --image_folder arguments), as shown below:

Since arielnlee's project already adds and polishes the LoRA fine-tuning code for LLaVA-v1.6-Mistral-7B, we can use it directly. Run the following command in the terminal to start LoRA fine-tuning:

bash scripts/v1_6/finetune_lora_llava_mistral.sh

The output of the fine-tuning process is shown below:


2.4 Weight Merging

After fine-tuning, the LoRA weights obtained from training need to be merged with the original pretrained weights to produce the final weights that can be used for inference.

The merged weights can be obtained with the following code:

# Load the base model (PeftModel below comes from the peft package: from peft import PeftModel)
base_model = LlavaMistralForCausalLM.from_pretrained(
    base_model_path,
    low_cpu_mem_usage=True,
    **kwargs
)

# Load the LoRA adapter (its configuration is read from the checkpoint folder)
model = PeftModel.from_pretrained(
    base_model,
    finetuned_model_path,
    adapter_name="default",
    is_trainable=False
)

# Merge the adapter weights into the base model
model = model.merge_and_unload()

We then add this code to the inference script and run an inference test with the LoRA-fine-tuned model:

import torch

from llava.model import LlavaMistralForCausalLM
from llava.mm_utils import tokenizer_image_token
from llava.mm_utils import process_images
from llava.constants import DEFAULT_IMAGE_TOKEN, IMAGE_TOKEN_INDEX
from llava.conversation import conv_templates, SeparatorStyle

from transformers import AutoTokenizer
from peft import PeftModel
from PIL import Image

DEFAULT_IMAGE_PATCH_TOKEN = "<im_patch>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"

tokenizer = None
model = None
image_processor = None

# Load the model and merge the LoRA adapter into the base weights
def load_model(load_8bit=False,  device_map="auto", device="cuda", use_flash_attn=False):
    global tokenizer, model, image_processor
    
    kwargs = {"device_map": device_map}
    
    if device != "cuda":
        kwargs['device_map'] = {"": device}
    
    if load_8bit:
        kwargs['load_in_8bit'] = True
    else:
        kwargs['torch_dtype'] = torch.float16
    
    if use_flash_attn:
        kwargs['attn_implementation'] = 'flash_attention_2'
    
    # Path to the pretrained weights
    base_model_path = "/home/mubei/桌面/llava/llava-v1.6-mistral-7b-hf2"
    # Path to the LoRA weights produced by fine-tuning
    finetuned_model_path = "/home/mubei/桌面/llava/llava-v1.6-mistral-7b-hf2/lora/train_2025-05-31-20-11-53/checkpoint-100"
    
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(base_model_path)
    
    # Load the base model
    base_model = LlavaMistralForCausalLM.from_pretrained(
        base_model_path,
        low_cpu_mem_usage=True,
        **kwargs
    )
    
    # Load the LoRA adapter (its configuration is read from the checkpoint folder)
    model = PeftModel.from_pretrained(
        base_model,
        finetuned_model_path,
        adapter_name="default",
        is_trainable=False
    )

    # Merge the adapter weights into the base model
    model = model.merge_and_unload()

    # Load the image processor
    image_processor = None
    mm_use_im_start_end = getattr(model.config, "mm_use_im_start_end", False)
    mm_use_im_patch_token = getattr(model.config, "mm_use_im_patch_token", True)
    if mm_use_im_patch_token:
        tokenizer.add_tokens([DEFAULT_IMAGE_PATCH_TOKEN], special_tokens=True)
    if mm_use_im_start_end:
        tokenizer.add_tokens([DEFAULT_IM_START_TOKEN, DEFAULT_IM_END_TOKEN], special_tokens=True)
    model.resize_token_embeddings(len(tokenizer))

    vision_tower = model.get_vision_tower()
    if not vision_tower.is_loaded:
        vision_tower.load_model(device_map=device_map)
    if device_map != 'auto':
        vision_tower.to(device=device_map, dtype=torch.float16)
    image_processor = vision_tower.image_processor
    
    return model, tokenizer, image_processor

def main():

    # Load the merged model
    model, tokenizer, image_processor = load_model(load_8bit=False)
    
    # Set the image and the prompt
    image = Image.open("/home/mubei/桌面/llava/LLaVA-1.6-ft-main/mllm_demo_data/3.jpg").convert('RGB')
    prompt = "请描述这张图片"

    # Create the conversation template
    conv = conv_templates["mistral_direct"].copy()
    
    # Build the prompt: prepend the image token
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + prompt
    
    # Add the prompt to the conversation
    conv.append_message(conv.roles[0], prompt)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    
    # Preprocess the image
    image_tensor = process_images([image], image_processor, model.config).to(model.device, dtype=torch.float16)
    
    # Tokenize the text
    input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt").unsqueeze(0).cuda()

    # Build the attention mask (pad_token_id is passed to generate below)
    attention_mask = input_ids.ne(tokenizer.pad_token_id).int().cuda()

    # Prepare the stop string
    stop_str = conv.sep if conv.sep_style != SeparatorStyle.TWO else conv.sep2

    # Run inference
    with torch.inference_mode():
        output_ids = model.generate(
            
            input_ids,
            attention_mask=attention_mask,
            images=image_tensor,
            image_sizes=[image.size],
            do_sample=False,
            temperature=1e-5,
            top_p=None,
            num_beams=1,
            max_new_tokens=512,
            use_cache=True,
            pad_token_id=tokenizer.pad_token_id 
        )

    # Decode the output
    outputs = tokenizer.decode(output_ids[0, input_ids.shape[1]:]).strip()
    outputs = outputs.split(stop_str)[0]
    
    print(f"输入: {prompt}")
    print(f"输出: {outputs}")

if __name__ == '__main__':
    main()
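If you do not want to repeat the merge on every run, the merged model can optionally be saved to disk once and then loaded like a regular checkpoint. A small sketch (the output directory is only an example):

# Optional: persist the merged weights so future runs can skip the merge step.
merged_path = "/home/mubei/桌面/llava/llava-v1.6-mistral-7b-merged"   # example output directory

model, tokenizer, image_processor = load_model(load_8bit=False)
model.save_pretrained(merged_path)       # writes the merged model weights and config
tokenizer.save_pretrained(merged_path)   # writes the tokenizer files alongside them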

3. Common Errors

3.1 Error installing dependency packages

Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 35, in <module>
        File "/tmp/pip-install-fx3x4wsk/flash-attn_7983b6cadac94c4da9daa94f9301803d/setup.py", line 171, in <module>
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
        File "/tmp/pip-install-fx3x4wsk/flash-attn_7983b6cadac94c4da9daa94f9301803d/setup.py", line 89, in get_cuda_bare_metal_version
          raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
        File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/subprocess.py", line 421, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/subprocess.py", line 503, in run
          with Popen(*popenargs, **kwargs) as process:
        File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/subprocess.py", line 971, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/subprocess.py", line 1863, in _execute_child
          raise child_exception_type(errno_num, err_msg, err_filename)
      FileNotFoundError: [Errno 2] No such file or directory: '/home/mubei/software/anaconda3/envs/llava/bin/nvcc'
      
      
      torch.__version__  = 2.1.2
      
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

The error above is caused by an incompatible flash-attn version (pip falls back to building it from source and cannot find nvcc). Pinning the flash-attn version when installing resolves it:

pip install flash-attn==2.7.3 --no-build-isolation

3.2 Model import error

Traceback (most recent call last):
  File "/home/mubei/桌面/llava/LLaVA-main/example.py", line 6, in <module>
    from llava.model import LlavaMistralForCausalLM
  File "/home/mubei/桌面/llava/LLaVA-main/llava/__init__.py", line 1, in <module>
    from .model import LlavaLlamaForCausalLM
ImportError: cannot import name 'LlavaLlamaForCausalLM' from 'llava.model' (/home/mubei/桌面/llava/LLaVA-main/llava/model/__init__.py)

This error can have all sorts of causes; in my case it was an outdated deepspeed version. Upgrading deepspeed with the following command resolved it:

pip install deepspeed==0.16.9

3.3 Fine-tuning errors

(1) numpy version mismatch

Traceback (most recent call last):
  File "/home/mubei/桌面/llava/LLaVA-main/example.py", line 256, in <module>
    main()
  File "/home/mubei/桌面/llava/LLaVA-main/example.py", line 219, in main
    image_tensor = process_images([image], image_processor, model.config).to(model.device, dtype=torch.float16)
  File "/home/mubei/桌面/llava/LLaVA-main/llava/mm_utils.py", line 176, in process_images
    image = process_anyres_image(image, image_processor, model_cfg.image_grid_pinpoints)
  File "/home/mubei/桌面/llava/LLaVA-main/llava/mm_utils.py", line 143, in process_anyres_image
    image_patches = [processor.preprocess(image_patch, return_tensors='pt')['pixel_values'][0]
  File "/home/mubei/桌面/llava/LLaVA-main/llava/mm_utils.py", line 143, in <listcomp>
    image_patches = [processor.preprocess(image_patch, return_tensors='pt')['pixel_values'][0]
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/models/clip/image_processing_clip.py", line 326, in preprocess
    return BatchFeature(data=data, tensor_type=return_tensors)
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 78, in __init__
    self.convert_to_tensors(tensor_type=tensor_type)
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/feature_extraction_utils.py", line 188, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate padding with 'padding=True' to have batched tensors with the same length.

In my case this error was caused by a numpy version mismatch (although the runtime did not say so explicitly). Install a pinned numpy version:

pip install numpy==1.24.4

(2) scipy version mismatch

Traceback (most recent call last):
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1364, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/trainer.py", line 59, in <module>
    from .data.data_collator import DataCollator, DataCollatorWithPadding, default_data_collator
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/data/__init__.py", line 26, in <module>
    from .metrics import glue_compute_metrics, xnli_compute_metrics
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/data/metrics/__init__.py", line 19, in <module>
    from scipy.stats import pearsonr, spearmanr
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/stats/__init__.py", line 624, in <module>
    from ._stats_py import *
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/stats/_stats_py.py", line 39, in <module>
    from scipy.spatial import distance_matrix
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/spatial/__init__.py", line 116, in <module>
    from ._geometric_slerp import geometric_slerp
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/spatial/_geometric_slerp.py", line 7, in <module>
    from scipy.spatial.distance import euclidean
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/spatial/distance.py", line 121, in <module>
    from ..special import rel_entr
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/special/__init__.py", line 826, in <module>
    from . import _basic
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/special/_basic.py", line 22, in <module>
    from ._multiufuncs import (assoc_legendre_p_all,
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/special/_multiufuncs.py", line 142, in <module>
    sph_legendre_p = MultiUFunc(
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/scipy/special/_multiufuncs.py", line 41, in __init__
    raise ValueError("All ufuncs must have type `numpy.ufunc`."
ValueError: All ufuncs must have type `numpy.ufunc`. Received (<ufunc 'sph_legendre_p'>, <ufunc 'sph_legendre_p'>, <ufunc 'sph_legendre_p'>)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mubei/桌面/llava/LLaVA-1.6-ft-main/llava/train/train_mem.py", line 1, in <module>
    from llava.train.train import train
  File "/home/mubei/桌面/llava/LLaVA-main/llava/train/train.py", line 32, in <module>
    from llava.train.llava_trainer import LLaVATrainer
  File "/home/mubei/桌面/llava/LLaVA-main/llava/train/llava_trainer.py", line 7, in <module>
    from transformers import Trainer
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1354, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "/home/mubei/software/anaconda3/envs/llava/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 1366, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.trainer because of the following error (look up to see its traceback):
All ufuncs must have type `numpy.ufunc`. Received (<ufunc 'sph_legendre_p'>, <ufunc 'sph_legendre_p'>, <ufunc 'sph_legendre_p'>)

This error is caused by a scipy version that is incompatible with the installed numpy. Run the following command to install a compatible scipy version:

pip install scipy==1.9.0

4. Summary

This article described the LLaVA architecture in detail and compared its different versions. Using LLaVA-v1.6-Mistral-7B as the example, it walked through deployment, inference, and LoRA fine-tuning. Finally, it summarized the common errors encountered along the way, analyzed their causes, and gave solutions.
