Building a RAG Knowledge Base with AnythingLLM, deepseek4j, and Milvus


编程界的小子 · Published 2025-03-04 11:42:46

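Reference: "anythingLLM和deepseek4j和milvus组合建立RAG知识库" (非ban必选, DeepSeek技术社区). The setup described there:
1. Deploy DeepSeek locally with Ollama.
2. Install the Milvus vector database.
3. Install AnythingLLM. The installer is shared via Baidu Netdisk as AnythingLLMDesktop.exe. Link: https://pan.baidu.com/s/1YfNKhYNBO1t8ULuK00E5yQ?
4. Configure the embedding model.
Once this is done, the Milvus management UI shows the vector collection created by AnythingLLM.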

=========================================================================

Tools for building a knowledge base on top of a local DeepSeek

1. AnythingLLM

2. Page Assist (browser extension)

3. Cherry Studio

=========================================================================

After a knowledge document is parsed, its content is vectorized (embedded) and finally stored in a vector database.
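The parse, vectorize, store flow can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model, and `InMemoryVectorStore` is a hypothetical class standing in for a vector database such as Milvus. Neither is part of AnythingLLM or Milvus.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split parsed document text into chunks of at most max_chars characters."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    for p in paragraphs:
        for i in range(0, len(p), max_chars):
            chunks.append(p[i:i + max_chars])
    return chunks


def toy_embed(text: str) -> dict[str, float]:
    """Unit-length bag-of-words vector. NOT a real embedding model."""
    counts = Counter(re.findall(r"\w+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {token: c / norm for token, c in counts.items()}


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: brute-force cosine search."""

    def __init__(self) -> None:
        self.rows: list[tuple[dict[str, float], str]] = []

    def insert(self, text: str) -> None:
        self.rows.append((toy_embed(text), text))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(w * vec.get(tok, 0.0) for tok, w in q.items()), text)
            for vec, text in self.rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]


document = (
    "Passwords can be reset from the settings page.\n\n"
    "The warranty period is two years."
)
store = InMemoryVectorStore()
for chunk in chunk_text(document):
    store.insert(chunk)

print(store.search("how do I reset my password", top_k=1))
```

A real deployment replaces `toy_embed` with the configured embedding model and the in-memory list with a database collection; the shape of the flow stays the same.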

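Enterprise scenarios: prefer Milvus, Pinecone, or Zilliz Cloud, which balance performance and scalability.

Native vector databases (open source)

Milvus: a high-performance open-source database developed by Zilliz. It supports retrieval over billions of vectors, has a cloud-native architecture with distributed scaling, and suits AI, recommendation systems, computer vision, and similar scenarios.

It offers three deployment modes: a lightweight library (Milvus Lite), a single-node version (Standalone), and a distributed version (Kubernetes cluster).

Weaviate: supports automated vector search and knowledge graph construction, ships with built-in natural language processing models, and suits semantic search and multimodal data integration.

Chroma: a lightweight open-source database focused on machine learning scenarios. It integrates easily with frameworks such as LangChain and suits quickly building RAG applications.

Vald: a highly scalable open-source database for real-time data updates and large-scale distributed scenarios.

LanceDB: an open-source database built in Rust that supports efficient vector indexing and hybrid queries (vector + structured data). It is the store AnythingLLM uses. https://zhuanlan.zhihu.com/p/692637675

LanceDB is an open-source vector database for AI, designed to store, manage, query, and retrieve embeddings of large-scale multimodal data. Its core is written in Rust and built on Lance, an open-source columnar data format designed for high-performance ML workloads and fast random access.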

=======================================================================

deepseek4j

Calling the RAG knowledge base endpoint

    /**
     * RAG knowledge base endpoint.
     *
     * @param prompt the user's question
     * @return a stream of chat completion chunks
     */
    @GetMapping(value = "/rag/chat", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ChatCompletionResponse> ragchat(String prompt) {

        // Embed the question so it can be compared with the stored vectors
        List<Float> floatList = embeddingClient.embed(prompt);
        // Collections tried: anythingllm_hhy, anythingllm_test01
        SearchReq searchReq = SearchReq.builder()
                .collectionName("anythingllm_test01")
                .data(Collections.singletonList(new FloatVec(floatList)))
                // Output fields/collections tried: metadata, text, deepseek4j_test
                .outputFields(Collections.singletonList("metadata"))
                .topK(3)
                .build();

        SearchResp searchResp = milvusClientV2.search(searchReq);

        // The "metadata" field holds a JSON object whose "text" entry is the chunk content
        Gson gson = new Gson();
        List<String> resultList = new ArrayList<>();
        List<List<SearchResp.SearchResult>> searchResults = searchResp.getSearchResults();
        for (List<SearchResp.SearchResult> results : searchResults) {
            System.out.println("TopK results:");
            for (SearchResp.SearchResult result : results) {
                JsonObject jsonObject = gson.fromJson(result.getEntity().get("metadata").toString(), JsonObject.class);
                // getAsString() returns the raw text; toString() would keep the JSON quotes and escapes
                resultList.add(jsonObject.get("text").getAsString());
            }
        }

        // Prompt (in Chinese): "Based on the user's question: %s, and with reference to
        // the following content: %s, compile the final result"
        ChatCompletionRequest request = ChatCompletionRequest.builder()
                // Change this parameter to match the model name of your provider
                .model("deepseek-r1:32b")
                .addUserMessage(String.format("你要根据用户输入的问题：%s \n \n 参考如下内容： %s  \n\n 整理处理最终结果", prompt, resultList)).build();

        return deepSeekClient.chatFluxCompletion(request);
    }
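The step between retrieval and generation in the endpoint above, pulling the chunk text out of each hit's metadata and folding it into the prompt, can be tried in isolation. The following is a minimal Python sketch of that step only. The hit dictionaries, `extract_texts`, and `build_prompt` are hypothetical names for illustration; they are not part of deepseek4j or the Milvus SDK, and the English prompt wording is an assumption.

```python
import json


def extract_texts(hits: list[dict]) -> list[str]:
    """Pull the chunk text out of each hit's "metadata" field (JSON string or dict)."""
    texts = []
    for hit in hits:
        metadata = hit.get("metadata")
        if isinstance(metadata, str):
            metadata = json.loads(metadata)
        if isinstance(metadata, dict) and metadata.get("text"):
            texts.append(str(metadata["text"]))
    return texts


def build_prompt(question: str, contexts: list[str]) -> str:
    """Number the retrieved chunks so the model can tell them apart."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        f"Answer the user's question: {question}\n\n"
        f"Use the following reference material:\n{numbered}\n\n"
        "Compile the final answer."
    )


# Hypothetical search hits, shaped like the "metadata" output field
hits = [
    {"metadata": json.dumps({"text": "Open Settings and click 'Forgot password'."})},
    {"metadata": {"text": "The warranty period is 2 years."}},
    {"metadata": json.dumps({"title": "no text field"})},  # skipped: nothing to quote
]
prompt = build_prompt("How do I reset my password?", extract_texts(hits))
print(prompt)
```

Numbering the chunks instead of printing the raw list keeps the prompt readable when a chunk itself contains commas or brackets.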

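1. Environment setup
1.1 Install Python and the required tools
Install Python: download and install Python 3.8+ from the official website.
Install pip: the Python installer includes pip (the package manager) by default.
Verify the installation:
python --version # prints the Python version
pip --version # prints the pip version
1.2 Install the deep learning libraries
Open a terminal (command line) and run the following command to install the dependencies:
pip install torch transformers flask pandas numpy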
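After the install finishes, a quick check that each package is importable can save a failed run later. A minimal sketch using only the standard library; the `REQUIRED` list simply mirrors the pip command above, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Mirrors the packages named in the pip install command
REQUIRED = ["torch", "transformers", "flask", "pandas", "numpy"]


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that cannot be found as importable top-level modules."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

This only confirms that the modules can be located; it does not import them, so a broken native library (such as a missing DLL) would still surface at import time.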

Upgrade pip (to version 25.0.1 at the time of writing):
python.exe -m pip install --upgrade pip

Installing torch can fail with:
ERROR: Could not find a version that satisfies the requirement setuptools; python_version >= "3.12" (from torch) (from versions: none)
ERROR: No matching distribution found for setuptools; python_version >= "3.12"

Upgrade setuptools, using a mirror if the default index is unreachable:
pip install --upgrade setuptools
# Use the Tsinghua University mirror
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple <package_name>
# Use the Alibaba Cloud mirror
pip install -i https://mirrors.aliyun.com/pypi/simple <package_name>
For example, install setuptools from the Alibaba Cloud mirror:
pip install -i https://mirrors.aliyun.com/pypi/simple setuptools
More on working with Python mirrors:
https://blog.csdn.net/qq_37723720/article/details/136400078
Show the mirror configured on the system:
pip config list
Set a domestic mirror permanently (a one-time setup):
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

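1.3 Check GPU support (optional)
If you have an NVIDIA GPU, install the NVIDIA driver and a CUDA build of PyTorch. The cu113 suffix selects the CUDA 11.3 wheels; use the tag that matches your driver and Python version:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113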
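Whether PyTorch can actually see a CUDA GPU is easy to confirm from Python. A minimal sketch, assuming nothing beyond `torch.cuda.is_available()`; `detect_device` is a hypothetical helper name, and it falls back to "cpu" when PyTorch is missing or fails to load.

```python
import importlib.util


def detect_device() -> str:
    """Return "cuda" when PyTorch sees a CUDA GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed
    try:
        import torch
    except (ImportError, OSError):
        return "cpu"  # PyTorch is installed but failed to load its native libraries
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Using device:", detect_device())
```

On a machine with only integrated graphics this prints "cpu", which is worth knowing before choosing a model that requires GPU-only features.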
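2. Data preparation
2.1 Collect domain knowledge data
Organize common questions and answers into a CSV file (for example qa_data.csv) in the following format. The two sample rows ask how to reset a password and how long the product warranty lasts:

question,answer
"如何重置密码？","请访问设置页面，点击‘忘记密码’并按提示操作。"
"产品保修期多久？","本产品保修期为2年，具体条款请查看官网。"
...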
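Answers often contain commas or quotation marks, so writing the CSV by hand is error-prone. A minimal sketch that produces and validates the same two-column format with the standard csv module. The third and fourth rows are made-up examples, `write_qa_csv` and `read_qa_csv` are hypothetical helper names, and an in-memory buffer stands in for qa_data.csv.

```python
import csv
import io

FIELDS = ["question", "answer"]


def write_qa_csv(rows, fileobj):
    """Write (question, answer) pairs with every field quoted."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    writer.writerow(FIELDS)
    writer.writerows(rows)


def read_qa_csv(fileobj):
    """Read the pairs back, rejecting a wrong header and skipping incomplete rows."""
    reader = csv.DictReader(fileobj)
    if reader.fieldnames != FIELDS:
        raise ValueError(f"expected header {FIELDS}, got {reader.fieldnames}")
    return [
        row for row in reader
        if (row.get("question") or "").strip() and (row.get("answer") or "").strip()
    ]


rows = [
    ("如何重置密码？", "请访问设置页面，点击‘忘记密码’并按提示操作。"),
    ("产品保修期多久？", "本产品保修期为2年，具体条款请查看官网。"),
    ("Is shipping free?", 'Yes, choose "Standard", then confirm.'),  # comma and quotes
    ("Unanswered question", ""),  # incomplete row, dropped on read
]

buffer = io.StringIO(newline="")  # in-memory stand-in for qa_data.csv
write_qa_csv(rows, buffer)
buffer.seek(0)
loaded = read_qa_csv(buffer)
print(len(loaded), "valid rows")
```

To write a real file instead, open it with `open("qa_data.csv", "w", newline="", encoding="utf-8")` and pass that to `write_qa_csv`.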
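2.2 Data preprocessing
Create a Python script preprocess.py that converts the data into the training format:

import pandas as pd

# Read the CSV data
df = pd.read_csv("qa_data.csv")

# Save as a text file: one "问题/答案" (question/answer) block per row,
# with a blank line between blocks
with open("train.txt", "w", encoding="utf-8") as f:
    for index, row in df.iterrows():
        f.write(f"问题：{row['question']}\n答案：{row['answer']}\n\n")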
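3. Model training
3.1 Fine-tune the model with the Hugging Face library
Create a Python script train.py:

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load the pretrained model and tokenizer. The Auto* classes pick the right
# implementation for the checkpoint; GPT2Tokenizer/GPT2LMHeadModel only fit GPT-2.
model_name = "deepseek-ai/deepseek-r1"  # replace with your model name
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # batching needs a padding token

# Load the training data: one "问题/答案" (question/answer) block per sample
with open("train.txt", "r", encoding="utf-8") as f:
    texts = [t for t in f.read().split("\n\n") if t.strip()]

# Encode the data. Trainer expects a dataset of feature dicts, not a bare tensor.
# max_length=512 is an assumed cap; adjust it to your data.
train_dataset = [dict(tokenizer(t, truncation=True, max_length=512)) for t in texts]

# The collator pads each batch and copies input_ids into labels (mlm=False means causal LM)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training parameters
training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    save_steps=500,
    logging_steps=100,
)

# Start training
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()

# Save the fine-tuned model
model.save_pretrained("./custom_model")
tokenizer.save_pretrained("./custom_model")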
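3.2 Run the training script

python train.py

OSError: [WinError 126] 找不到指定的模块。 Error loading "C:\Users\bibenet4\AppData\Local\Programs\Python\Python313\Lib\site-packages\torch\lib\c10.dll" or one of its dependencies.
The message 找不到指定的模块 means "The specified module could not be found." The Microsoft Visual C++ runtime is missing. Download and install the Visual C++ Redistributable from https://aka.ms/vs/16/release/vc_redist.x64.exe.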

OSError: Incorrect path_or_model_id: 'deepseek-ai/deepseek-r1:1.5b'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
A Hugging Face Hub model name has the form organization/model (for example deepseek-ai/deepseek-r1). It cannot carry a suffix such as :1.5b, which is Ollama tag syntax.
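A small check can catch an Ollama-style name before it reaches `from_pretrained`. This is a sketch under assumptions: the regular expression is a simplified approximation of the organization/model form, not the Hub's actual validation rules, and both function names are hypothetical.

```python
import re

# Simplified pattern for "organization/model"; the Hub applies more rules than this.
REPO_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*")


def is_hub_repo_id(name: str) -> bool:
    """True when the name looks like organization/model with no ":tag" suffix."""
    return REPO_ID.fullmatch(name) is not None


def strip_ollama_tag(name: str) -> str:
    """Drop an Ollama-style ":tag" suffix such as ":1.5b"."""
    return name.split(":", 1)[0]


for candidate in ["deepseek-ai/deepseek-r1", "deepseek-ai/deepseek-r1:1.5b"]:
    print(candidate, "->", is_hub_repo_id(candidate))
```

Stripping the tag only makes the name syntactically valid. It then points at the full model rather than the 1.5b variant, whose weights are published under a separate repository name on the Hub.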

OSError: Can't load tokenizer for 'deepseek-ai/deepseek-r1'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'deepseek-ai/deepseek-r1' is the correct path to a directory containing all relevant files for a GPT2Tokenizer tokenizer.
https://blog.csdn.net/weixin_74009895/article/details/144658225
If huggingface.co is unreachable, point the client at a mirror. Set the environment variable with setx HF_ENDPOINT https://hf-mirror.com, then open a new cmd window and run the script again.
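The same setting can also be applied from inside the script, which avoids reopening the terminal. A minimal sketch, assuming the variable is set before transformers or huggingface_hub is imported, since the endpoint is read when those libraries load. `configure_hub_mirror` is a hypothetical helper name.

```python
import os


def configure_hub_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT for this process unless the environment already defines it."""
    return os.environ.setdefault("HF_ENDPOINT", endpoint)


# Call this at the very top of train.py, before "from transformers import ..."
active_endpoint = configure_hub_mirror()
print("Hub endpoint:", active_endpoint)
```

Using `setdefault` keeps a value already set with setx, so the script does not silently override the system configuration.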

ImportError:
requires the protobuf library but it was not found in your environment. Checkout the instructions on the
installation page of its repo: https://github.com/protocolbuffers/protobuf/tree/master/python#installation and follow the ones
that match your environment. Please note that you may need to restart your runtime after installation.
The protobuf package is missing. Install it:
pip install protobuf

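The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
This warning means the tokenizer type stored in the checkpoint does not match the tokenizer class used in the code. Specifically, the tokenizer of deepseek-ai/DeepSeek-R1 may not be compatible with GPT2Tokenizer; loading it through AutoTokenizer selects the matching class.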

ValueError: The repository for deepseek-ai/DeepSeek-R1 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/deepseek-ai/DeepSeek-R1.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
By default, the Hugging Face transformers library does not load or execute custom code from a remote repository, because doing so is a security risk (the code could be malicious). To load a model that ships custom code, you must allow it explicitly.
Add the trust_remote_code=True argument when loading the model and tokenizer:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

ImportError: Loading an FP8 quantized model requires accelerate (`pip install accelerate`)
As the message says, install the accelerate package with pip install accelerate.

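GPU details for this machine:

GPU 0: Intel(R) HD Graphics 630

   Driver version:   31.0.101.2111
   Driver date:   2022/7/19
   DirectX version:   12 (FL 12.1)
   Physical location:   PCI bus 0, device 2, function 0

   Utilization:   4%
   Dedicated GPU memory:   (no value shown)
   Shared GPU memory:   0.2/8.0 GB
   GPU memory:   0.2/8.0 GB
This GPU is Intel integrated graphics. It does not support CUDA, so PyTorch GPU acceleration is unavailable.

RuntimeError: No GPU found. A GPU is needed for FP8 quantization.
FP8 quantization is a high-performance quantization technique, typically used to speed up inference for deep learning models. It depends on GPU hardware support (for example NVIDIA Tensor Cores), so an FP8-quantized model must run in a GPU environment.

The options are to add an NVIDIA GPU or to switch to a non-quantized version of the model.
Names of non-quantized models may include markers such as fp16 or no-quant.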

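4. Deploy the customer service API
4.1 Create the Flask API
Create a Python script app.py:

from flask import Flask, request, jsonify
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

# Load the fine-tuned model saved by train.py
model = AutoModelForCausalLM.from_pretrained("./custom_model")
tokenizer = AutoTokenizer.from_pretrained("./custom_model")

@app.route("/chat", methods=["POST"])
def chat():
    data = request.get_json(silent=True) or {}
    user_input = data.get("message")
    if not user_input:
        return jsonify({"error": "missing 'message'"}), 400
    # Same "问题/答案" (question/answer) template as the training data
    input_ids = tokenizer.encode(f"问题：{user_input}\n答案：", return_tensors="pt")
    # max_new_tokens limits the answer length; max_length would also count the prompt
    output = model.generate(input_ids, max_new_tokens=100, num_return_sequences=1)
    # Decode only the newly generated tokens, not the echoed prompt
    response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
    return jsonify({"response": response})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)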
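4.2 Start the API service

python app.py
5. Test the customer service bot
5.1 Test with curl
Run in a terminal (in Windows cmd, wrap the JSON in double quotes and escape the inner quotes as \"):

curl -X POST http://localhost:5000/chat -H "Content-Type: application/json" -d '{"message":"如何重置密码？"}'
5.2 Test from Python
Create a test script test.py: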

import requests

response = requests.post("http://localhost:5000/chat", json={"message": "产品保修期多久？"})
print(response.json())
6. Integrate into a project (example: web front end)
6.1 Create the HTML page
Create a file index.html:

<!DOCTYPE html>
<html>
<head>
<title>Smart Customer Service</title>
</head>
<body>
<div id="chatbox"></div>
<input type="text" id="message" placeholder="Type your question...">
<button onclick="sendMessage()">Send</button>