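RAGFlow: A Very Powerful Open-Source RAG Engine
This article introduces the core features and deployment options of RAGFlow, an open-source RAG engine. RAGFlow offers strong document understanding, supports many file types, extracts information accurately, and gives answers backed by citations. The article covers the deployment process, creating an AI assistant, API integration, multimodal support, and practical examples, making it a hands-on guide to building a RAG system.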
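I recommend reading the original article, which always carries the latest version of this document, for the best reading experience: "RAGFlow: A Very Powerful Open-Source RAG Engine"
What is RAGFlow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It gives businesses of any size a streamlined RAG workflow and, combined with large language models (LLMs), provides truthful question answering backed by reliable citations from data in many complex formats.
🌟 Key features:
🍭 "Quality in, quality out": Extracts high-quality knowledge from unstructured data in complex formats, based on deep document understanding. Pinpoints the needed information in a practically unlimited amount of text ("finding a needle in a data haystack").
🍱 Template-based chunking: Intelligent and explainable. Offers a rich set of templates to choose from.
🌱 Grounded citations, fewer hallucinations: Visualizes text chunking so that humans can intervene. Lets you quickly view key references and provides traceable citations so that answers stay accurate and reliable.
🍔 Compatible with heterogeneous data sources: Supports Word, PPT, Excel, TXT, images, scanned copies, structured data, web pages, and more.
🛀 Automated, effortless RAG workflow: A streamlined RAG orchestration tailored to both individual users and large enterprises. Configurable LLMs and embedding models. Multiple recall paired with fused re-ranking. Intuitive APIs for seamless integration with business systems.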
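To make "multiple recall paired with fused re-ranking" concrete, here is a minimal sketch of one common approach: a weighted sum of a keyword score and a vector-similarity score per chunk. This is an illustration only. The article does not describe RAGFlow's actual fusion logic, and the function name, the score dictionaries, and the default weight are all hypothetical.

```python
from typing import Dict, List, Tuple


def fuse_scores(
    keyword_scores: Dict[str, float],
    vector_scores: Dict[str, float],
    vector_weight: float = 0.7,  # hypothetical default, not a RAGFlow setting
) -> List[Tuple[str, float]]:
    """Merge two recall channels into one ranking using a weighted sum.

    A chunk that is missing from one channel contributes 0.0 for that channel.
    """
    chunk_ids = set(keyword_scores) | set(vector_scores)
    fused = {
        cid: (1.0 - vector_weight) * keyword_scores.get(cid, 0.0)
        + vector_weight * vector_scores.get(cid, 0.0)
        for cid in chunk_ids
    }
    # Highest fused score first; ties are broken by chunk id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
```

A chunk that scores moderately in both channels can outrank one that scores highly in only one of them, which is the point of fusing before re-ranking.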
Deployment (docker compose)
Docker deployment
At the time of writing, the latest release was 0.19.0. It is best to always install the latest RAGFlow release.
git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
git checkout -f v0.19.0
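Edit the .env file (located in the ragflow/docker directory)
The main change is the image setting (the RAGFLOW_IMAGE line below), because the default pulls the image from Docker Hub, which often fails.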
# The type of doc engine to use.
# Available options:
# - `elasticsearch` (default)
# - `infinity` (https://github.com/infiniflow/infinity)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}
# ------------------------------
# docker env var for specifying vector db type at startup
# (based on the vector db type, the corresponding docker
# compose profile will be used)
# ------------------------------
COMPOSE_PROFILES=${DOC_ENGINE}
# The version of Elasticsearch.
STACK_VERSION=8.11.3
# The hostname where the Elasticsearch service is exposed
ES_HOST=es01
# The port used to expose the Elasticsearch service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
ES_PORT=1200
# The password for Elasticsearch.
ELASTIC_PASSWORD=infini_rag_flow
# The port used to expose the Kibana service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
KIBANA_PORT=6601
KIBANA_USER=rag_flow
KIBANA_PASSWORD=infini_rag_flow
# The maximum amount of the memory, in bytes, that a specific Docker container can use while running.
# Update it according to the available memory in the host machine.
MEM_LIMIT=8073741824
# The hostname where the Infinity service is exposed
INFINITY_HOST=infinity
# Port to expose Infinity API to the host
INFINITY_THRIFT_PORT=23817
INFINITY_HTTP_PORT=23820
INFINITY_PSQL_PORT=5432
# The password for MySQL.
MYSQL_PASSWORD=infini_rag_flow
# The hostname where the MySQL service is exposed
MYSQL_HOST=mysql
# The database of the MySQL service to use
MYSQL_DBNAME=rag_flow
# The port used to expose the MySQL service to the host machine,
# allowing EXTERNAL access to the MySQL database running inside the Docker container.
MYSQL_PORT=5455
# The hostname where the MinIO service is exposed
MINIO_HOST=minio
# The port used to expose the MinIO console interface to the host machine,
# allowing EXTERNAL access to the web-based console running inside the Docker container.
MINIO_CONSOLE_PORT=9001
# The port used to expose the MinIO API service to the host machine,
# allowing EXTERNAL access to the MinIO object storage service running inside the Docker container.
MINIO_PORT=9000
# The username for MinIO.
# When updated, you must revise the `minio.user` entry in service_conf.yaml accordingly.
MINIO_USER=rag_flow
# The password for MinIO.
# When updated, you must revise the `minio.password` entry in service_conf.yaml accordingly.
MINIO_PASSWORD=infini_rag_flow
# The hostname where the Redis service is exposed
REDIS_HOST=redis
# The port used to expose the Redis service to the host machine,
# allowing EXTERNAL access to the Redis service running inside the Docker container.
REDIS_PORT=6379
# The password for Redis.
REDIS_PASSWORD=infini_rag_flow
# The port used to expose RAGFlow's HTTP API service to the host machine,
# allowing EXTERNAL access to the service running inside the Docker container.
SVR_HTTP_PORT=9380
# The RAGFlow Docker image to download.
# Defaults to the v0.16.0-slim edition, which is the RAGFlow Docker image without embedding models.
#RAGFLOW_IMAGE=infiniflow/ragflow:v0.16.0-slim
#
# To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/ragflow:v0.19.0
#
# The Docker image of the v0.16.0 edition includes:
# - Built-in embedding models:
# - BAAI/bge-large-zh-v1.5
# - BAAI/bge-reranker-v2-m3
# - maidalun1020/bce-embedding-base_v1
# - maidalun1020/bce-reranker-base_v1
# - Embedding models that will be downloaded once you select them in the RAGFlow UI:
# - BAAI/bge-base-en-v1.5
# - BAAI/bge-large-en-v1.5
# - BAAI/bge-small-en-v1.5
# - BAAI/bge-small-zh-v1.5
# - jinaai/jina-embeddings-v2-base-en
# - jinaai/jina-embeddings-v2-small-en
# - nomic-ai/nomic-embed-text-v1.5
# - sentence-transformers/all-MiniLM-L6-v2
#
#
# If you cannot download the RAGFlow Docker image:
#
# - For the `nightly-slim` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly-slim
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly-slim
#
# - For the `nightly` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly
# The local time zone.
TIMEZONE='Asia/Shanghai'
# Uncomment the following line if you have limited access to huggingface.co:
HF_ENDPOINT=https://hf-mirror.com
# Optimizations for MacOS
# Uncomment the following line if your OS is MacOS:
# MACOS=1
# The maximum file size for each uploaded file, in bytes.
# You can uncomment this line and update the value if you wish to change the 128M file size limit
# MAX_CONTENT_LENGTH=134217728
# After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.
# The log level for the RAGFlow's owned packages and imported packages.
# Available level:
# - `DEBUG`
# - `INFO` (default)
# - `WARNING`
# - `ERROR`
# For example, following line changes the log level of `ragflow.es_conn` to `DEBUG`:
LOG_LEVELS=ragflow.es_conn=DEBUG
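Before starting the stack, it can help to double-check which image the .env file actually selects and how much memory MEM_LIMIT grants. The following is a minimal sketch, assuming a plain KEY=VALUE file like the one above; the helper names are my own and are not part of RAGFlow.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values such as
        # LOG_LEVELS=ragflow.es_conn=DEBUG stay intact.
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def mem_limit_gib(env: Dict[str, str]) -> float:
    """MEM_LIMIT is given in bytes; convert it to GiB (1 GiB = 1024**3 bytes)."""
    return int(env["MEM_LIMIT"]) / 1024**3


def report(env_path: str = ".env") -> None:
    """Print the active image and the memory limit from the given .env file."""
    with open(env_path, encoding="utf-8") as fh:
        env = parse_env(fh.read())
    print("image:", env.get("RAGFLOW_IMAGE", "<unset>"))
    print(f"mem limit: {mem_limit_gib(env):.2f} GiB")
```

For the file above, MEM_LIMIT=8073741824 works out to roughly 7.52 GiB, and the commented-out MAX_CONTENT_LENGTH value 134217728 is exactly 128 MiB (128 × 1024 × 1024).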
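Edit the docker-compose-base.yml file
Again, the main edit is the image fields. With the settings below, the mirrored images should pull reliably from within mainland China.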
services:
  es01:
    container_name: ragflow-es-01
    profiles:
      - elasticsearch
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/library/elasticsearch:${STACK_VERSION}
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    env_file: .env
    environment:
      - node.name=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9200"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure
  infinity:
    container_name: ragflow-infinity
    profiles:
      - infinity
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/infinity:v0.6.0-dev3
    volumes:
      - infinity_data:/var/infinity
      - ./infinity_conf.toml:/infinity_conf.toml
    command: ["-f", "/infinity_conf.toml"]
    ports:
      - ${INFINITY_THRIFT_PORT}:23817
      - ${INFINITY_HTTP_PORT}:23820
      - ${INFINITY_PSQL_PORT}:5432
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      nofile:
        soft: 500000
        hard: 500000
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure
  mysql:
    # mysql:5.7 linux/arm64 image is unavailable.
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/mysql:8.0.39
    container_name: ragflow-mysql
    env_file: .env
    environment:
      - MYSQL_ROOT_PASSWORD=${MYSQL_PASSWORD}
      - TZ=${TIMEZONE}
    command:
      --max_connections=1000
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_unicode_ci
      --default-authentication-plugin=mysql_native_password
      --tls_version="TLSv1.2,TLSv1.3"
      --init-file /data/application/init.sql
    ports:
      - ${MYSQL_PORT}:3306
    volumes:
      - mysql_data:/var/lib/mysql
      - ./init.sql:/data/application/init.sql
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-uroot", "-p${MYSQL_PASSWORD}"]
      interval: 10s
      timeout: 10s
      retries: 3
    restart: on-failure
  minio:
    image: quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z
    container_name: ragflow-minio
    command: server --console-address ":9001" /data
    ports:
      - ${MINIO_PORT}:9000
      - ${MINIO_CONSOLE_PORT}:9001
    env_file: .env
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
      - TZ=${TIMEZONE}
    volumes:
      - minio_data:/data
    networks:
      - ragflow
    restart: on-failure
  redis:
    # swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/valkey/valkey:8
    image: valkey/valkey:8
    container_name: ragflow-redis
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
    env_file: .env
    ports:
      - ${REDIS_PORT}:6379
    volumes:
      - redis_data:/data
    networks:
      - ragflow
    restart: on-failure
volumes:
  esdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
    driver: local
  minio_data:
    driver: local
  redis_data:
    driver: local
networks:
  ragflow:
    driver: bridge
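Since the whole point of these edits is to pull images from a reachable registry, a quick scan of the compose file shows which services still point elsewhere. This is a small sketch using a regular expression rather than a YAML parser; the function names are my own, and the mirror prefix is simply the registry host used in the file above.

```python
import re
from typing import List

# Matches active "image: <ref>" lines; commented-out lines start with '#'
# and are therefore not matched.
IMAGE_LINE = re.compile(r"^\s*image:\s*(\S+)", re.MULTILINE)


def list_images(compose_text: str) -> List[str]:
    """Return every image reference declared in the compose file text."""
    return IMAGE_LINE.findall(compose_text)


def not_from_mirror(images: List[str], mirror_prefix: str) -> List[str]:
    """Return the images that are not pulled from the given mirror registry."""
    return [img for img in images if not img.startswith(mirror_prefix)]


def check(path: str = "docker-compose-base.yml") -> List[str]:
    """Read the compose file and list images that bypass the mirror."""
    with open(path, encoding="utf-8") as fh:
        images = list_images(fh.read())
    return not_from_mirror(images, "swr.cn-north-4.myhuaweicloud.com/")
```

Run against the file as listed above, this should report the minio image (quay.io) and valkey/valkey:8 (Docker Hub), so those two pulls may still depend on your network being able to reach those registries.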
Run
docker compose -f docker-compose.yml up -d
Check the RAGFlow server status
docker logs -f ragflow-server
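If you would rather script this check than watch the log by hand, the sketch below looks for a readiness marker in the container's recent log output. The marker string is an assumption on my part (the "Running on all addresses" line that the web server prints on startup); adjust it to whatever your RAGFlow version actually logs. The function names are hypothetical.

```python
import subprocess

# Assumption: the server prints this line once it is listening.
READY_MARKER = "Running on all addresses"


def looks_ready(log_text: str, marker: str = READY_MARKER) -> bool:
    """Return True if the readiness marker appears in the log text."""
    return marker in log_text


def fetch_logs(container: str = "ragflow-server", tail: int = 200) -> str:
    """Return the last `tail` log lines of a container via `docker logs`."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), container],
        capture_output=True,
        text=True,
        check=False,
    )
    # Container output may arrive on either stream, so combine both.
    return result.stdout + result.stderr
```

Typical use is `looks_ready(fetch_logs())`, repeated every few seconds until it returns True.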
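Log in
There is no default account, so register one first; this is straightforward. After registering and logging in, you land on the main interface.
Deploying on Kubernetes
The repository you clone with git already includes a Helm chart. You mainly need to edit values.yaml, after which you can deploy RAGFlow with Helm.
To place the cloned code in a custom directory, use a command like the following: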
git clone https://github.com/infiniflow/ragflow.git ragflow-v0.19.1
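Below is a sample values.yaml. The main changes are the container registry, so that the image can be pulled easily from within China, and enabling ingress, which you should decide on based on your own environment.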
# Based on docker compose .env file
env:
  # The type of doc engine to use.
  # Available options:
  # - `elasticsearch` (default)
  # - `infinity` (https://github.com/infiniflow/infinity)
  # DOC_ENGINE: elasticsearch
  DOC_ENGINE: infinity
  # The version of Elasticsearch.
  STACK_VERSION: "8.11.3"
  # The password for Elasticsearch
  ELASTIC_PASSWORD: infini_rag_flow_helm
  # The password for MySQL
  MYSQL_PASSWORD: infini_rag_flow_helm
  # The database of the MySQL service to use
  MYSQL_DBNAME: rag_flow
  # The username for MinIO.
  MINIO_ROOT_USER: rag_flow
  # The password for MinIO
  MINIO_PASSWORD: infini_rag_flow_helm
  # The password for Redis
  REDIS_PASSWORD: infini_rag_flow_helm
  # The RAGFlow Docker image to download.
  # Defaults to the v0.19.0-slim edition, which is the RAGFlow Docker image without embedding models.
  #RAGFLOW_IMAGE: infiniflow/ragflow:v0.19.0-slim  # changed here
  #
  # To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
  RAGFLOW_IMAGE: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/ragflow:v0.19.0  # changed here
  #
  # The Docker image of the v0.19.0 edition includes:
  # - Built-in embedding models:
  #   - BAAI/bge-large-zh-v1.5
  #   - BAAI/bge-reranker-v2-m3
  #   - maidalun1020/bce-embedding-base_v1
  #   - maidalun1020/bce-reranker-base_v1
  # - Embedding models that will be downloaded once you select them in the RAGFlow UI:
  #   - BAAI/bge-base-en-v1.5
  #   - BAAI/bge-large-en-v1.5
  #   - BAAI/bge-small-en-v1.5
  #   - BAAI/bge-small-zh-v1.5
  #   - jinaai/jina-embeddings-v2-base-en
  #   - jinaai/jina-embeddings-v2-small-en
  #   - nomic-ai/nomic-embed-text-v1.5
  #   - sentence-transformers/all-MiniLM-L6-v2
  #
  #
  # The local time zone.
  TIMEZONE: "Asia/Shanghai"
  # Uncomment the following line if you have limited access to huggingface.co:
  HF_ENDPOINT: https://hf-mirror.com  # changed here
  # The maximum file size for each uploaded file, in bytes.
  # You can uncomment this line and update the value if you wish to change 128M file size limit
  # MAX_CONTENT_LENGTH: "134217728"
  # After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.
ragflow:
  deployment:
    strategy:
    resources:
  service:
    # Use LoadBalancer to expose the web interface externally
    type: ClusterIP
  api:
    service:
      enabled: true
      type: ClusterIP
infinity:
  image:
    repository: infiniflow/infinity
    tag: v0.6.0-dev3
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
elasticsearch:
  storage:
    className:
    capacity: 20Gi
  deployment:
    strategy:
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
  service:
    type: ClusterIP
minio:
  image:
    repository: quay.io/minio/minio
    tag: RELEASE.2023-12-20T01-00-02Z
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
mysql:
  image:
    repository: mysql
    tag: 8.0.39
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
redis:
  image:
    repository: valkey/valkey
    tag: 8
  storage:
    className:
    capacity: 5Gi
  persistence:
    enabled: true
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
# This block is for setting up web service ingress. For more information, see:
# https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true  # changed here
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: ragflow.dltornado2.com  # changed here
      paths:
        - path: /
          pathType: ImplementationSpecific
  #tls: []  # changed here
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local
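One more file in the templates directory needs to be changed, as shown below: set xpack.security.enabled to false. This greatly weakens security and must never be done in production. I found that without this change, the RAGFlow logs keep reporting that it cannot connect to Elasticsearch, and clicking Continue on the registration page of the web UI does nothing. Alternatively, you can change the Elasticsearch configuration after deployment by editing the ConfigMap. Note that the template only takes effect when DOC_ENGINE is set to elasticsearch.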
{{- if eq .Values.env.DOC_ENGINE "elasticsearch" -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "ragflow.fullname" . }}-es-config
data:
  node.name: "es01"
  bootstrap.memory_lock: "false"
  discovery.type: "single-node"
  xpack.security.enabled: "false"
  xpack.security.http.ssl.enabled: "false"
  xpack.security.transport.ssl.enabled: "false"
{{- end -}}
cd ragflow/helm/
helm install -f values.yaml ragflow . --create-namespace -n ragflow --timeout 5h
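In fact, the following values.yaml also works. It uses the default Infinity doc engine. However, I found that with this file you have to wait at least 30 minutes before you can register an account; until then, clicking Continue does nothing. (In practice, even with Elasticsearch as the doc engine, it takes quite a while before RAGFlow becomes usable.)
I also found that answer quality was poor with Infinity as the doc engine.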
# Based on docker compose .env file
env:
  # The type of doc engine to use.
  # Available options:
  # - `elasticsearch` (default)
  # - `infinity` (https://github.com/infiniflow/infinity)
  # DOC_ENGINE: elasticsearch
  DOC_ENGINE: infinity
  # The version of Elasticsearch.
  STACK_VERSION: "8.11.3"
  # The password for Elasticsearch
  ELASTIC_PASSWORD: infini_rag_flow_helm
  # The password for MySQL
  MYSQL_PASSWORD: infini_rag_flow_helm
  # The database of the MySQL service to use
  MYSQL_DBNAME: rag_flow
  # The username for MinIO.
  MINIO_ROOT_USER: rag_flow
  # The password for MinIO
  MINIO_PASSWORD: infini_rag_flow_helm
  # The password for Redis
  REDIS_PASSWORD: infini_rag_flow_helm
  # The RAGFlow Docker image to download.
  # Defaults to the v0.17.2-slim edition, which is the RAGFlow Docker image without embedding models.
  # RAGFLOW_IMAGE: infiniflow/ragflow:v0.17.2-slim
  #
  # To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
  RAGFLOW_IMAGE: infiniflow/ragflow:v0.17.2
  #
  # The Docker image of the v0.17.2 edition includes:
  # - Built-in embedding models:
  #   - BAAI/bge-large-zh-v1.5
  #   - BAAI/bge-reranker-v2-m3
  #   - maidalun1020/bce-embedding-base_v1
  #   - maidalun1020/bce-reranker-base_v1
  # - Embedding models that will be downloaded once you select them in the RAGFlow UI:
  #   - BAAI/bge-base-en-v1.5
  #   - BAAI/bge-large-en-v1.5
  #   - BAAI/bge-small-en-v1.5
  #   - BAAI/bge-small-zh-v1.5
  #   - jinaai/jina-embeddings-v2-base-en
  #   - jinaai/jina-embeddings-v2-small-en
  #   - nomic-ai/nomic-embed-text-v1.5
  #   - sentence-transformers/all-MiniLM-L6-v2
  #
  #
  # The local time zone.
  TIMEZONE: "Asia/Shanghai"
  # Uncomment the following line if you have limited access to huggingface.co:
  HF_ENDPOINT: https://hf-mirror.com
  # The maximum file size for each uploaded file, in bytes.
  # You can uncomment this line and update the value if you wish to change 128M file size limit
  # MAX_CONTENT_LENGTH: "134217728"
  # After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.
ragflow:
  deployment:
    strategy:
    resources:
  service:
    # Use LoadBalancer to expose the web interface externally
    type: NodePort
  api:
    service:
      enabled: true
      type: ClusterIP
infinity:
  image:
    repository: infiniflow/infinity
    tag: v0.6.0-dev3
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
elasticsearch:
  storage:
    className:
    capacity: 20Gi
  deployment:
    strategy:
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
  service:
    type: ClusterIP
minio:
  image:
    repository: quay.io/minio/minio
    tag: RELEASE.2023-12-20T01-00-02Z
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
mysql:
  image:
    repository: mysql
    tag: 8.0.39
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
redis:
  image:
    repository: valkey/valkey
    tag: 8
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP
# This block is for setting up web service ingress. For more information, see:
# https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: myai.dltornado2.com
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local
RAGFlow is ready for use only once the log message marked with a red arrow in the screenshot appears.
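While waiting, it is useful to know which pods are still starting. The sketch below parses the JSON printed by `kubectl get pods -n ragflow -o json` (the namespace comes from the helm install command above) and lists pods whose Ready condition is not yet True. The function name is my own. A Ready pod does not guarantee that RAGFlow has finished initializing, so the server log remains the final signal.

```python
import json
from typing import List


def pods_not_ready(pods_json: str) -> List[str]:
    """Return names of pods whose 'Ready' condition is not 'True'.

    Expects the JSON document printed by `kubectl get pods -o json`.
    """
    pending: List[str] = []
    for pod in json.loads(pods_json).get("items", []):
        conditions = pod.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            pending.append(pod["metadata"]["name"])
    return pending
```

An empty list means every pod reports Ready; any names returned are the ones worth inspecting with `kubectl describe pod` or `kubectl logs`.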
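Upgrade
Kubernetes
First clone the code. The trailing ragflow-v0.19.1 is the local directory the code is cloned into; naming it after the version is convenient.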
git clone https://github.com/infiniflow/ragflow.git ragflow-v0.19.1
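Edit values.yaml in the helm directory, then start the upgrade.
Note: the RAGFlow image is very large, close to 20 GB, so it is best to extend the timeout. The default timeout is 5m.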
helm upgrade ragflow . -f values.yaml --atomic -n ragflow --timeout=600m
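A back-of-the-envelope estimate shows why the default 5m is too short for an image of this size. The sketch below only divides image size by bandwidth; the bandwidth figures are assumptions, and the estimate ignores layer caching, compression, and unpacking time.

```python
def pull_minutes(image_gb: float, bandwidth_mbit_s: float) -> float:
    """Estimate minutes needed to download an image.

    image_gb is in gigabytes (10**9 bytes); bandwidth is in megabits per second.
    1 GB = 8,000 megabits.
    """
    seconds = image_gb * 8_000 / bandwidth_mbit_s
    return seconds / 60
```

For a 20 GB image, `pull_minutes(20, 100)` is about 26.7 minutes on a 100 Mbit/s link, and `pull_minutes(20, 1000)` is still about 2.7 minutes on a 1 Gbit/s link before unpacking, so a generous timeout is a reasonable precaution.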
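Create a RAG application (AI assistant)
Add a model provider
Create a knowledge base
Document parsing is very CPU-intensive. The logs also indicate that the computation runs on the CPU; with a GPU, parsing would be much faster.
Verify
Create an assistant first, then ask it a question directly. The assistant finds the relevant information in the knowledge base and cites its sources.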
Create a RAGFlow API key
To integrate RAGFlow with other applications, you need to create a RAGFlow API key.
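As a rough sketch of what integration looks like, the helpers below build two HTTP requests authenticated with the API key as a Bearer token: one that lists datasets (a simple connectivity check) and one that asks a chat assistant a question. The endpoint paths follow my understanding of the RAGFlow HTTP API and should be verified against the API reference for your version; the base URL (port 9380 is SVR_HTTP_PORT from the .env file), the chat ID, and the function names are placeholders.

```python
import json
import urllib.request


def list_datasets_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that lists datasets (knowledge bases)."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/datasets",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def ask_request(
    base_url: str, api_key: str, chat_id: str, question: str
) -> urllib.request.Request:
    """Build a POST request that sends one question to a chat assistant."""
    body = json.dumps({"question": question, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/chats/{chat_id}/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(list_datasets_request("http://localhost:9380", "<your API key>"))` should return a JSON object if the key and address are valid. Keeping request construction separate from sending makes the calls easy to inspect before anything goes over the network.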
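Let the AI assistant search the web (optional)
Register with Tavily
Go to the Tavily website and register an account; you can sign in directly with a GitHub or Google account. After signing in you will see the dashboard. Tavily provides 1,000 free API calls per month. Copy the API key.
On the AI assistant's "Assistant settings" page, paste the Tavily API key you just copied and save. From then on, when you chat with the assistant, it automatically searches the web for knowledge it lacks (including knowledge missing from the knowledge base), which reduces hallucinations. Here is an example:
Example
Uploading attachments when chatting with the AI assistant
A note on multimodal models
In my testing, uploading attachments does not require a multimodal LLM; an ordinary LLM such as qwen-plus or deepseek works.
Example 1
RAGFlow supports sending files to the AI assistant, including many file types such as images, videos, and documents. Here is an example:
The image I uploaded:
The conversation screenshot:
Click the light-bulb icon above the reply to view the details. The AI assistant has recognized the text in the uploaded image and passed it to the LLM as context.
Example 2
This example shows even better how useful RAGFlow is. The user has a business report, uploads it directly, and then asks the AI assistant anything about it. Here the question was about results for January and February, and the assistant gave detailed figures based on the report. The figures were fully accurate because they were retrieved directly from the report rather than made up.
The prompt shows that RAGFlow correctly recognized the content of the PPT file and answered based on the extracted content.
Checking the knowledge base shows that the uploaded file appears there automatically, but it is not parsed automatically. In other words, other users and sessions will not draw on this data until the file has been parsed.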
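PDF tables split across pages
Reference: Configure knowledge base | RAGFlow
Adjusting the chunking method solves this. Choose the "manual" chunking method: figures and tables within the same section are not split apart, though chunks may become very large.
After re-parsing the document, the same table spanning consecutive pages is indeed kept as a single chunk.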
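About the author and DreamAI
Follow the WeChat official account "AI发烧友" for more practical IT development and operations tools and tips, plus plenty of AI technical documentation!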