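RAGFlow: A Very Powerful Open-Source RAG Engine

Reference: Quick start | RAGFlow

Read the original

Reading the original is recommended, so that you always see the latest version of this document and get the best reading experience: "RAGFlow: A Very Powerful Open-Source RAG Engine"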

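What is RAGFlow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. It offers a streamlined RAG workflow for businesses of any scale and, combined with large language models (LLMs), provides truthful question answering backed by well-founded citations from data in a variety of complex formats.

🌟 Key features:

🍭 "Quality in, quality out"   Extracts high-quality knowledge from unstructured data in complicated formats, based on deep document understanding.   Pinpoints the required information in a virtually unlimited amount of text (finding the "needle in a data haystack").

🍱 Template-based chunking   Intelligent and explainable.   Plenty of template options to choose from.

🌱 Grounded citations with fewer hallucinations   Visualizes text chunking so that humans can intervene.   Offers a quick view of the key references and traceable citations to support accurate, grounded answers.

🍔 Compatibility with heterogeneous data sources   Supports Word, PPT, Excel, TXT, images, scanned copies, structured data, web pages, and more.

🛀 Automated, effortless RAG workflow   Streamlined RAG orchestration for both individual users and large enterprises.   Configurable LLMs and embedding models.   Multiple recall paired with fused re-ranking.   Intuitive APIs for seamless integration into business systems.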

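Deployment (Docker Compose)

Deploying with Docker

At the time of writing, the latest version is 0.19.0. It is recommended to always install the latest RAGFlow release.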

git clone https://github.com/infiniflow/ragflow.git
cd ragflow/docker
git checkout -f v0.19.0

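Edit the .env file (located in the ragflow/docker directory)

The main change is the image, i.e. the RAGFLOW_IMAGE setting (line 87 of the file shown below), because by default the image is pulled from Docker Hub, which tends to fail.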

# The type of doc engine to use.
# Available options:
# - `elasticsearch` (default) 
# - `infinity` (https://github.com/infiniflow/infinity)
DOC_ENGINE=${DOC_ENGINE:-elasticsearch}

# ------------------------------
# docker env var for specifying vector db type at startup
# (based on the vector db type, the corresponding docker
# compose profile will be used)
# ------------------------------
COMPOSE_PROFILES=${DOC_ENGINE}

# The version of Elasticsearch.
STACK_VERSION=8.11.3

# The hostname where the Elasticsearch service is exposed
ES_HOST=es01

# The port used to expose the Elasticsearch service to the host machine, 
# allowing EXTERNAL access to the service running inside the Docker container.
ES_PORT=1200

# The password for Elasticsearch. 
ELASTIC_PASSWORD=infini_rag_flow

# The port used to expose the Kibana service to the host machine, 
# allowing EXTERNAL access to the service running inside the Docker container.
KIBANA_PORT=6601
KIBANA_USER=rag_flow
KIBANA_PASSWORD=infini_rag_flow

# The maximum amount of the memory, in bytes, that a specific Docker container can use while running.
# Update it according to the available memory in the host machine.
MEM_LIMIT=8073741824

# The hostname where the Infinity service is exposed
INFINITY_HOST=infinity

# Port to expose Infinity API to the host
INFINITY_THRIFT_PORT=23817
INFINITY_HTTP_PORT=23820
INFINITY_PSQL_PORT=5432

# The password for MySQL. 
MYSQL_PASSWORD=infini_rag_flow
# The hostname where the MySQL service is exposed
MYSQL_HOST=mysql
# The database of the MySQL service to use
MYSQL_DBNAME=rag_flow
# The port used to expose the MySQL service to the host machine, 
# allowing EXTERNAL access to the MySQL database running inside the Docker container. 
MYSQL_PORT=5455

# The hostname where the MinIO service is exposed
MINIO_HOST=minio
# The port used to expose the MinIO console interface to the host machine, 
# allowing EXTERNAL access to the web-based console running inside the Docker container. 
MINIO_CONSOLE_PORT=9001
# The port used to expose the MinIO API service to the host machine, 
# allowing EXTERNAL access to the MinIO object storage service running inside the Docker container. 
MINIO_PORT=9000
# The username for MinIO. 
# When updated, you must revise the `minio.user` entry in service_conf.yaml accordingly.
MINIO_USER=rag_flow
# The password for MinIO. 
# When updated, you must revise the `minio.password` entry in service_conf.yaml accordingly.
MINIO_PASSWORD=infini_rag_flow

# The hostname where the Redis service is exposed
REDIS_HOST=redis
# The port used to expose the Redis service to the host machine, 
# allowing EXTERNAL access to the Redis service running inside the Docker container.
REDIS_PORT=6379
# The password for Redis.
REDIS_PASSWORD=infini_rag_flow

# The port used to expose RAGFlow's HTTP API service to the host machine, 
# allowing EXTERNAL access to the service running inside the Docker container.
SVR_HTTP_PORT=9380

# The RAGFlow Docker image to download.
# Defaults to the v0.16.0-slim edition, which is the RAGFlow Docker image without embedding models.
#RAGFLOW_IMAGE=infiniflow/ragflow:v0.16.0-slim
#
# To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/ragflow:v0.19.0
# 
# The Docker image of the v0.16.0 edition includes:
# - Built-in embedding models:
#   - BAAI/bge-large-zh-v1.5
#   - BAAI/bge-reranker-v2-m3
#   - maidalun1020/bce-embedding-base_v1
#   - maidalun1020/bce-reranker-base_v1
# - Embedding models that will be downloaded once you select them in the RAGFlow UI:
#   - BAAI/bge-base-en-v1.5
#   - BAAI/bge-large-en-v1.5
#   - BAAI/bge-small-en-v1.5
#   - BAAI/bge-small-zh-v1.5
#   - jinaai/jina-embeddings-v2-base-en
#   - jinaai/jina-embeddings-v2-small-en
#   - nomic-ai/nomic-embed-text-v1.5
#   - sentence-transformers/all-MiniLM-L6-v2
#
# 


# If you cannot download the RAGFlow Docker image:
#
# - For the `nightly-slim` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly-slim
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly-slim
#
# - For the `nightly` edition, uncomment either of the following:
# RAGFLOW_IMAGE=swr.cn-north-4.myhuaweicloud.com/infiniflow/ragflow:nightly
# RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:nightly

# The local time zone.
TIMEZONE='Asia/Shanghai'

# Uncomment the following line if you have limited access to huggingface.co:
HF_ENDPOINT=https://hf-mirror.com

# Optimizations for MacOS
# Uncomment the following line if your OS is MacOS:
# MACOS=1

# The maximum file size for each uploaded file, in bytes.
# You can uncomment this line and update the value if you wish to change the 128M file size limit
# MAX_CONTENT_LENGTH=134217728
# After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.

# The log level for the RAGFlow's owned packages and imported packages.
# Available level:
# - `DEBUG`
# - `INFO` (default)
# - `WARNING`
# - `ERROR`
# For example, following line changes the log level of `ragflow.es_conn` to `DEBUG`:
LOG_LEVELS=ragflow.es_conn=DEBUG
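
The .env edit above can also be scripted. The following is a minimal sketch, not part of RAGFlow: the helper names `parse_env` and `set_env_value` are hypothetical, and the sample text is a shortened stand-in for the real file. It replaces the active RAGFLOW_IMAGE line, leaves commented-out lines alone, and prints MEM_LIMIT in GiB so it is easier to compare with the host's memory. Note that it does not expand shell-style values such as ${DOC_ENGINE:-elasticsearch}.

```python
import re


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, _, value = stripped.partition("=")
        values[key.strip()] = value.strip()
    return values


def set_env_value(text: str, key: str, value: str) -> str:
    """Replace the first active KEY=... line, or append one if none exists."""
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda _match: new_line, text, count=1)
    return text.rstrip("\n") + "\n" + new_line + "\n"


# Shortened stand-in for ragflow/docker/.env (assumption: not the full file).
SAMPLE = """\
# The RAGFlow Docker image to download.
#RAGFLOW_IMAGE=infiniflow/ragflow:v0.16.0-slim
RAGFLOW_IMAGE=infiniflow/ragflow:v0.19.0
MEM_LIMIT=8073741824
"""

MIRROR = "swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/"
updated = set_env_value(SAMPLE, "RAGFLOW_IMAGE", MIRROR + "infiniflow/ragflow:v0.19.0")
env = parse_env(updated)
mem_gib = int(env["MEM_LIMIT"]) / 2**30

print(env["RAGFLOW_IMAGE"])
print(f"MEM_LIMIT is about {mem_gib:.2f} GiB")
```

To apply it to a real file, read ragflow/docker/.env into a string, pass it through `set_env_value`, and write the result back.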

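Edit the docker-compose-base.yml file

Again, the main edit is the image fields. With the settings below, the container images should pull normally from within mainland China.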

services:
  es01:
    container_name: ragflow-es-01
    profiles:
      - elasticsearch
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/library/elasticsearch:${STACK_VERSION}
    volumes:
      - esdata01:/usr/share/elasticsearch/data
    ports:
      - ${ES_PORT}:9200
    env_file: .env
    environment:
      - node.name=es01
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD}
      - bootstrap.memory_lock=false
      - discovery.type=single-node
      - xpack.security.enabled=true
      - xpack.security.http.ssl.enabled=false
      - xpack.security.transport.ssl.enabled=false
      - cluster.routing.allocation.disk.watermark.low=5gb
      - cluster.routing.allocation.disk.watermark.high=3gb
      - cluster.routing.allocation.disk.watermark.flood_stage=2gb
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      memlock:
        soft: -1
        hard: -1
    healthcheck:
      test: ["CMD-SHELL", "curl http://localhost:9200"]
      interval: 10s
      timeout: 10s
      retries: 120
    networks:
      - ragflow
    restart: on-failure

  infinity:
    container_name: ragflow-infinity
    profiles:
      - infinity
    image:  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/infinity:v0.6.0-dev3
    volumes:
      - infinity_data:/var/infinity
      - ./infinity_conf.toml:/infinity_conf.toml
    command: ["-f", "/infinity_conf.toml"]
    ports:
      - ${INFINITY_THRIFT_PORT}:23817
      - ${INFINITY_HTTP_PORT}:23820
      - ${INFINITY_PSQL_PORT}:5432
    env_file: .env
    environment:
      - TZ=${TIMEZONE}
    mem_limit: ${MEM_LIMIT}
    ulimits:
      nofile:
        soft: 500000
        hard: 500000
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "curl", "http://localhost:23820/admin/node/current"]
      interval: 10s
      timeout: 10s
      retries: 120
    restart: on-failure


  mysql:
    # mysql:5.7 linux/arm64 image is unavailable.
    image: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/mysql:8.0.39
    container_name: ragflow-mysql
    env_file: .env
    environment:
      - MYSQL_ROOT_PASSWORD=${MYSQL_PASSWORD}
      - TZ=${TIMEZONE}
    command:
      --max_connections=1000
      --character-set-server=utf8mb4
      --collation-server=utf8mb4_unicode_ci
      --default-authentication-plugin=mysql_native_password
      --tls_version="TLSv1.2,TLSv1.3"
      --init-file /data/application/init.sql
    ports:
      - ${MYSQL_PORT}:3306
    volumes:
      - mysql_data:/var/lib/mysql
      - ./init.sql:/data/application/init.sql
    networks:
      - ragflow
    healthcheck:
      test: ["CMD", "mysqladmin" ,"ping", "-uroot", "-p${MYSQL_PASSWORD}"]
      interval: 10s
      timeout: 10s
      retries: 3
    restart: on-failure

  minio:
    image: quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z
    container_name: ragflow-minio
    command: server --console-address ":9001" /data
    ports:
      - ${MINIO_PORT}:9000
      - ${MINIO_CONSOLE_PORT}:9001
    env_file: .env
    environment:
      - MINIO_ROOT_USER=${MINIO_USER}
      - MINIO_ROOT_PASSWORD=${MINIO_PASSWORD}
      - TZ=${TIMEZONE}
    volumes:
      - minio_data:/data
    networks:
      - ragflow
    restart: on-failure

  redis:
    # swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/valkey/valkey:8
    image: valkey/valkey:8
    container_name: ragflow-redis
    command: redis-server --requirepass ${REDIS_PASSWORD} --maxmemory 128mb --maxmemory-policy allkeys-lru
    env_file: .env
    ports:
      - ${REDIS_PORT}:6379
    volumes:
      - redis_data:/data
    networks:
      - ragflow
    restart: on-failure



volumes:
  esdata01:
    driver: local
  infinity_data:
    driver: local
  mysql_data:
    driver: local
  minio_data:
    driver: local
  redis_data:
    driver: local

networks:
  ragflow:
    driver: bridge
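
The image rewrites in the listing above follow a simple pattern: a Docker Hub reference gets the mirror path put in front of it, while images that already name another registry (such as quay.io) stay as they are. Here is a minimal sketch of that pattern. The function name `to_mirror` is hypothetical, and adding `library/` for official images is an assumption taken from the Elasticsearch line; the MySQL line in the listing omits it, and which form the mirror accepts is not verified here.

```python
MIRROR_PREFIX = "swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io"


def to_mirror(image: str, prefix: str = MIRROR_PREFIX) -> str:
    """Prefix a Docker Hub image reference with the mirror path."""
    first, slash, _rest = image.partition("/")
    # A first path component with "." or ":" (or "localhost") names a registry
    # host, e.g. quay.io, so the image is not a Docker Hub image.
    has_registry = bool(slash) and ("." in first or ":" in first or first == "localhost")
    if has_registry:
        return image
    if not slash:
        # Official images such as "elasticsearch" live under "library/"
        # (assumption: the mirror follows the same layout).
        image = "library/" + image
    return f"{prefix}/{image}"


for name in ("elasticsearch:8.11.3", "infiniflow/infinity:v0.6.0-dev3",
             "valkey/valkey:8", "quay.io/minio/minio:RELEASE.2023-12-20T01-00-02Z"):
    print(to_mirror(name))
```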

Run

docker compose -f docker-compose.yml up -d

image.png

image.png

Check the RAGFlow server status

docker logs -f ragflow-server
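
Instead of watching the log by hand, readiness can also be polled. The sketch below is only an illustration: `http_ok` and `wait_until` are hypothetical helpers, and the URL in the usage note is an assumption about where the web UI is exposed. The suggested 120 attempts at 10-second intervals mirror the healthcheck settings in the compose file above.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_until(check: Callable[[], bool], attempts: int, delay: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Demo with a stand-in check that succeeds on the third try (no network needed).
answers = iter([False, False, True])
print(wait_until(lambda: next(answers), attempts=5, delay=0))
```

Against a real deployment, the call would look like `wait_until(lambda: http_ok("http://localhost"), attempts=120, delay=10)`, with the URL adjusted to your host and port.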

image.png

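Log in

There is no default account at first, so you need to create one. This is straightforward: create a new account and log in, and you will see the interface below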

image.png

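Deploying on Kubernetes

After downloading the repository with git, you will find that it already includes a Helm chart. The main task is to edit values.yaml, after which RAGFlow can be deployed with helm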

image.png

To place the cloned code in a custom directory, use the following command

git clone https://github.com/infiniflow/ragflow.git ragflow-v0.19.1

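Below is an example values.yaml. The main changes are the container registry, so that images can be pulled from within mainland China, and enabling ingress; decide on the latter according to your own environment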

# Based on docker compose .env file
env:
  # The type of doc engine to use.
  # Available options:
  # - `elasticsearch` (default)
  # - `infinity` (https://github.com/infiniflow/infinity)
  # DOC_ENGINE: elasticsearch
  DOC_ENGINE: infinity

  # The version of Elasticsearch.
  STACK_VERSION: "8.11.3"

  # The password for Elasticsearch
  ELASTIC_PASSWORD: infini_rag_flow_helm

  # The password for MySQL
  MYSQL_PASSWORD: infini_rag_flow_helm
  # The database of the MySQL service to use
  MYSQL_DBNAME: rag_flow

  # The username for MinIO.
  MINIO_ROOT_USER: rag_flow
  # The password for MinIO
  MINIO_PASSWORD: infini_rag_flow_helm

  # The password for Redis
  REDIS_PASSWORD: infini_rag_flow_helm

  # The RAGFlow Docker image to download.
  # Defaults to the v0.19.0-slim edition, which is the RAGFlow Docker image without embedding models.
  #RAGFLOW_IMAGE: infiniflow/ragflow:v0.19.0-slim # changed here
  #
  # To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
  RAGFLOW_IMAGE: swr.cn-north-4.myhuaweicloud.com/ddn-k8s/docker.io/infiniflow/ragflow:v0.19.0 # changed here
  #
  # The Docker image of the v0.19.0 edition includes:
  # - Built-in embedding models:
  #   - BAAI/bge-large-zh-v1.5
  #   - BAAI/bge-reranker-v2-m3
  #   - maidalun1020/bce-embedding-base_v1
  #   - maidalun1020/bce-reranker-base_v1
  # - Embedding models that will be downloaded once you select them in the RAGFlow UI:
  #   - BAAI/bge-base-en-v1.5
  #   - BAAI/bge-large-en-v1.5
  #   - BAAI/bge-small-en-v1.5
  #   - BAAI/bge-small-zh-v1.5
  #   - jinaai/jina-embeddings-v2-base-en
  #   - jinaai/jina-embeddings-v2-small-en
  #   - nomic-ai/nomic-embed-text-v1.5
  #   - sentence-transformers/all-MiniLM-L6-v2
  #
  #

  # The local time zone.
  TIMEZONE: "Asia/Shanghai"

  # Uncomment the following line if you have limited access to huggingface.co:
  HF_ENDPOINT: https://hf-mirror.com # changed here

  # The maximum file size for each uploaded file, in bytes.
  # You can uncomment this line and update the value if you wish to change 128M file size limit
  # MAX_CONTENT_LENGTH: "134217728"
  # After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.

ragflow:
  deployment:
    strategy:
    resources:
  service:
    # Use LoadBalancer to expose the web interface externally
    type: ClusterIP
  api:
    service:
      enabled: true
      type: ClusterIP

infinity:
  image:
    repository: infiniflow/infinity
    tag: v0.6.0-dev3
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

elasticsearch:
  storage:
    className:
    capacity: 20Gi
  deployment:
    strategy:
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
  service:
    type: ClusterIP

minio:
  image:
    repository: quay.io/minio/minio
    tag: RELEASE.2023-12-20T01-00-02Z
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

mysql:
  image:
    repository: mysql
    tag: 8.0.39
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

redis:
  image:
    repository: valkey/valkey
    tag: 8
  storage:
    className:
    capacity: 5Gi
  persistence:
    enabled: true
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP


# This block is for setting up web service ingress. For more information, see:
# https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true # changed here
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: ragflow.dltornado2.com # changed here
      paths:
        - path: /
          pathType: ImplementationSpecific
  #tls: [] # changed here
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

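One more file under the templates directory needs to be changed, as shown below (it is only rendered when DOC_ENGINE is elasticsearch): set xpack.security.enabled to false. This greatly weakens security and must never be done in production. I found that without this change, the RAGFlow logs keep reporting that it cannot connect to ES, and in the web UI, clicking Continue during registration does nothing. Alternatively, you can change the Elasticsearch configuration after deployment by editing the ConfigMap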

{{- if eq .Values.env.DOC_ENGINE "elasticsearch" -}}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "ragflow.fullname" . }}-es-config
data:
  node.name: "es01"
  bootstrap.memory_lock: "false"
  discovery.type: "single-node"
  xpack.security.enabled: "false"
  xpack.security.http.ssl.enabled: "false"
  xpack.security.transport.ssl.enabled: "false"
{{- end -}}
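
For the post-deployment route (editing the ConfigMap), here is a minimal sketch that builds a `kubectl patch` command with a JSON merge patch. The ConfigMap name `ragflow-es-config` is an assumption: the real name comes from the chart's fullname helper plus `-es-config`, so check it with `kubectl get configmap -n ragflow` first. The helper name is hypothetical, and the Elasticsearch pod presumably has to be restarted before it picks up the change.

```python
import json
import shlex


def es_security_patch_command(configmap: str, namespace: str, enabled: bool) -> list[str]:
    """Build a kubectl command that sets xpack.security.enabled in a ConfigMap."""
    # ConfigMap data values must be strings, hence "true"/"false".
    patch = {"data": {"xpack.security.enabled": "true" if enabled else "false"}}
    return ["kubectl", "patch", "configmap", configmap, "-n", namespace,
            "--type", "merge", "-p", json.dumps(patch)]


# "ragflow-es-config" is a guessed name (assumption); "ragflow" is the namespace
# used by the helm install command in this article.
cmd = es_security_patch_command("ragflow-es-config", "ragflow", enabled=False)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`.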

cd ragflow/helm/
helm install -f values.yaml ragflow . --create-namespace -n ragflow --timeout 5h

image.png

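In fact, the following values.yaml also works. It uses the default infinity document engine. However, I found that with this file you have to wait at least 30 minutes before you can register an account; until then, clicking Continue does nothing. (In practice, even with elasticsearch as the document engine, it takes quite a while before RAGFlow becomes usable.)

In addition, I found that answer quality is poor when infinity is used as the document engine.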

# Based on docker compose .env file
env:
  # The type of doc engine to use.
  # Available options:
  # - `elasticsearch` (default)
  # - `infinity` (https://github.com/infiniflow/infinity)
  # DOC_ENGINE: elasticsearch
  DOC_ENGINE: infinity

  # The version of Elasticsearch.
  STACK_VERSION: "8.11.3"

  # The password for Elasticsearch
  ELASTIC_PASSWORD: infini_rag_flow_helm

  # The password for MySQL
  MYSQL_PASSWORD: infini_rag_flow_helm
  # The database of the MySQL service to use
  MYSQL_DBNAME: rag_flow

  # The username for MinIO.
  MINIO_ROOT_USER: rag_flow
  # The password for MinIO
  MINIO_PASSWORD: infini_rag_flow_helm

  # The password for Redis
  REDIS_PASSWORD: infini_rag_flow_helm

  # The RAGFlow Docker image to download.
  # Defaults to the v0.17.2-slim edition, which is the RAGFlow Docker image without embedding models.
  # RAGFLOW_IMAGE: infiniflow/ragflow:v0.17.2-slim
  #
  # To download the RAGFlow Docker image with embedding models, uncomment the following line instead:
  RAGFLOW_IMAGE: infiniflow/ragflow:v0.17.2
  #
  # The Docker image of the v0.17.2 edition includes:
  # - Built-in embedding models:
  #   - BAAI/bge-large-zh-v1.5
  #   - BAAI/bge-reranker-v2-m3
  #   - maidalun1020/bce-embedding-base_v1
  #   - maidalun1020/bce-reranker-base_v1
  # - Embedding models that will be downloaded once you select them in the RAGFlow UI:
  #   - BAAI/bge-base-en-v1.5
  #   - BAAI/bge-large-en-v1.5
  #   - BAAI/bge-small-en-v1.5
  #   - BAAI/bge-small-zh-v1.5
  #   - jinaai/jina-embeddings-v2-base-en
  #   - jinaai/jina-embeddings-v2-small-en
  #   - nomic-ai/nomic-embed-text-v1.5
  #   - sentence-transformers/all-MiniLM-L6-v2
  #
  #

  # The local time zone.
  TIMEZONE: "Asia/Shanghai"

  # Uncomment the following line if you have limited access to huggingface.co:
  HF_ENDPOINT: https://hf-mirror.com

  # The maximum file size for each uploaded file, in bytes.
  # You can uncomment this line and update the value if you wish to change 128M file size limit
  # MAX_CONTENT_LENGTH: "134217728"
  # After making the change, ensure you update `client_max_body_size` in nginx/nginx.conf correspondingly.

ragflow:
  deployment:
    strategy:
    resources:
  service:
    # Use LoadBalancer to expose the web interface externally
    type: NodePort
  api:
    service:
      enabled: true
      type: ClusterIP

infinity:
  image:
    repository: infiniflow/infinity
    tag: v0.6.0-dev3
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

elasticsearch:
  storage:
    className:
    capacity: 20Gi
  deployment:
    strategy:
    resources:
      requests:
        cpu: "4"
        memory: "16Gi"
  service:
    type: ClusterIP

minio:
  image:
    repository: quay.io/minio/minio
    tag: RELEASE.2023-12-20T01-00-02Z
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

mysql:
  image:
    repository: mysql
    tag: 8.0.39
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP

redis:
  image:
    repository: valkey/valkey
    tag: 8
  storage:
    className:
    capacity: 5Gi
  deployment:
    strategy:
    resources:
  service:
    type: ClusterIP


# This block is for setting up web service ingress. For more information, see:
# https://kubernetes.io/docs/concepts/services-networking/ingress/
ingress:
  enabled: true
  className: ""
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: myai.dltornado2.com
      paths:
        - path: /
          pathType: ImplementationSpecific
  tls: []
  #  - secretName: chart-example-tls
  #    hosts:
  #      - chart-example.local

RAGFlow is ready to use only once the log message marked by the red arrow in the figure below appears

image.png

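Upgrading

Kubernetes

First clone the code. The trailing ragflow-v0.19.1 is the local directory where the code is stored; for convenience, name it after the version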

git clone https://github.com/infiniflow/ragflow.git ragflow-v0.19.1

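Edit values.yaml in the helm directory, then start the upgrade.

Note that the RAGFlow image is very large, close to 20 GB, so it is best to extend the timeout; the default timeout is 5m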

helm upgrade ragflow . -f values.yaml --atomic -n ragflow --timeout=600m
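
To see why the 5m default is too short, a rough estimate helps. The sketch below assumes a steady download speed and a single image pull, which real clusters rarely deliver, so it multiplies the result by a safety factor. The function names and the 100 Mbit/s figure are illustrative assumptions; only the roughly 20 GB image size comes from the text above.

```python
import math


def pull_minutes(image_gb: float, mbit_per_s: float) -> float:
    """Minutes needed to transfer image_gb (decimal GB) at mbit_per_s megabits/s."""
    seconds = image_gb * 8000 / mbit_per_s  # 1 GB = 8000 megabits
    return seconds / 60


def helm_timeout(image_gb: float, mbit_per_s: float, safety_factor: float = 2.0) -> str:
    """Suggest a --timeout flag: estimated pull time times a safety factor."""
    seconds = image_gb * 8000 / mbit_per_s * safety_factor
    return f"--timeout={math.ceil(seconds / 60)}m"


print(f"{pull_minutes(20, 100):.1f} minutes at 100 Mbit/s")
print(helm_timeout(20, 100))
```

Even at a steady 100 Mbit/s the pull alone takes well over 5 minutes, so a generous value such as the 600m used above is a safe choice when bandwidth is unknown.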

Creating a RAG application (AI assistant)

Add a model provider

image.png

Create a knowledge base

Parsing documents is very CPU-intensive. The logs also state that the computation runs on the CPU; with a GPU it would be much faster.

image.png

image.png

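Verification

As shown below, first create an assistant and then ask a question directly. The assistant found the relevant information in the knowledge base and cited its sources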

image.png

Create a RAGFlow API key

To integrate RAGFlow with other applications, you need to create a RAGFlow API key
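
Once a key exists, other applications send it as a Bearer token. The following is a minimal sketch under assumptions: the `/api/v1/datasets` path and the `Authorization: Bearer` header follow the RAGFlow HTTP API reference as I understand it, so verify them against the documentation for your version. The base URL and key below are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_list_datasets_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that lists datasets."""
    url = base_url.rstrip("/") + "/api/v1/datasets"
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def list_datasets(base_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_list_datasets_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder host and key (assumptions); nothing is sent over the network here.
request = build_list_datasets_request("http://localhost:9380", "ragflow-EXAMPLE-KEY")
print(request.get_method(), request.full_url)
print(request.get_header("Authorization"))
```

Calling `list_datasets(...)` with your real address and key performs the actual request.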

image.png

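Let the AI assistant search the web (optional)

Register with Tavily

Go to the Tavily website and register an account first; you can sign in directly with a GitHub or Google account. After signing in you will see the page below. Tavily provides 1,000 free API calls per month. Copy the API key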

image.png

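On the AI assistant's "Assistant settings" page, paste the Tavily API key you just copied and save. From then on, when you chat with the AI assistant, it automatically searches the web for knowledge it does not have (including knowledge missing from the knowledge base), which reduces hallucinations. Here is an example:

Example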

image.png

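Uploading attachments while chatting with the AI assistant

A note on multimodal models

In my tests, uploading attachments does not require a multimodal model; an ordinary LLM such as qwen-plus or deepseek is enough

Example 1

RAGFlow supports sending files to the AI assistant, including many file types such as images, videos, and documents. Here is an example:

The image I uploaded is shown below: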

image.png

Below is a screenshot of the conversation

image.png

Click the light bulb icon above the reply to see the details. The AI assistant recognized the text in the uploaded image and passed it to the LLM as context

image.png

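Example 2

The following example shows even better how useful RAGFlow is. Here the user has a business report, uploads it directly, and then asks the AI assistant about it; any question about the report can be asked. In the figure below, for example, the question is about the results for January and February, and the AI assistant returns detailed figures based on the report. The figures are accurate because they are retrieved directly from the report rather than made up.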

image.png

The prompt shows that RAGFlow correctly recognized the content of the PPT file and answered based on the extracted content

image.png

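Looking at the knowledge base, the file the user uploaded has automatically appeared there, but it is not parsed automatically. In other words, other users and sessions will not draw on this data unless the file has been parsed.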

image.png

PDF tables split across pages

Reference: Configure knowledge base | RAGFlow

企业微信截图_17427997762293.png

企业微信截图_17427997866359.png

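Adjusting the chunking method solves this. Choose the "manual" chunking method: figures and tables within the same section are not split, and chunks may become very large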
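
The idea behind section-based chunking can be illustrated with a toy example. This is not RAGFlow's implementation, just a sketch of the principle: consecutive fragments that belong to the same section are merged into one chunk, so a table that continues on the next page stays together. The `Fragment` type, the function name, and the sample data are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    page: int      # page the fragment was extracted from
    section: str   # heading the fragment belongs to
    text: str      # extracted text or table rows


def chunk_by_section(fragments: list[Fragment]) -> list[dict]:
    """Merge consecutive fragments of the same section into a single chunk."""
    chunks: list[dict] = []
    for fragment in fragments:
        if chunks and chunks[-1]["section"] == fragment.section:
            # Same section as the previous fragment: extend that chunk,
            # even if the fragment comes from a later page.
            chunks[-1]["pages"].append(fragment.page)
            chunks[-1]["text"] += "\n" + fragment.text
        else:
            chunks.append({"section": fragment.section,
                           "pages": [fragment.page],
                           "text": fragment.text})
    return chunks


# Made-up sample: one table spans pages 3 and 4 of the same section.
fragments = [
    Fragment(3, "2.1 Revenue", "| month | amount |\n| Jan | 100 |"),
    Fragment(4, "2.1 Revenue", "| Feb | 120 |"),
    Fragment(4, "2.2 Costs", "Costs fell slightly."),
]
chunks = chunk_by_section(fragments)
for chunk in chunks:
    print(chunk["section"], chunk["pages"])
```

The trade-off matches the note above: a long section becomes one large chunk, which keeps tables intact at the cost of chunk size.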

image.png

image.png

Parse the document again, and you can see that the same table spanning consecutive pages is now a single chunk.

image.png

image.png

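About the author and DreamAI

https://docs.dingtalk.com/i/nodes/Amq4vjg890AlRbA6Td9ZvlpDJ3kdP0wQ?iframeQuery=utm_source=portal&utm_medium=portal_recent

Follow the WeChat official account “AI发烧友” (AI Enthusiasts) for more practical IT development and operations tools and tips, plus plenty of AI technical documentation!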
