Docusaurus + Offline Typesense Installation with Chinese Full-Text Search
Installing Typesense offline on CentOS and enabling Chinese full-text search
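Tech stack
- Docusaurus: an easy-to-maintain static site generator that Facebook built for open-source project developers. The site is updated by writing Markdown, and it can build a static site with a home page, docs, API reference, help, and blog pages.
- Typesense: an open-source search engine and an open-source alternative to Algolia.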
Servers used
- Local machine:
  IP: 172.28.0.1
  OS: Windows
  Role: edit the Docusaurus site, host nginx
- Server A:
  IP: 172.28.12.248
  OS: CentOS 7.9
  Role: run Typesense and docsearch-scraper via Docker
Deployment
Install Docker
Download Docker
Download the following files from the official Docker site:
containerd.io-1.6.9-3.1.el7.x86_64.rpm
docker-buildx-plugin-0.11.2-1.el7.x86_64.rpm
docker-ce-24.0.6-1.el7.x86_64.rpm
docker-ce-cli-24.0.6-1.el7.x86_64.rpm
docker-ce-rootless-extras-24.0.6-1.el7.x86_64.rpm
docker-compose-plugin-2.6.0-3.el7.x86_64.rpm
docker-scan-plugin-0.9.0-3.el7.x86_64.rpm
Then upload them to the server.
Install Docker:
yum install *.rpm
Start Docker:
systemctl start docker
Deploy Typesense
On a machine that has Docker and internet access, pull the images:
docker pull typesense/typesense:0.25.1
docker pull typesense/docsearch-scraper:0.8.0
Save the images and copy the archives to the server:
docker save typesense/typesense:0.25.1 -o typesense_0.25.1.tar
docker save typesense/docsearch-scraper:0.8.0 -o docsearch-scraper.0.8.0.tar
Load the images on the server:
docker load -i typesense_0.25.1.tar
docker load -i docsearch-scraper.0.8.0.tar
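The archive name passed to `docker save -o` has to be the same one given to `docker load -i` on the server. As a small sketch, the file names and commands can be derived from the image tags so the two sides cannot drift apart. The helper names `tarNameFor` and `transferCommands` are made up for this example, and the underscore naming is just one possible convention:

```javascript
// Hypothetical helper: derive a file-system-friendly archive name from an image tag.
// "typesense/typesense:0.25.1" -> "typesense_0.25.1.tar"
function tarNameFor(image) {
  const lastSegment = image.split('/').pop(); // drop the repository prefix
  return lastSegment.replace(':', '_') + '.tar';
}

// Build the save (online machine) and load (offline server) commands per image.
function transferCommands(imageList) {
  return imageList.map((image) => {
    const tar = tarNameFor(image);
    return {
      image,
      save: `docker save ${image} -o ${tar}`,
      load: `docker load -i ${tar}`,
    };
  });
}

const images = ['typesense/typesense:0.25.1', 'typesense/docsearch-scraper:0.8.0'];
for (const cmd of transferCommands(images)) {
  console.log(cmd.save);
  console.log(cmd.load);
}
```

The printed `save` lines are meant for the machine with internet access and the `load` lines for server A.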
Start typesense/typesense:0.25.1:
mkdir -p /tmp/typesense/typesense-data
docker run -d -p 8108:8108 -v /tmp/typesense/typesense-data:/data typesense/typesense:0.25.1 --data-dir /data --api-key=xyz --enable-cors
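Before moving on, it helps to confirm that the container is actually answering. Typesense exposes a `/health` endpoint that returns `{"ok": true}` when the node is up. The sketch below assumes Node.js 18+ (for the global `fetch`); the function names `healthUrl` and `isHealthy` are illustrative, not part of any Typesense client:

```javascript
// Build the health-check URL for a Typesense node.
function healthUrl({ protocol, host, port }) {
  return `${protocol}://${host}:${port}/health`;
}

// Resolves to true when the node answers with {"ok": true}.
// fetchImpl is injectable so the function can be exercised without a network.
async function isHealthy(node, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(healthUrl(node));
    if (!res.ok) return false;
    const body = await res.json();
    return body.ok === true;
  } catch (err) {
    return false; // connection refused, timeout, invalid JSON, ...
  }
}

// Usage (needs network access to server A):
// isHealthy({ protocol: 'http', host: '172.28.12.248', port: 8108 })
//   .then((ok) => console.log(ok ? 'Typesense is up' : 'Typesense is not reachable'));
```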
Install and use Docusaurus
npx create-docusaurus@latest my-website1 classic
cd my-website1
npm install docusaurus-theme-search-typesense@next --save --legacy-peer-deps
Using the reference docusaurus.config.js as a guide, edit your own docusaurus.config.js. The minimal result looks like this:
// @ts-check
// Note: type annotations allow type checking and IDEs autocompletion

const lightCodeTheme = require('prism-react-renderer/themes/github');
const darkCodeTheme = require('prism-react-renderer/themes/dracula');

/** @type {import('@docusaurus/types').Config} */
const config = {
  title: 'My Site',
  tagline: 'Dinosaurs are cool',
  favicon: 'img/favicon.ico',

  // Set the production url of your site here
  url: 'http://172.28.0.1',
  // Set the /<baseUrl>/ pathname under which your site is served
  // For GitHub pages deployment, it is often '/<projectName>/'
  baseUrl: '/',

  // GitHub pages deployment config.
  // If you aren't using GitHub pages, you don't need these.
  organizationName: 'facebook', // Usually your GitHub org/user name.
  projectName: 'docusaurus', // Usually your repo name.

  onBrokenLinks: 'throw',
  onBrokenMarkdownLinks: 'warn',

  // Even if you don't use internalization, you can use this field to set useful
  // metadata like html lang. For example, if your site is Chinese, you may want
  // to replace "en" with "zh-Hans".
  i18n: {
    defaultLocale: 'en',
    locales: ['en'],
  },

  themes: ['docusaurus-theme-search-typesense'],

  presets: [
    [
      'classic',
      /** @type {import('@docusaurus/preset-classic').Options} */
      ({
        docs: {
          sidebarPath: require.resolve('./sidebars.js'),
          // Please change this to your repo.
          // Remove this to remove the "edit this page" links.
          editUrl:
            'https://github.com/facebook/docusaurus/tree/main/packages/create-docusaurus/templates/shared/',
        },
        blog: {
          showReadingTime: true,
          // Please change this to your repo.
          // Remove this to remove the "edit this page" links.
          editUrl:
            'https://github.com/facebook/docusaurus/tree/main/packages/create-docusaurus/templates/shared/',
        },
        theme: {
          customCss: require.resolve('./src/css/custom.css'),
        },
      }),
    ],
  ],

  themeConfig:
    /** @type {import('@docusaurus/preset-classic').ThemeConfig} */
    ({
      // Replace with your project's social card
      image: 'img/docusaurus-social-card.jpg',
      navbar: {
        title: 'My Site',
        logo: {
          alt: 'My Site Logo',
          src: 'img/logo.svg',
        },
        items: [
          {
            type: 'docSidebar',
            sidebarId: 'tutorialSidebar',
            position: 'left',
            label: 'Tutorial',
          },
          {to: '/blog', label: 'Blog', position: 'left'},
          {
            href: 'https://github.com/facebook/docusaurus',
            label: 'GitHub',
            position: 'right',
          },
        ],
      },
      footer: {
        style: 'dark',
        links: [
          {
            title: 'Docs',
            items: [
              {
                label: 'Tutorial',
                to: '/docs/intro',
              },
            ],
          },
          {
            title: 'Community',
            items: [
              {
                label: 'Stack Overflow',
                href: 'https://stackoverflow.com/questions/tagged/docusaurus',
              },
              {
                label: 'Discord',
                href: 'https://discordapp.com/invite/docusaurus',
              },
              {
                label: 'Twitter',
                href: 'https://twitter.com/docusaurus',
              },
            ],
          },
          {
            title: 'More',
            items: [
              {
                label: 'Blog',
                to: '/blog',
              },
              {
                label: 'GitHub',
                href: 'https://github.com/facebook/docusaurus',
              },
            ],
          },
        ],
        copyright: `Copyright © ${new Date().getFullYear()} My Project, Inc. Built with Docusaurus.`,
      },
      prism: {
        theme: lightCodeTheme,
        darkTheme: darkCodeTheme,
      },
      typesense: {
        // Replace this with the name of your index/collection.
        // It should match the "index_name" entry in the scraper's "config.json" file.
        typesenseCollectionName: 'docusaurus-2',

        typesenseServerConfig: {
          nodes: [
            {
              host: '172.28.12.248',
              port: 8108,
              protocol: 'http',
            },
          ],
          apiKey: 'xyz',
        },

        // Optional: Typesense search parameters: https://typesense.org/docs/0.24.0/api/search.html#search-parameters
        typesenseSearchParameters: {},

        // Optional
        contextualSearch: true,
      },
    }),
};

module.exports = config;
Create a Chinese document, /docs/中文文档.md, to use for testing:
## 中文文档测试
如果你是在现有的项目中使用 Docusaurus 的话,单一仓库(monorepo)模式可能更适合你。单一仓库模式(Monorepos)能让你在多个类似项目之间共享依赖。例如,你的网站可能需要使用本地的软件包来展示最新的功能,而不是依赖已发布的版本。并且,你的项目的贡献者也可以在实现某些功能时方便地更新文档。一个单一仓库(monorepo)的文件夹的结构如下:
Start Docusaurus:
npm run start
Open the site to preview it: http://localhost:3000/
The Chinese document is at: http://localhost:3000/docs/中文文档
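Note:
At this point the site can only be previewed on the local machine, not from the LAN. Assuming the local machine's address is 172.28.0.1, to allow LAN access edit package.json and change
"start": "docusaurus start",
to
"start": "docusaurus start --host 0.0.0.0",
The site is then reachable at http://172.28.0.1:3000, both locally and from other machines on the LAN.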
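The same package.json change can be expressed as a small transformation, which is handy if it has to be applied to several sites. This is only a sketch: `withLanHost` is a made-up helper name, and it works on an already-parsed package.json object rather than touching the file on disk.

```javascript
// Return a copy of a parsed package.json whose "start" script binds to all interfaces.
function withLanHost(pkgJson) {
  const start = (pkgJson.scripts && pkgJson.scripts.start) || 'docusaurus start';
  if (start.includes('--host')) return pkgJson; // already configured, leave untouched
  return {
    ...pkgJson,
    scripts: { ...pkgJson.scripts, start: `${start} --host 0.0.0.0` },
  };
}

const pkg = {
  name: 'my-website-1',
  scripts: { start: 'docusaurus start', build: 'docusaurus build' },
};
console.log(withLanHost(pkg).scripts.start); // docusaurus start --host 0.0.0.0
```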
Build the site and serve it from nginx (nginx installation and configuration are omitted here):
npm run build
Visit http://172.28.0.1 to verify.
Deploy typesense/docsearch-scraper
jq must be installed first. Upload jq-linux-amd64 to the /tmp directory:
[root@localhost tmp]# chmod +x jq-linux-amd64
[root@localhost tmp]# mv jq-linux-amd64 /usr/bin/jq
docsearch-scraper needs the URL of the Docusaurus site, which here is http://172.28.0.1
- Create the file /root/typesense/docusaurus-2.json
{
  "index_name": "docusaurus-2",
  "start_urls": [
    "http://172.28.0.1"
  ],
  "sitemap_urls": [
    "http://172.28.0.1/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [
    "/tests"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}
- Create the file /root/typesense/typesense.env
TYPESENSE_API_KEY=xyz
TYPESENSE_HOST=172.28.12.248
TYPESENSE_PORT=8108
TYPESENSE_PROTOCOL=http
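The collection name, host, port, and protocol now appear in three places: the `typesense` block of docusaurus.config.js, the scraper's JSON config, and typesense.env. If they disagree, the search box queries a collection the scraper never filled. A minimal consistency check could look like the sketch below; `parseEnv` and `findMismatches` are hypothetical helpers written for this example. The API key is deliberately not compared, since a real deployment may well use different keys for searching and for indexing.

```javascript
// Parse KEY=VALUE lines (the format of typesense.env) into an object.
function parseEnv(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith('#')) continue; // skip blanks and comments
    const idx = trimmed.indexOf('=');
    if (idx === -1) continue;
    result[trimmed.slice(0, idx)] = trimmed.slice(idx + 1);
  }
  return result;
}

// Compare the theme's typesense settings with the scraper config and env file.
// Returns a list of human-readable mismatches (empty when everything agrees).
function findMismatches(themeTypesense, scraperConfig, envVars) {
  const node = themeTypesense.typesenseServerConfig.nodes[0];
  const checks = [
    ['collection name', themeTypesense.typesenseCollectionName, scraperConfig.index_name],
    ['host', node.host, envVars.TYPESENSE_HOST],
    ['port', String(node.port), envVars.TYPESENSE_PORT],
    ['protocol', node.protocol, envVars.TYPESENSE_PROTOCOL],
  ];
  return checks
    .filter(([, site, scraper]) => site !== scraper)
    .map(([name, site, scraper]) => `${name}: site uses "${site}", scraper uses "${scraper}"`);
}

// Values copied from the files in this article.
const exampleTheme = {
  typesenseCollectionName: 'docusaurus-2',
  typesenseServerConfig: {
    nodes: [{ host: '172.28.12.248', port: 8108, protocol: 'http' }],
    apiKey: 'xyz',
  },
};
const exampleScraper = { index_name: 'docusaurus-2' };
const exampleEnvText = [
  'TYPESENSE_API_KEY=xyz',
  'TYPESENSE_HOST=172.28.12.248',
  'TYPESENSE_PORT=8108',
  'TYPESENSE_PROTOCOL=http',
].join('\n');

console.log(findMismatches(exampleTheme, exampleScraper, parseEnv(exampleEnvText))); // []
```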
Run docsearch-scraper via Docker
cd /root/typesense/
docker run -it --env-file=/root/typesense/typesense.env -e "CONFIG=$(cat docusaurus-2.json | jq -r tostring)" typesense/docsearch-scraper:0.8.0
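In the command above, `jq -r tostring` serves to collapse the JSON file into a single compact line so it can be passed in the `CONFIG` environment variable. If the scraper is to be launched from a script instead of an interactive shell, the same thing can be sketched in Node.js. The function names here are illustrative, and `-it` is dropped on the assumption that the script runs without a terminal attached:

```javascript
const { spawnSync } = require('node:child_process');
const fs = require('node:fs');

// Compact the config to one line, the role `jq -r tostring` plays in the shell command.
function compactJson(text) {
  return JSON.stringify(JSON.parse(text));
}

// Build the argument vector for `docker run`. No shell quoting is needed,
// because each element is handed to docker as a separate argument.
function scraperArgs(envFile, configText, image) {
  return ['run', `--env-file=${envFile}`, '-e', `CONFIG=${compactJson(configText)}`, image];
}

// Read the config from disk and run the scraper; returns docker's exit status.
function runScraper(configPath, envFile, image) {
  const configText = fs.readFileSync(configPath, 'utf8');
  const result = spawnSync('docker', scraperArgs(envFile, configText, image), { stdio: 'inherit' });
  return result.status; // 0 on success
}

// Usage on server A:
// runScraper('/root/typesense/docusaurus-2.json',
//            '/root/typesense/typesense.env',
//            'typesense/docsearch-scraper:0.8.0');
```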
Search can now be tested. It works in general, but a search for a Chinese word such as “共享” returns nothing.
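To rule out the search UI as the cause, the same query can be sent straight to Typesense's search endpoint (`GET /collections/<name>/documents/search` with the `X-TYPESENSE-API-KEY` header). The sketch below assumes Node.js 18+ for `fetch` and queries the `content` field that the scraper populates; `buildSearchRequest` and `countHits` are names invented for this example.

```javascript
// Build a Typesense search request for one query against one collection.
function buildSearchRequest(node, apiKey, collection, query) {
  const params = new URLSearchParams({ q: query, query_by: 'content' });
  const base = `${node.protocol}://${node.host}:${node.port}`;
  return {
    url: `${base}/collections/${encodeURIComponent(collection)}/documents/search?${params}`,
    headers: { 'X-TYPESENSE-API-KEY': apiKey },
  };
}

// Run the search and return the number of matching documents ("found" in the response).
// fetchImpl is injectable so the function can be exercised without a network.
async function countHits(request, fetchImpl = fetch) {
  const res = await fetchImpl(request.url, { headers: request.headers });
  if (!res.ok) throw new Error(`Typesense returned HTTP ${res.status}`);
  const body = await res.json();
  return body.found;
}

// Usage (needs network access to server A):
// const req = buildSearchRequest(
//   { protocol: 'http', host: '172.28.12.248', port: 8108 }, 'xyz', 'docusaurus-2', '共享');
// countHits(req).then((n) => console.log(`found ${n} document(s)`));
```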
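Implementing Chinese full-text search
The approach here uses Ansj Chinese word segmentation: a small Java program (built on Hutool, Ansj, and the typesense-java client) exports every document from the collection, splits the content field into space-separated words, and writes it back. Because each document is updated individually, this suits sites with a small number of documents.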
import cn.hutool.core.util.StrUtil;
import cn.hutool.json.JSONObject;
import cn.hutool.json.JSONUtil;
import org.ansj.splitWord.analysis.DicAnalysis;
import org.typesense.api.Client;
import org.typesense.api.Configuration;
import org.typesense.resources.Node;

import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Rewrites the "content" field of every document in the Typesense collection
 * with Ansj word segmentation applied, so that Chinese words are separated by spaces.
 */
public class Main {

    private static final String COLLECTION = "docusaurus-2";

    // Disabled in the original: sets the path of Ansj's default user dictionary
    // (needs org.ansj.library.DicLibrary and org.ansj.util.MyStaticValue).
    // static {
    //     MyStaticValue.ENV.put(DicLibrary.DEFAULT, "library/default.dic");
    // }

    public static void main(String[] args) throws Exception {
        updateTypesense();
    }

    private static void updateTypesense() throws Exception {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node("http", "172.28.12.248", "8108"));
        Configuration configuration = new Configuration(nodes, Duration.ofSeconds(2), "xyz");
        Client client = new Client(configuration);

        // export() returns the whole collection as text, one JSON document per line.
        String exportStr = client.collections(COLLECTION).documents().export();
        for (String docStr : StrUtil.split(exportStr, "\n")) {
            if (StrUtil.isBlank(docStr)) {
                continue; // skip empty lines, e.g. after a trailing newline
            }
            JSONObject jsonObject = JSONUtil.parseObj(docStr);
            String content = jsonObject.getStr("content");
            if (StrUtil.isBlank(content)) {
                continue; // nothing to segment in this record
            }
            // Partial update: only the "content" field is replaced.
            Map<String, Object> map = new HashMap<>();
            map.put("content", segment(content));
            client.collections(COLLECTION).documents(jsonObject.getStr("id")).update(map);
        }
    }

    /** Segments the text with Ansj and joins the words with single spaces. */
    private static String segment(String str) {
        return DicAnalysis.parse(str).toStringWithOutNature(" ");
    }
}
After running it, search for “共享” again: the document is now found.
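A toy model may help explain why inserting spaces makes the difference. It assumes an index that splits text only on whitespace and punctuation and matches whole tokens; Typesense's real matching (prefix search, typo tolerance) is more elaborate, so this is an illustration rather than a description of its internals. The segmented sentence is hand-written for the example and is not actual Ansj output.

```javascript
// Toy tokenizer: split on whitespace and common ASCII/CJK punctuation only.
function tokenize(text) {
  return text.split(/[\s,。、:;()(),.:;]+/u).filter(Boolean);
}

// In this model a query matches when it equals one of the tokens.
function matches(query, text) {
  return tokenize(text).includes(query);
}

const raw = '单一仓库模式能让你在多个类似项目之间共享依赖。';
// Hand-segmented for illustration; real Ansj output may split differently.
const segmented = '单一 仓库 模式 能 让 你 在 多个 类似 项目 之间 共享 依赖 。';

console.log(tokenize(raw).length);        // 1 (the whole sentence is a single token)
console.log(matches('共享', raw));         // false
console.log(matches('共享', segmented));   // true
```

Without spaces, the sentence is one long token that “共享” neither equals nor starts, which is consistent with the empty result seen before segmentation.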
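Note
Both typesense/docsearch-scraper and the word-segmentation program run once and exit. They can later be run periodically as scheduled tasks, with the segmentation step following each scrape; the details are not covered here.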