A General Solution for Integrating Speech Recognition into UE5
A couple of days ago, the company asked me to build a speech recognition feature in UE5.
My first idea was to record audio in UE5 C++ and obtain a file object directly once recording finished, which led to the code below (the recording part is omitted):
// Get the channel count and sample rate (UAudioCapture reports these as int32)
const int32 Channels = this->AudioCapture->GetNumChannels();
const int32 SampleRate = this->AudioCapture->GetSampleRate();
Audio::FSampleBuffer SampleBuffer(this->AudioBuffer.GetData(), this->AudioBuffer.Num(), Channels, SampleRate);
Audio::FSoundWavePCMWriter Writer;
FString FilePath = FPaths::ProjectSavedDir();
UE_LOG(LogTemp, Warning, TEXT("FilePath: %s"), *FilePath);
Writer.BeginWriteToWavFile(SampleBuffer, "CapturedAudio", FilePath, []()
{
    UE_LOG(LogTemp, Log, TEXT("SaveComplete"));
});
Writer.SaveFinishedSoundWaveToPath(FilePath + "CapturedAudio");
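(For context, the omitted recording part usually amounts to appending the interleaved float samples produced by UAudioCapture into AudioBuffer. A minimal sketch follows; the class name UMyAudioRecorder and the AudioCapture/AudioBuffer members are assumptions, not part of the original code.)

// Assumed members on the owning class (not shown in the original snippet):
//   UPROPERTY() UAudioCapture* AudioCapture;
//   TArray<float> AudioBuffer;
#include "AudioCapture.h"

void UMyAudioRecorder::StartRecording()
{
    AudioCapture = NewObject<UAudioCapture>(this);
    if (AudioCapture->OpenDefaultAudioStream())
    {
        // The generator delegate delivers interleaved float samples as they are captured.
        // It is invoked off the game thread, so a real implementation should guard AudioBuffer.
        AudioCapture->AddGeneratorDelegate([this](const float* InAudio, int32 NumSamples)
        {
            AudioBuffer.Append(InAudio, NumSamples);
        });
        AudioCapture->StartCapturingAudio();
    }
}

void UMyAudioRecorder::StopRecording()
{
    if (AudioCapture && AudioCapture->IsCapturingAudio())
    {
        AudioCapture->StopCapturingAudio();
    }
}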
The plan was to save the file in the SaveComplete callback and then read it back in. This approach has two serious drawbacks: it costs two rounds of disk I/O, and the SaveComplete callback does not fire reliably every time.
After some deliberation, I went with a different solution.
// Get the channel count and sample rate
const int32 Channels = this->AudioCapture->GetNumChannels();
const int32 SampleRate = this->AudioCapture->GetSampleRate();
// Reinterpret the float sample buffer as raw bytes
TArray<uint8> ByteData;
const int32 DataSize = AudioBuffer.Num() * sizeof(float);
ByteData.Append(reinterpret_cast<const uint8*>(AudioBuffer.GetData()), DataSize);
// Pass the audio format along as query parameters
UploadUrl += "?sampleRate=" + FString::FromInt(SampleRate);
UploadUrl += "&numChannels=" + FString::FromInt(Channels);
// Create the HTTP request
TSharedRef<IHttpRequest, ESPMode::ThreadSafe> HttpRequest = FHttpModule::Get().CreateRequest();
HttpRequest->SetURL(UploadUrl);
HttpRequest->SetVerb(TEXT("POST"));
HttpRequest->SetHeader(TEXT("Content-Type"), TEXT("application/octet-stream"));
// Use the binary data as the request body
HttpRequest->SetContent(ByteData);
// Completion callback
HttpRequest->OnProcessRequestComplete().BindLambda(
    [this, UploadComplate](FHttpRequestPtr Request, FHttpResponsePtr Response, bool bSuccess)
    {
        if (bSuccess && Response.IsValid())
        {
            UE_LOG(LogTemp, Log, TEXT("Response: %s"), *Response->GetContentAsString());
            this->AudioBuffer.Reset();
            UploadComplate.Execute(Response->GetContentAsString());
        }
        else
        {
            UE_LOG(LogTemp, Error, TEXT("HTTP Request failed"));
        }
    });
// Send the request
HttpRequest->ProcessRequest();
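One thing the snippet above does not show is where UploadComplate comes from. A minimal sketch of how such a completion delegate could be declared and bound is given below; the type name FOnUploadComplate, the UploadAudio wrapper, the caller class AMyCharacter with its Recorder member, and the server URL are all illustrative assumptions.

// Hypothetical declaration: a single-cast delegate carrying the recognized text.
DECLARE_DELEGATE_OneParam(FOnUploadComplate, const FString& /*RecognizedText*/);

// Hypothetical wrapper around the HTTP code above; the request-building lines
// shown earlier would live in its body.
void UMyAudioRecorder::UploadAudio(FString UploadUrl, FOnUploadComplate UploadComplate)
{
    // ... build and send the HTTP request as shown above ...
}

// Caller side: bind a lambda to receive the recognition result, then start the upload.
void AMyCharacter::OnStopTalking()
{
    FOnUploadComplate OnComplate;
    OnComplate.BindLambda([](const FString& RecognizedText)
    {
        UE_LOG(LogTemp, Log, TEXT("Recognized: %s"), *RecognizedText);
    });
    Recorder->UploadAudio(TEXT("http://your-server/speechRecognition"), OnComplate);
}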
Here the recorded TArray<float> buffer is sent to the backend as raw bytes, and the backend assembles it into a file. Since I am more familiar with the Java APIs than with C++, the backend is written in Java.
The Java implementation is as follows:
@PostMapping("/speechRecognition")
public String speechRecognition(@RequestBody byte[] audioData, @RequestParam int sampleRate, @RequestParam int numChannels) throws Exception {
    int bitsPerSample = 32; // UE5 sends 32-bit IEEE float samples
    File file = WavConverter.convertToWav(
            audioData,
            numChannels,
            sampleRate,
            bitsPerSample
    );
    String res = speechRecognitionService.speechSynthesis(file);
    // Regex that pulls the Chinese text out of the escaped "w" fields of the recognition response
    Pattern pattern = Pattern.compile("\\\\\"w\\\\\":\\\\\"([\\u4e00-\\u9fa5]+)\\\\\"");
    Matcher matcher = pattern.matcher(res);
    // Collect the matches
    List<String> chineseList = new ArrayList<>();
    while (matcher.find()) {
        chineseList.add(matcher.group(1));
    }
    // Join them into the final transcript
    return String.join("", chineseList);
}
package com.ruoyi.ai.utils;

import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WavConverter {

    /**
     * Wraps a raw audio byte stream in a WAV file.
     *
     * @param audioData     raw audio data (little-endian byte order)
     * @param numChannels   number of channels (1 = mono, 2 = stereo)
     * @param sampleRate    sample rate (e.g. 44100)
     * @param bitsPerSample bit depth (32 = IEEE float, 16 = PCM)
     */
    public static File convertToWav(
            byte[] audioData,
            int numChannels,
            int sampleRate,
            int bitsPerSample
    ) throws IOException {
        // Derived header fields
        int byteRate = sampleRate * numChannels * (bitsPerSample / 8);
        int blockAlign = numChannels * (bitsPerSample / 8);
        int dataSize = audioData.length;
        int riffChunkSize = 36 + dataSize;

        // 44-byte WAV header
        ByteBuffer header = ByteBuffer.allocate(44);
        header.order(ByteOrder.LITTLE_ENDIAN);

        // RIFF chunk
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        header.putInt(riffChunkSize);
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII));

        // fmt sub-chunk
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        header.putInt(16); // fmt chunk size (16 for the standard PCM/float layout)
        header.putShort((short) (bitsPerSample == 32 ? 3 : 1)); // format code: 3 = IEEE float, 1 = PCM
        header.putShort((short) numChannels);
        header.putInt(sampleRate);
        header.putInt(byteRate);
        header.putShort((short) blockAlign);
        header.putShort((short) bitsPerSample);

        // data sub-chunk
        header.put("data".getBytes(StandardCharsets.US_ASCII));
        header.putInt(dataSize);

        // Concatenate header and audio data
        byte[] headerBytes = header.array();
        byte[] wavBytes = new byte[headerBytes.length + audioData.length];
        System.arraycopy(headerBytes, 0, wavBytes, 0, headerBytes.length);
        System.arraycopy(audioData, 0, wavBytes, headerBytes.length, audioData.length);

        return createTempWavFile(wavBytes);
    }

    public static File createTempWavFile(byte[] wavBytes) throws IOException {
        // Create a temporary file with an auto-generated name and a .wav suffix
        Path tempFilePath = Files.createTempFile("audio_", ".wav");
        File tempFile = tempFilePath.toFile();
        // Write the bytes
        Files.write(tempFilePath, wavBytes);
        // Delete the temp file on JVM exit (optional)
        tempFile.deleteOnExit();
        return tempFile;
    }
}
The speech recognition itself uses iFlytek's API; once you have the audio file object, you can swap in whatever service you prefer.
If you are comfortable with C++, the second step (the backend WAV assembly) can also be implemented in C++, as sketched below.
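A minimal sketch of that WAV assembly on the UE side, mirroring the header layout of WavConverter above, might look like the following. BuildWavBytes is a hypothetical helper, not part of the original code, and it assumes a little-endian host platform (which covers the usual x86/ARM targets).

#include "Containers/Array.h"
#include "Misc/FileHelper.h"

// Hypothetical helper: prepend a 44-byte WAV header (format 3 = IEEE float)
// to the raw interleaved float samples, mirroring WavConverter.convertToWav above.
TArray<uint8> BuildWavBytes(const TArray<float>& Samples, int32 NumChannels, int32 SampleRate)
{
    const int32 BitsPerSample = 32;                            // 32-bit IEEE float
    const int32 DataSize      = Samples.Num() * sizeof(float);
    const int32 ByteRate      = SampleRate * NumChannels * (BitsPerSample / 8);
    const int16 BlockAlign    = static_cast<int16>(NumChannels * (BitsPerSample / 8));

    TArray<uint8> Bytes;
    Bytes.Reserve(44 + DataSize);

    auto AppendBytes = [&Bytes](const void* Src, int32 Count)
    {
        Bytes.Append(reinterpret_cast<const uint8*>(Src), Count);
    };
    // Writing the integers directly assumes a little-endian host, which is what WAV expects.
    auto AppendInt32 = [&AppendBytes](int32 Value) { AppendBytes(&Value, 4); };
    auto AppendInt16 = [&AppendBytes](int16 Value) { AppendBytes(&Value, 2); };

    AppendBytes("RIFF", 4);  AppendInt32(36 + DataSize);  AppendBytes("WAVE", 4);
    AppendBytes("fmt ", 4);  AppendInt32(16);             // fmt chunk size
    AppendInt16(3);                                       // format code 3 = IEEE float
    AppendInt16(static_cast<int16>(NumChannels));
    AppendInt32(SampleRate);
    AppendInt32(ByteRate);
    AppendInt16(BlockAlign);
    AppendInt16(static_cast<int16>(BitsPerSample));
    AppendBytes("data", 4);  AppendInt32(DataSize);

    AppendBytes(Samples.GetData(), DataSize);              // the sample payload itself
    return Bytes;
}

// Usage: FFileHelper::SaveArrayToFile(BuildWavBytes(AudioBuffer, Channels, SampleRate), *WavPath);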