Extracting Text from Speech in an MP4 File with Python
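Extracting text from the speech in an MP4 file comes down to two core steps:
1. Extract the audio from the MP4 file
2. Convert the speech to text
The Python code for extracting the audio from the MP4 file is as follows: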
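import moviepy.editor as mp  # moviepy 1.x; in moviepy 2.x use "from moviepy import VideoFileClip"

def convert_mp4_to_wav(input_file, output_file):
    # Load the video and write its audio track to a WAV file
    video = mp.VideoFileClip(input_file)
    try:
        video.audio.write_audiofile(output_file)
    finally:
        video.close()  # release the underlying ffmpeg reader

if __name__ == '__main__':
    file_name = "1.mp4"
    convert_mp4_to_wav(file_name, '1.wav')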
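Whisper-family models work on 16 kHz mono audio, so downsampling at extraction time can keep the WAV file small before it is uploaded. The following is only a sketch of that idea, calling the ffmpeg command-line tool directly instead of moviepy. The function names `build_ffmpeg_command` and `extract_audio_ffmpeg` are hypothetical, and it assumes `ffmpeg` is installed and on the PATH.

```python
import subprocess


def build_ffmpeg_command(input_file, output_file, sample_rate=16000):
    """Build an ffmpeg command that extracts mono PCM audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", input_file,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", str(sample_rate),  # resample to the given rate
        "-acodec", "pcm_s16le",   # 16-bit PCM, the usual WAV encoding
        output_file,
    ]


def extract_audio_ffmpeg(input_file, output_file, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    command = build_ffmpeg_command(input_file, output_file, sample_rate)
    subprocess.run(command, check=True)
```

Calling `extract_audio_ffmpeg("1.mp4", "1.wav")` would be a drop-in alternative to `convert_mp4_to_wav("1.mp4", "1.wav")`. Building the command as a list (rather than one shell string) avoids quoting problems with file names that contain spaces.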
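Speech-to-text is a fairly mature technology by now. I used the models--Systran--faster-whisper-medium model that ships with stt, and its accuracy is good.
I won't go into how to set up stt itself; download the code if you need it.
What I will show here is how to call stt and write out the text. The code is as follows: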
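import json
import logging
import os
import requests

def mp4totxt(mp4FileName):
    # Set up logging
    logging.basicConfig(filename='mp4totxt.log', level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
    logging.info("--start----" + mp4FileName)
    # Derive the .wav and .json names from the original path, without lowercasing it
    baseName = os.path.splitext(mp4FileName)[0]
    wavFileName = baseName + '.wav'
    # Convert mp4 to wav
    convert_mp4_to_wav(mp4FileName, wavFileName)
    url = "http://127.0.0.1:9971/api"
    # Request parameters: file: audio/video file, language: language code, model: model, response_format: text|json|srt
    # Response: code == 0 means success, anything else is a failure; msg is "ok" on success, otherwise the failure reason; data holds the recognized text
    data = {"language": "zh", "model": "medium", "response_format": "json"}
    # Open the WAV file in a with-block so the handle is closed after the upload
    with open(wavFileName, "rb") as wavFile:
        response = requests.post(url, timeout=6000, data=data, files={"file": wavFile})
    result = response.json()
    logging.info("--parsed----" + str(result))
    localFileJSON = baseName + '.json'
    # Write valid JSON; ensure_ascii=False keeps the Chinese text readable
    with open(localFileJSON, 'w', encoding='utf-8') as jsonfile:
        json.dump(result, jsonfile, ensure_ascii=False)
    logging.info("--end----" + mp4FileName)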
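The comment in the code says the service returns `code == 0` on success and puts the failure reason in `msg`. Below is a minimal sketch of checking that before the result is used, assuming the response has the `code` / `msg` / `data` shape described in that comment. The helper name `extract_segments` is hypothetical and not part of stt.

```python
def extract_segments(result):
    """Return the recognized segments, or raise if the service reported a failure."""
    if not isinstance(result, dict):
        raise ValueError("unexpected response: %r" % (result,))
    if result.get("code") != 0:
        # msg carries the failure reason when code is not 0
        raise RuntimeError("stt failed: %s" % result.get("msg"))
    return result.get("data", [])
```

In `mp4totxt`, this could be called on `response.json()` right after the request, so that a failed recognition is logged as an error instead of being written out as if it were a transcript.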
The output JSON looks like this:
{"code": 0, "data": [{"end_time": "00:00:02,600", "line": 1, "start_time": "00:00:00,000", "text": "一分部署 九分落实"}, {"end_time": "00:00:06,400", "line": 2, "start_time": "00:00:02,600", "text": "聚焦16个县市区 开发区 保税区"}, {"end_time": "00:00:08,600", "line": 3, "start_time": "00:00:06,400", "text": "在装落实 求突破"}, {"end_time": "00:00:12,800", "line": 4, "start_time": "00:00:08,600", "text": "拼经济上的新思路 新作为 新使命"}, {"end_time": "00:00:16,400", "line": 5, "start_time": "00:00:12,800", "text": "展现全市各发展主体 努力建设"}, {"end_time": "00:00:18,200", "line": 6, "start_time": "00:00:16,400", "text": "实力强 品质优"}, {"end_time": "00:00:21,400", "line": 7, "start_time": "00:00:18,200", "text": "生活美的 更好为方的澎湃力量"}, {"end_time": "00:00:25,400", "line": 8, "start_time": "00:00:21,400", "text": "为方市广播电视台 大型柔媒体访谈节目"}, {"end_time": "00:00:29,600", "line": 9, "start_time": "00:00:25,400", "text": "落实进行时 已先后制作播出了六期节目"}, {"end_time": "00:00:33,600", "line": 10, "start_time": "00:00:29,600", "text": "在全市广大干部群众中引起强烈反响"}, {"end_time": "00:00:37,360", "line": 11, "start_time": "00:00:33,600", "text": "全市上下掀起了围绕七个加利突破"}, {"end_time": "00:00:40,960", "line": 12, "start_time": "00:00:37,360", "text": "比学赶超 争先进位的更大热潮"}, {"end_time": "00:00:43,960", "line": 13, "start_time": "00:00:40,960", "text": "落实进行时 重点抓落实"}, {"end_time": "00:00:47,560", "line": 14, "start_time": "00:00:43,960", "text": "第七期将播出特别节目 落实回头看"}, {"end_time": "00:00:51,400", "line": 15, "start_time": "00:00:47,560", "text": "对相关现实区主要工作推进落实情况"}, {"end_time": "00:00:53,160", "line": 16, "start_time": "00:00:51,400", "text": "进行调查采访"}, {"end_time": "00:00:55,200", "line": 17, "start_time": "00:00:53,160", "text": "进一步激发全市上下"}, {"end_time": "00:00:58,960", "line": 18, "start_time": "00:00:55,200", "text": "真抓实干 求突破的更大热情"}], "msg": "ok"}
Each entry in the output JSON carries the start time and end time of its text within the video, with millisecond precision.
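The `start_time` and `end_time` values use the `HH:MM:SS,mmm` layout, which is also the timestamp layout of SRT subtitles. As a rough sketch of what can be done with the `data` list, the helpers below convert a timestamp to milliseconds, join the segments into plain text, and render them as SRT. The function names are hypothetical, and the field names (`line`, `start_time`, `end_time`, `text`) are taken from the sample output above.

```python
def timestamp_to_ms(ts):
    """Convert 'HH:MM:SS,mmm' to an integer number of milliseconds."""
    clock, millis = ts.split(",")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(millis)


def segments_to_text(segments):
    """Join the recognized lines into a plain transcript, one segment per line."""
    return "\n".join(seg["text"] for seg in segments)


def segments_to_srt(segments):
    """Render the segments as SRT subtitle text."""
    blocks = []
    for seg in segments:
        blocks.append(
            "%d\n%s --> %s\n%s\n"
            % (seg["line"], seg["start_time"], seg["end_time"], seg["text"])
        )
    # A blank line separates consecutive subtitle blocks
    return "\n".join(blocks)
```

For example, `timestamp_to_ms("00:00:02,600")` gives `2600`, so the first segment in the sample output lasts 2600 ms. The segments themselves are the `data` list of the saved JSON file once it has been loaded with `json.load`.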
Full code download: Extracting Text from Speech in an MP4 File with Python
If anything is unclear, feel free to leave a comment and discuss.