Batch-extracting images from Word documents with Python (with image format conversion and a GUI)


2023-03-14 08:07 | Source: compiled from the web

Author: 小小明

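Contents
Batch-converting doc files to docx
Batch-extracting images from docx documents
Batch image format conversion
Complete code
Developing a GUI tool
Packaging as an exe
Adding a progress bar to the GUI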

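In day-to-day work, when your boss asks you to save the images in a Word document to a folder, you may well despair inwardly while you start to "Save As" them one at a time.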

But if your boss asks you to copy out every image from several hundred Word documents, are you going to hand in your resignation?

Suppose you have a whole folder of Word documents: could you quickly copy out every image they contain?

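Even if you know that a .docx file can be opened as a zip archive, unpacking the files one by one still takes a long time. On top of that, the folder also contains legacy .doc files (the Word 97-2003 format), which you would first have to save as .docx.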
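The two formats can be told apart by their first bytes rather than by their extension: a .docx file is a ZIP archive, while a legacy .doc file is an OLE2 compound document whose header starts with the bytes D0 CF 11 E0 A1 B1 1A E1. The sketch below uses a hypothetical helper, `sniff_word_format`, to make that check with only the standard library. It is an illustration and not part of the author's tool.

```python
import zipfile
from pathlib import Path

# First 8 bytes of an OLE2 compound file, the container used by legacy .doc files
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(path):
    """Hypothetical helper: return 'docx', 'doc' or 'unknown' based on file content."""
    path = Path(path)
    if zipfile.is_zipfile(path):  # .docx (Office Open XML) is a ZIP archive
        return "docx"
    with open(path, "rb") as f:
        if f.read(8) == OLE2_MAGIC:  # legacy Word 97-2003 binary format
            return "doc"
    return "unknown"
```

A .doc file that someone renamed to .docx is still reported as "doc", so it would still need the conversion step.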

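For me, 小小明, none of this is hard. A short program converts everything and extracts all the images in under ten seconds. It can also batch-convert the images' actual format, rather than just renaming the file extension.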

The original post demonstrates the finished tool and offers a ready-built exe at the end of the article.

Here is my code.

Batch-converting doc files to docx

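On Windows, the win32com module lets Python drive Microsoft Word through COM automation. A Word document object's SaveAs method can then replace the manual "Save As" step and convert files to the format we need in batch. This requires Microsoft Word to be installed.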

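win32com ships with the pywin32 package (pypiwin32 is an older wrapper package that simply pulls in pywin32), so that is all you need to install:

pip install pywin32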

The code below converts the .doc files in the given directory to .docx and writes them to the temp_dir subdirectory of that directory:

from win32com import client as wc  # COM client from pywin32
from pathlib import Path
import os
import shutil

doc_path = r"E:\tmp\答疑整理"
temp_dir = "temp"
# Recreate an empty temp directory inside doc_path
if os.path.exists(f"{doc_path}/{temp_dir}"):
    shutil.rmtree(f"{doc_path}/{temp_dir}")
os.mkdir(f"{doc_path}/{temp_dir}")

word = wc.Dispatch("Word.Application")  # start the Word application
try:
    for filename in Path(doc_path).glob("*.doc"):
        file = str(filename)
        dest_name = str(filename.parent / temp_dir / filename.name) + "x"
        print(file, dest_name)
        doc = word.Documents.Open(file)  # open the Word file
        doc.SaveAs(dest_name, 12)  # save as .docx; 12 = wdFormatXMLDocument
        doc.Close()  # close it so documents do not pile up inside Word
finally:
    word.Quit()

Output:

E:\tmp\答疑整理\1.晚分享答疑.doc E:\tmp\答疑整理\temp\1.晚分享答疑.docx
E:\tmp\答疑整理\10-17问题答疑.doc E:\tmp\答疑整理\temp\10-17问题答疑.docx
E:\tmp\答疑整理\10-20答疑.doc E:\tmp\答疑整理\temp\10-20答疑.docx
E:\tmp\答疑整理\10月10日晚分享提问与答疑.doc E:\tmp\答疑整理\temp\10月10日晚分享提问与答疑.docx
E:\tmp\答疑整理\10月11日问题收集.doc E:\tmp\答疑整理\temp\10月11日问题收集.docx
E:\tmp\答疑整理\10月14答疑最新版.doc E:\tmp\答疑整理\temp\10月14答疑最新版.docx
E:\tmp\答疑整理\11月26、27日答疑汇总.doc E:\tmp\答疑整理\temp\11月26、27日答疑汇总.docx
E:\tmp\答疑整理\2018.10.15提问与答疑.doc E:\tmp\答疑整理\temp\2018.10.15提问与答疑.docx
E:\tmp\答疑整理\2018.10.18提问与答疑.doc E:\tmp\答疑整理\temp\2018.10.18提问与答疑.docx
E:\tmp\答疑整理\2018.10.9提问与答疑最终版.doc E:\tmp\答疑整理\temp\2018.10.9提问与答疑最终版.docx
E:\tmp\答疑整理\2018.12.01答疑.doc E:\tmp\答疑整理\temp\2018.12.01答疑.docx
E:\tmp\答疑整理\2018.12.02答疑(1).doc E:\tmp\答疑整理\temp\2018.12.02答疑(1).docx
E:\tmp\答疑整理\2018.12.02答疑.doc E:\tmp\答疑整理\temp\2018.12.02答疑.docx
E:\tmp\答疑整理\20181204答疑.doc E:\tmp\答疑整理\temp\20181204答疑.docx
E:\tmp\答疑整理\20181206答疑.doc E:\tmp\答疑整理\temp\20181206答疑.docx
E:\tmp\答疑整理\20181208答疑.doc E:\tmp\答疑整理\temp\20181208答疑.docx
E:\tmp\答疑整理\2018年11月29日答疑（请以此为准）.doc E:\tmp\答疑整理\temp\2018年11月29日答疑（请以此为准）.docx
E:\tmp\答疑整理\7.10答疑整理.doc E:\tmp\答疑整理\temp\7.10答疑整理.docx
E:\tmp\答疑整理\7.9答疑整理.doc E:\tmp\答疑整理\temp\7.9答疑整理.docx

The converted .docx files are written to E:\tmp\答疑整理\temp.
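When a document is open, Word keeps a small hidden "owner" file next to it whose name starts with `~$`. A pattern like `glob("*.doc")` would pick that file up too, and Word cannot open it as a document. The stdlib-only sketch below uses a hypothetical helper, `plan_conversions`, to compute the same source/destination pairs as the loop above while skipping such files. The actual Word call is left out because it needs win32com on Windows.

```python
from pathlib import Path


def plan_conversions(doc_path, temp_dir="temp"):
    """Hypothetical helper: list (source .doc, target .docx) pairs for the conversion loop."""
    pairs = []
    for src in sorted(Path(doc_path).glob("*.doc")):  # "*.doc" does not match "*.docx"
        if src.name.startswith("~$"):  # skip Word's temporary owner files
            continue
        # same naming rule as the loop above: <doc_path>/<temp_dir>/<name>.docx
        dest = src.parent / temp_dir / (src.name + "x")
        pairs.append((src, dest))
    return pairs
```

Each pair can then be fed to `word.Documents.Open(str(src))` and `doc.SaveAs(str(dest), 12)` as in the original loop.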

Batch-extracting images from docx documents

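A .docx document is really a zip archive, and its embedded images live under word/media/ inside it. We can therefore read them straight out of the archive. The code below extracts the images from every .docx document and puts them in the imgs directory under the temp directory: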

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

# doc_path and temp_dir come from the previous step
img_dir = f"{doc_path}/{temp_dir}/imgs"
if os.path.exists(img_dir):
    shutil.rmtree(img_dir)
os.makedirs(img_dir)

i = 1  # running number used to name the extracted images
for filename in itertools.chain(Path(doc_path).glob("*.docx"), (Path(doc_path) / temp_dir).glob("*.docx")):
    print(filename)
    with ZipFile(filename) as zip_file:
        for names in zip_file.namelist():
            if names.startswith("word/media/image"):
                new_name = f"{i}{Path(names).suffix}"  # e.g. 1.jpeg, 2.png
                # copy the image straight out of the archive (no temporary "word" folder)
                with zip_file.open(names) as src, open(f"{img_dir}/{new_name}", "wb") as dst:
                    shutil.copyfileobj(src, dst)
                print("\t", names, new_name)
                i += 1

Printed output:

E:\tmp\答疑整理\10.16答疑汇总.docx
E:\tmp\答疑整理\10.19答疑汇总.docx
E:\tmp\答疑整理\12月7号答疑稿.docx
     word/media/image5.jpeg 1.jpeg
     word/media/image4.png 2.png
     word/media/image3.jpeg 3.jpeg
     word/media/image2.gif 4.gif
     word/media/image1.gif 5.gif
     word/media/image6.gif 6.gif
E:\tmp\答疑整理\2018.10.12提问与答疑.docx
     word/media/image1.jpeg 7.jpeg
     word/media/image2.png 8.png
     word/media/image3.png 9.png
.....
E:\tmp\答疑整理\temp\7.9答疑整理.docx
     word/media/image1.jpeg 162.jpeg
     word/media/image2.jpeg 163.jpeg
     word/media/image3.png 164.png
     word/media/image4.png 165.png
     word/media/image5.jpeg 166.jpeg

All extracted images end up in temp\imgs, numbered consecutively (1.jpeg, 2.png, ...) across all documents.
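You can try the extraction logic without real Word files. The sketch below builds a tiny fake .docx in memory (just a ZIP with a word/media folder; real files contain many more parts) and copies its media entries out using the same sequential naming as above. The names `extract_media` and `make_fake_docx` are hypothetical. One design choice differs from the original: entries are sorted by the N in imageN, because `namelist()` returns archive order (which is why image5 was extracted first in the output above). Neither order is guaranteed to match where the images appear in the document.

```python
import io
import re
import shutil
import zipfile
from pathlib import Path


def _image_number(member):
    """Number N in 'word/media/imageN.ext', or 0 if the name has no number."""
    match = re.search(r"image(\d+)", member)
    return int(match.group(1)) if match else 0


def extract_media(docx_files, out_dir, start=1):
    """Hypothetical helper: copy every word/media/image* entry to out_dir as 1.ext, 2.ext, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    n = start
    written = []
    for docx in docx_files:
        with zipfile.ZipFile(docx) as zf:
            media = [m for m in zf.namelist() if m.startswith("word/media/image")]
            media.sort(key=_image_number)  # image2 before image10
            for member in media:
                target = out_dir / f"{n}{Path(member).suffix}"
                with zf.open(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)  # stream the bytes, no temporary extraction
                written.append(target.name)
                n += 1
    return written


def make_fake_docx():
    """Build a minimal in-memory stand-in for a .docx (real files contain many more parts)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<w:document/>")
        zf.writestr("word/media/image10.png", b"png-bytes")
        zf.writestr("word/media/image2.jpeg", b"jpeg-bytes")
        zf.writestr("word/media/image1.gif", b"gif-bytes")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        print(extract_media([make_fake_docx()], d))  # ['1.gif', '2.jpeg', '3.png']
```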

Batch image format conversion

PIL (Python Imaging Library) has become the de facto standard image-processing library for Python. It is very powerful, yet its API is simple and easy to use.

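PIL only supports up to Python 2.7 and has long gone unmaintained. A group of volunteers therefore created a compatible fork called Pillow, which supports current Python 3.x and adds many new features. We can simply install and use Pillow.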

If you have Anaconda installed, Pillow is already available. Otherwise, install it with pip from the command line:

pip install pillow

Just changing a file's extension does not actually change the image format. With Pillow we can genuinely convert all the images to JPEG in one batch:

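import os
from pathlib import Path

from PIL import Image

# doc_path and temp_dir come from the previous steps; converted JPEGs go to <doc_path>/imgs
if not os.path.exists(f"{doc_path}/imgs"):
    os.mkdir(f"{doc_path}/imgs")

for filename in Path(f"{doc_path}/{temp_dir}/imgs").glob("*"):
    try:
        with Image.open(filename) as im:
            # JPEG has no alpha channel or palette, so convert to RGB first
            # (transparency is discarded; for GIFs only the first frame is kept)
            im.convert('RGB').save(f"{doc_path}/imgs/{filename.stem}.jpg", 'jpeg')
    except OSError as e:  # e.g. a format Pillow cannot open on this platform
        print("skipped", filename.name, e)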

After conversion, every file in the imgs folder is a real JPEG image.
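Renaming a file leaves its bytes untouched, so the real format can still be read from the file header. With Pillow, `Image.open(path).format` reports it. As a stdlib-only illustration, the hypothetical helper `real_image_format` below checks the well-known signatures of the formats that appeared in the extraction output.

```python
from pathlib import Path

# Well-known file signatures ("magic numbers") of common raster formats
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
]


def real_image_format(path):
    """Hypothetical helper: identify an image by its first bytes, ignoring the extension."""
    with open(Path(path), "rb") as f:
        header = f.read(8)
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            return fmt
    return "unknown"
```

A PNG that was merely renamed to .jpg is still reported as "png". That is exactly why the conversion above re-encodes the pixels instead of renaming.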

Complete code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Created: 2020/12/25 21:46
__author__ = 'xiaoxiaoming'

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

from PIL import Image
from win32com import client as wc  # COM client from pywin32


def word_img_extract(doc_path, temp_dir="temp"):
    # 1. Recreate the temp directory
    if os.path.exists(f"{doc_path}/{temp_dir}"):
        shutil.rmtree(f"{doc_path}/{temp_dir}")
    os.mkdir(f"{doc_path}/{temp_dir}")

    # 2. Convert every .doc to .docx with Word
    word = wc.Dispatch("Word.Application")  # start Word
    try:
        for filename in Path(doc_path).glob("*.doc"):
            file = str(filename)
            dest_name = str(filename.parent / temp_dir / filename.name) + "x"
            print(file, dest_name)
            doc = word.Documents.Open(file)  # open the Word file
            doc.SaveAs(dest_name, 12)  # 12 = wdFormatXMLDocument (.docx)
            doc.Close()
    finally:
        word.Quit()

    # 3. Copy the images out of every .docx (zip) archive
    img_dir = f"{doc_path}/{temp_dir}/imgs"
    os.makedirs(img_dir)  # temp_dir was just recreated, so this is always new
    i = 1
    for filename in itertools.chain(Path(doc_path).glob("*.docx"), (Path(doc_path) / temp_dir).glob("*.docx")):
        print(filename)
        with ZipFile(filename) as zip_file:
            for names in zip_file.namelist():
                if names.startswith("word/media/image"):
                    new_name = f"{i}{Path(names).suffix}"
                    with zip_file.open(names) as src, open(f"{img_dir}/{new_name}", "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    print("\t", names, new_name)
                    i += 1

    # 4. Convert every extracted image to a real JPEG
    if not os.path.exists(f"{doc_path}/imgs"):
        os.mkdir(f"{doc_path}/imgs")
    for filename in Path(img_dir).glob("*"):
        try:
            with Image.open(filename) as im:
                im.convert('RGB').save(f"{doc_path}/imgs/{filename.stem}.jpg", 'jpeg')
        except OSError as e:  # e.g. a format Pillow cannot open
            print("skipped", filename.name, e)


if __name__ == '__main__':
    doc_path = r"E:\tmp\答疑整理"
    temp_dir = "temp"
    word_img_extract(doc_path, temp_dir)

On the sample folder, the whole run finished in 7 s.
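Be aware that word_img_extract starts by deleting any existing `temp` subfolder of the chosen directory with shutil.rmtree. A folder of the user's own that happens to be called temp would therefore be lost. If the intermediate .docx files and raw images do not need to be kept, one alternative (a sketch, not the author's code) is to stage them in a throwaway directory from `tempfile.TemporaryDirectory`, which is deleted automatically. The helper name `staged_dirs` is hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_dirs(doc_path):
    """Hypothetical helper: yield (work_dir, img_dir, out_dir) without touching existing folders.

    work_dir and img_dir live in a system temp directory that is removed on exit;
    only out_dir (<doc_path>/imgs) is created inside the user's folder.
    """
    out_dir = Path(doc_path) / "imgs"
    out_dir.mkdir(exist_ok=True)
    with tempfile.TemporaryDirectory() as work:
        work_dir = Path(work)
        img_dir = work_dir / "imgs"
        img_dir.mkdir()
        yield work_dir, img_dir, out_dir
```

Inside a `with staged_dirs(doc_path) as (work_dir, img_dir, out_dir):` block, the converted .docx files would go to work_dir, the raw images to img_dir, and only the final JPEGs to out_dir.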

Developing a GUI tool

Next we use PySimpleGUI to build a graphical tool. Install the library with:

pip install PySimpleGUI

If the download is slow, you can use the Tsinghua mirror instead:

pip install PySimpleGUI -i https://pypi.tuna.tsinghua.edu.cn/simple

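Here is the complete code. It imports word_img_extract from the previous script, so save that script as word_img_extract.py in the same folder: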

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Created: 2020/12/25 12:03
__author__ = 'xiaoxiaoming'

import PySimpleGUI as sg

from word_img_extract import word_img_extract

sg.theme("GreenMono")  # current name for sg.change_look_and_feel

layout = [
    [
        sg.Text("Directory containing the Word documents:"),
        sg.In(size=(25, 1), enable_events=True, key="-FOLDER-"),
        sg.FolderBrowse("Browse"),
    ],
    [
        sg.Button("Start extraction", enable_events=True, key="-EXTRACT-"),
        sg.Text(size=(40, 1), key="-TOUT-"),
    ],
]
window = sg.Window("Word document image extractor", layout)
while True:
    event, values = window.read()
    if event == sg.WIN_CLOSED:  # the window was closed
        break
    elif event == "-EXTRACT-":
        if values["-FOLDER-"]:
            window["-TOUT-"].update("Ready to extract!")
            sg.popup("The program will appear frozen during extraction; please wait. "
                     "A message pops up when it is done.\nClick OK to start.")
            window["-TOUT-"].update("Extracting...")
            word_img_extract(values["-FOLDER-"], "temp")  # pass temp_dir explicitly
            window["-TOUT-"].update("Extraction finished!")
            sg.popup("Extraction finished!")
        else:
            sg.popup("Please enter the path of the Word documents first!")
    print(f"Event: {event}, values: {values}")
window.close()

Running it opens a small window with a folder field, a Browse button and a Start extraction button.

Packaging as an exe

Create and activate a virtual environment:

conda create -n gui python=3.6
conda activate gui

Note: creating and activating a virtual environment is optional. It only keeps the environment minimal, so you can skip it.

Install the packages needed for the build:

pip install PySimpleGUI
pip install pillow
pip install pywin32
pip install pyinstaller


Run the following command to build the executable:

pyinstaller -F --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

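Common options:

-F: bundle everything into a single executable file
-w: hide the console window, which is very useful for GUI programs (leave it out for command-line programs)
-p: add your own search paths for imports; usually not needed
-i: set the executable's icon (--icon, used above, is its long form)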

PyInstaller writes the single-file word_img_extract_GUI.exe to the dist folder.

Adding the -w option removes the console window:

pyinstaller -wF --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

Adding a progress bar to the GUI

Rework the processing code into a generator that reports its progress after each file it handles. The complete code is below:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Created: 2020/12/25 21:46
__author__ = 'xiaoxiaoming'

import itertools
import os
import shutil
from pathlib import Path
from zipfile import ZipFile

from PIL import Image
from win32com import client as wc  # COM client from pywin32


def word_img_extract(doc_path, temp_dir="temp"):
    """Generator: yields (stage message, progress 0-1000) after each processed file."""
    docs = list(Path(doc_path).glob("*.doc"))
    if not docs and not list(Path(doc_path).glob("*.docx")):
        raise Exception("No Word documents in this directory")

    if os.path.exists(f"{doc_path}/{temp_dir}"):
        shutil.rmtree(f"{doc_path}/{temp_dir}")
    os.mkdir(f"{doc_path}/{temp_dir}")

    # Stage 1: .doc -> .docx (Word is only started if there is something to convert)
    if docs:
        word = wc.Dispatch("Word.Application")
        try:
            for i, filename in enumerate(docs, 1):
                dest_name = str(filename.parent / temp_dir / filename.name) + "x"
                doc = word.Documents.Open(str(filename))
                doc.SaveAs(dest_name, 12)  # 12 = wdFormatXMLDocument (.docx)
                doc.Close()
                yield "doc -> docx conversion:", i * 1000 // len(docs)
        finally:
            word.Quit()

    # Stage 2: copy the images out of every .docx
    img_dir = f"{doc_path}/{temp_dir}/imgs"
    os.makedirs(img_dir)
    i = 1
    files = list(itertools.chain(Path(doc_path).glob("*.docx"), (Path(doc_path) / temp_dir).glob("*.docx")))
    for j, filename in enumerate(files, 1):
        with ZipFile(filename) as zip_file:
            for names in zip_file.namelist():
                if names.startswith("word/media/image"):
                    with zip_file.open(names) as src, open(f"{img_dir}/{i}{Path(names).suffix}", "wb") as dst:
                        shutil.copyfileobj(src, dst)
                    i += 1
        yield "Extracting images:", j * 1000 // len(files)

    # Stage 3: convert every image to a real JPEG
    if not os.path.exists(f"{doc_path}/imgs"):
        os.mkdir(f"{doc_path}/imgs")
    files = list(Path(img_dir).glob("*"))
    for k, filename in enumerate(files, 1):
        try:
            with Image.open(filename) as im:
                im.convert('RGB').save(f"{doc_path}/imgs/{filename.stem}.jpg", 'jpeg')
        except OSError:
            pass  # skip images Pillow cannot open
        yield "Converting images to JPEG:", k * 1000 // len(files)


if __name__ == '__main__':
    doc_path = r"E:\tmp\答疑整理"
    for msg, i in word_img_extract(doc_path):
        print(f"\r {msg}{i}", end="")
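At its core, the pattern is a generator that yields a (stage message, progress) tuple after each file. Progress is scaled to 0-1000 to match the ProgressBar maximum of 1000 used in the GUI code, so the bar restarts for each stage. Below is a minimal stdlib-only sketch of the same idea. The name `process_in_stages` is hypothetical, and a comment marks where the Word, ZIP and Pillow work would go.

```python
def process_in_stages(stages):
    """Hypothetical sketch: stages maps a stage message to the list of items it processes.

    Yields (message, progress) after every item, progress running 0-1000 within each stage.
    An empty stage yields nothing, so there is no division by zero.
    """
    for msg, items in stages.items():
        for i, item in enumerate(items, 1):
            # the real work on `item` (convert, extract, save ...) would happen here
            yield msg, i * 1000 // len(items)


if __name__ == "__main__":
    stages = {"doc -> docx:": ["a.doc", "b.doc", "c.doc"], "Extracting images:": ["a.docx"]}
    for msg, progress in process_in_stages(stages):
        print(msg, progress)  # 333, 666, 1000 for the first stage, then 1000
```

Because the caller pulls one step at a time, the GUI loop gets control back between files and can update the label and the bar.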

The final complete code of the GUI program:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Created: 2020/12/25 12:03
__author__ = 'xiaoxiaoming'

import PySimpleGUI as sg

from word_img_extract import word_img_extract

sg.theme("GreenMono")  # current name for sg.change_look_and_feel

layout = [
    [
        sg.Text("Directory containing the Word documents:"),
        sg.In(size=(25, 1), enable_events=True, key="-FOLDER-"),
        sg.FolderBrowse("Browse"),
    ],
    [
        sg.Button("Start extraction", enable_events=True, key="-EXTRACT-"),
        sg.Text(text_color="red", size=(47, 2), key="error"),  # error messages
    ],
    [
        sg.Text("Ready:", size=(20, 1), key="-TOUT-"),  # current stage
        sg.ProgressBar(1000, orientation='h', size=(35, 20), key='progressbar'),
    ],
]
window = sg.Window("Word document image extractor", layout)
while True:
    event, values = window.read()
    if event == sg.WIN_CLOSED:  # the window was closed
        break
    elif event == "-EXTRACT-":
        if values["-FOLDER-"]:
            window["error"].update("")
            try:
                # each step of the generator reports (stage, progress out of 1000)
                for msg, i in word_img_extract(values["-FOLDER-"]):
                    window["-TOUT-"].update(msg)
                    window["progressbar"].update_bar(i)
                    window.refresh()  # redraw while the work is still running
                window["-TOUT-"].update("Extraction finished!")
            except Exception as e:
                window["error"].update(str(e))
        else:
            sg.popup("Please enter the path of the Word documents first!")
window.close()

Build it again:

pyinstaller -wF --icon=C:\Users\Think\Pictures\ico\ooopic_1467046829.ico word_img_extract_GUI.py

While extraction runs, the window now shows the current stage next to a progress bar.
