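Extracting images from PPT documents
You could open the PPT document and copy the images out one by one, but that is tedious; this kind of repetitive work is exactly what should be handed to a program. So I wrote the script below, in Python.
The core of the program (the image extraction) is not my own work; the approach is described online. Really I was just getting familiar with Python.
Here is the code: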
#!/usr/bin/env python3
"""
extract pictures from PowerPoint files.
the picture format includes jpeg,gif and png
mlsx.xplore(at)gmail.com
http://mlsx.xplore.cn
"""
import os
import sys

# (signature, offset of the signature from the start of the image, extension)
headers = [(b"JFIF", 6, "jpg"), (b"GIF", 0, "gif"), (b"PNG", 1, "png")]


def usage():
    print("Usage: " + os.path.split(sys.argv[0])[1] + " file [file ...]")
    print("\t\tfile must be powerpoint format at present!")


def getpic(filename="", prefix="img"):
    try:
        fid = open(filename, 'rb')
    except OSError:
        sys.exit(1)
    # Markers are collected per file, so hits from one document
    # never leak into the next one.
    marker = []
    s = 0  # offset of the current line within the file
    for curlin in fid:
        for flag, offset, ext in headers:
            index = curlin.find(flag)
            if index < 0:
                continue
            marker.append((s + index - offset, ext))
        s += len(curlin)
    # Several signatures can share one line; keep them in file order.
    marker.sort()
    j = len(marker)
    if j == 0:
        print("No images included in the document")
        sys.exit(1)
    imgnum = 0
    for i in range(j):
        thispos, thisext = marker[i]
        # An image runs up to the next marker, or to the end of the file.
        nextpos = marker[i + 1][0] if i < j - 1 else s
        fid.seek(thispos)
        data = fid.read(nextpos - thispos)
        imgname = "%s%02d.%s" % (prefix, i, thisext)
        with open(imgname, 'wb') as fid1:
            fid1.write(data)
        imgnum += 1
    fid.close()
    print("%02d images have been extracted from file %s\n" % (imgnum, filename))


if __name__ == "__main__":
    if len(sys.argv[1:]) > 0:
        filelist = sys.argv[1:]
        while filelist:
            filename = filelist.pop()
            # get the filename prefix
            prefix = os.path.splitext(os.path.split(filename)[1])[0]
            getpic(filename, prefix)
    else:
        usage()
        sys.exit(1)
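To see where the offsets in the headers list come from, here is a minimal sketch on hand-made byte strings (not real images). The helper name find_image_start is made up for illustration. A JFIF JPEG begins with the bytes FF D8 FF E0, a two-byte length, and then the text JFIF, so the text sits 6 bytes into the image; a PNG begins with the byte 0x89 followed by PNG, so the text sits 1 byte in; a GIF begins directly with GIF.

```python
# Hypothetical helper, for illustration only: maps a signature hit back to
# the first byte of the image, the same arithmetic the script does per line.
HEADERS = [(b"JFIF", 6, "jpg"), (b"GIF", 0, "gif"), (b"PNG", 1, "png")]


def find_image_start(buf):
    """Return (start, ext) for the earliest signature in buf, or None."""
    hits = []
    for flag, offset, ext in HEADERS:
        index = buf.find(flag)
        if index >= 0:
            hits.append((index - offset, ext))
    return min(hits) if hits else None


# Synthetic data: 10 filler bytes, then the first bytes of each format.
jpeg = b"\x00" * 10 + b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"
png = b"\x00" * 10 + b"\x89PNG\r\n\x1a\n"
gif = b"\x00" * 10 + b"GIF89a"

for sample in (jpeg, png, gif):
    print(find_image_start(sample))
```

In all three samples the computed start is 10, the position right after the filler bytes, which is why subtracting the offset lets the script cut each image from its true first byte.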
Usage is simple. Assuming the file is named getpic.py, you can run it like this:
./getpic.py /tmp/ha.ppt /tmp/dc5.ppt
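It then saves the images in the current directory, using each document's file name (minus the extension) as the prefix. There is no parameter for choosing the output directory.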
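If an output directory were wanted, one way to add it might look like the sketch below. This is not part of the script: the -o/--outdir option and the helpers parse_args and image_path are assumptions made up for illustration, built on the standard argparse module and the same prefix-plus-number naming the script already uses.

```python
import argparse
import os


def parse_args(argv=None):
    parser = argparse.ArgumentParser(
        description="Extract pictures from PowerPoint files.")
    # -o/--outdir is a hypothetical option; the original script has none.
    parser.add_argument("-o", "--outdir", default=".",
                        help="directory to save the images in")
    parser.add_argument("files", nargs="+", help="PowerPoint files")
    return parser.parse_args(argv)


def image_path(outdir, filename, index, ext):
    """Build the output path using the script's prefix + number scheme."""
    prefix = os.path.splitext(os.path.split(filename)[1])[0]
    return os.path.join(outdir, "%s%02d.%s" % (prefix, index, ext))


# Parse a sample command line instead of the real sys.argv.
args = parse_args(["-o", "/tmp/pics", "/tmp/ha.ppt", "/tmp/dc5.ppt"])
for name in args.files:
    print(image_path(args.outdir, name, 0, "jpg"))
```

In a real run the directory would also have to exist before writing, for example via os.makedirs(args.outdir, exist_ok=True).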
I tested a few documents, and roughly 90% of the saved images were usable, which I think is not bad, haha.
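One possible way to weed out some of the unusable files is to check that each extracted chunk starts with the full file signature of its format, rather than just the short text the script searches for. Whether this accounts for the failing 10% has not been verified; it is only an assumption that some hits are ordinary text that happens to contain GIF or PNG. The function name looks_like_image is made up for illustration.

```python
def looks_like_image(data, ext):
    """Check that data starts with the full signature for its extension."""
    if ext == "jpg":
        # JPEG files start with the SOI marker FF D8, followed by FF.
        return data.startswith(b"\xff\xd8\xff")
    if ext == "gif":
        # GIF files start with one of the two version strings.
        return data.startswith((b"GIF87a", b"GIF89a"))
    if ext == "png":
        # PNG files start with a fixed 8-byte signature.
        return data.startswith(b"\x89PNG\r\n\x1a\n")
    return False


# A chunk with a real PNG signature, and plain text that merely says "GIF".
print(looks_like_image(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8, "png"))
print(looks_like_image(b"GIF support is planned", "gif"))
```

A check like this only looks at the first bytes, so it cannot tell whether the rest of the chunk is a complete image.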
Original article. If you repost it, please credit the source: Linux|系统管理|WEB开发




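Buddy, you went overboard. WPS handles this really easily, and 永中OFFICE can also save the images in a PPT directly as JPG.
Do they save all the images in one go?
When I tested, it seemed to save them one at a time.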