Converting labelimg annotations to labelme format, with imageData filled in



2023-12-27 04:01 | Source: compiled from the web | Views: 265

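While training a CenterNet model on my own dataset, I found that it expects data in COCO format, which meant I needed the JSON files that labelme produces. I had annotated everything with labelimg, though, so all I had were XML files.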

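So I went looking for a conversion script, but none of the ones I found online could fill in the imageData field. The JSON they produced looked like this: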

{
    "version": "3.16.2",
    "flags": {},
    "shapes": [
        {
            "label": "test class",
            "points": [
                [631.0, 275.0],
                [714.0, 509.0]
            ],
            "group_id": null,
            "shape_type": "rectangle",
            "flags": {}
        }
    ],
    "imagePath": "000000000000.jpg",
    "imageData": null,
    "imageHeight": 800,
    "imageWidth": 800
}

The value of imageData is null, so I started looking into how labelme fills in imageData when it reads an image. Eventually I found this post:

https://blog.csdn.net/nodototao/article/details/123800645

I then adapted the code into the script below.

# -*- coding: utf-8 -*-
# Convert labelimg (Pascal VOC XML) annotations to labelme JSON and fill in imageData.
import os
import json
import xml.etree.ElementTree as ET
from base64 import b64encode


def get(root, name):
    return root.findall(name)


# Check that the expected elements are present in the XML file
def get_and_check(root, name, length):
    found = root.findall(name)
    if len(found) == 0:
        raise NotImplementedError('Can not find %s in %s.' % (name, root.tag))
    if length > 0 and len(found) != length:
        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'
                                  % (name, length, len(found)))
    if length == 1:
        found = found[0]
    return found


def convert(xml_file, json_file, save_dir, name, data):
    # Skeleton of the JSON file that labelme writes after annotation
    json_dict = {"version": "3.16.2", "flags": {}, "shapes": [], "imagePath": "",
                 "imageData": None, "imageHeight": 0, "imageWidth": 0}
    json_dict["imagePath"] = name + '.jpg'

    tree = ET.parse(xml_file)  # read the XML file
    root = tree.getroot()
    size = get_and_check(root, 'size', 1)  # the <size> element

    # Read the image in binary mode to get its raw bytes
    with open(data, 'rb') as jpg_file:
        byte_content = jpg_file.read()
    # Encode the raw bytes as base64 bytes
    base64_bytes = b64encode(byte_content)
    # Decode the base64 bytes into a UTF-8 string
    base64_string = base64_bytes.decode('utf-8')
    # Store the string in the dictionary
    json_dict["imageData"] = base64_string

    # Image width and height
    width = int(get_and_check(size, 'width', 1).text)
    height = int(get_and_check(size, 'height', 1).text)
    json_dict["imageHeight"] = height
    json_dict["imageWidth"] = width

    # Read every object when the annotation contains more than one
    for obj in get(root, 'object'):
        # Annotation record for one object
        img_mark_inf = {"label": "", "points": [], "group_id": None,
                        "shape_type": "rectangle", "flags": {}}
        category = get_and_check(obj, 'name', 1).text  # class of this object
        img_mark_inf["label"] = category
        bndbox = get_and_check(obj, 'bndbox', 1)  # bounding box
        xmin = float(get_and_check(bndbox, 'xmin', 1).text)
        ymin = float(get_and_check(bndbox, 'ymin', 1).text)
        xmax = float(get_and_check(bndbox, 'xmax', 1).text)
        ymax = float(get_and_check(bndbox, 'ymax', 1).text)
        img_mark_inf["points"].append([xmin, ymin])
        img_mark_inf["points"].append([xmax, ymax])
        json_dict["shapes"].append(img_mark_inf)

    save = os.path.join(save_dir, json_file)  # path of the output JSON file
    # Drop indent=4 if indentation is not needed
    json_str = json.dumps(json_dict, indent=4)
    with open(save, 'w') as json_fp:
        json_fp.write(json_str)


def do_transformation(xml_dir, img_dir, save_path):
    cnt = 0
    for fname in os.listdir(xml_dir):
        if not fname.lower().endswith('.xml'):
            continue  # skip anything that is not an annotation file
        name = os.path.splitext(fname)[0]  # image name without extension
        path = os.path.join(xml_dir, fname)  # path of the XML file
        save_json_name = name + '.json'
        data = os.path.join(img_dir, name + '.jpg')  # image that belongs to this XML file
        convert(path, save_json_name, save_path, name, data)
        cnt += 1
    return cnt


if __name__ == '__main__':
    img = "D:/test/VOCdevkit/VOC2007/JPEGImages/"  # folder with the images
    xml_path = "D:/test/VOCdevkit/VOC2007/Annotations"  # folder with the XML files
    save_json_path = "D:/test/12345/"  # folder for the JSON output
    if not os.path.exists(save_json_path):
        os.makedirs(save_json_path)
    do_transformation(xml_path, img, save_json_path)
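The step that actually fixes the null field is small enough to look at on its own. The following is a minimal sketch of it, assuming imageData is simply the image file's bytes encoded as base64 text, which is what the script above relies on. The helper name `encode_image_data` is hypothetical and does not come from labelme or from the original script.

```python
import base64


def encode_image_data(image_path):
    """Return the file's raw bytes as base64 text (hypothetical helper)."""
    with open(image_path, "rb") as image_file:
        raw_bytes = image_file.read()
    # b64encode returns bytes; JSON needs a str, so decode to text
    return base64.b64encode(raw_bytes).decode("utf-8")
```

Calling `encode_image_data("000000000000.jpg")` would give the string to store under the "imageData" key. The decode to text matters because `json.dumps` cannot serialize a bytes object.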

With this, the XML files produced by labelimg are converted into labelme-style JSON files, and imageData is filled in as well. Job done.

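Test image

<annotation>
    <filename>000000000000.jpg</filename>
    <path>D:\test\000000000000.jpg</path>
    <source>
        <database>Unknown</database>
    </source>
    <size>
        <width>800</width>
        <height>800</height>
        <depth>3</depth>
    </size>
    <segmented>0</segmented>
    <object>
        <name>test class</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>631</xmin>
            <ymin>275</ymin>
            <xmax>714</xmax>
            <ymax>509</ymax>
        </bndbox>
    </object>
</annotation>

{
    "version": "3.16.2",
    "flags": {},
    "shapes": [
        {
            "label": "test class",
            "points": [
                [631.0, 275.0],
                [714.0, 509.0]
            ],
            "group_id": null,
            "shape_type": "rectangle",
            "flags": {}
        }
    ],
    "imagePath": "000000000000.jpg",
    "imageData": "/9j/4AAQSkZJRgABAQAAAQABAAD/......",
    "imageHeight": 800,
    "imageWidth": 800
}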
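The mapping from one labelimg box to one labelme shape can be reproduced on the sample values above without touching any files. This is a sketch only: the XML string is an assumed Pascal VOC style layout trimmed to the fields the conversion reads, and `bndbox_to_shape` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed layout, reduced to the elements the conversion actually uses
SAMPLE_XML = """
<annotation>
  <size><width>800</width><height>800</height><depth>3</depth></size>
  <object>
    <name>test class</name>
    <bndbox>
      <xmin>631</xmin><ymin>275</ymin><xmax>714</xmax><ymax>509</ymax>
    </bndbox>
  </object>
</annotation>
"""


def bndbox_to_shape(obj):
    """Turn one <object> element into a labelme-style rectangle shape."""
    box = obj.find("bndbox")
    xmin, ymin, xmax, ymax = (
        float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
    )
    return {
        "label": obj.find("name").text,
        # A rectangle is stored as two corners: top-left, then bottom-right
        "points": [[xmin, ymin], [xmax, ymax]],
        "group_id": None,
        "shape_type": "rectangle",
        "flags": {},
    }


root = ET.fromstring(SAMPLE_XML.strip())
shapes = [bndbox_to_shape(obj) for obj in root.findall("object")]
print(shapes[0]["points"])  # [[631.0, 275.0], [714.0, 509.0]]
```

The integer pixel coordinates from the XML become floats, which matches the points in the sample JSON.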

That is the conversion result. The imageData string is too long to show in full here.
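Since the full string is not practical to read by eye, a converted file can be checked programmatically instead. The sketch below assumes the images are JPEGs, as in this post. It decodes imageData and looks for the JPEG start marker (bytes FF D8 FF), which is why the base64 text above begins with "/9j/". The function name `has_jpeg_image_data` is hypothetical.

```python
import base64
import json

JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file


def has_jpeg_image_data(json_text):
    """Return True if the JSON has imageData that decodes to JPEG bytes."""
    data = json.loads(json_text)
    image_data = data.get("imageData")
    if image_data is None:
        return False  # the null case that the plain converters produce
    raw_bytes = base64.b64decode(image_data)
    return raw_bytes.startswith(JPEG_MAGIC)
```

To check a file on disk, read it first, for example `has_jpeg_image_data(open(path).read())`, where `path` is a placeholder for one of the generated JSON files.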

Code references

https://blog.csdn.net/Xiao_ZhiJ/article/details/122918983
https://blog.csdn.net/nodototao/article/details/123800645


