Let's use imagededup in Python to find duplicate images within a set of images.
This time we use the imagededup library. imagededup is not part of the Python standard library, so it must be installed beforehand.
■Python
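This article uses Python 3.6.8 on Windows 10 (version confirmed with the Python launcher).
■Preparing the images
Before looking for duplicates with imagededup, we need some images to work with.
This time we use images from Pixabay (https://pixabay.com/ja/), a site offering high-quality free image material. Download the images and save them in the folder "C:\Users\user_\image_test". The images are in jpg format.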
■Finding duplicate images with imagededup
Now that the images are ready, we write a script that uses imagededup to find the duplicates among them.
■Code
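from imagededup.methods import PHash
from imagededup.utils import plot_duplicates

# Perceptual hashing method
phasher = PHash()

if __name__ == '__main__':
    # Generate an encoding (hash) for every image in the directory
    encodings = phasher.encode_images(image_dir=r"C:\Users\user_\image_test")
    # Find duplicates using the generated encodings
    duplicates = phasher.find_duplicates(encoding_map=encodings)
    # Plot the duplicates found for the given image
    # (plot_duplicates displays a window; its return value is not needed, so it is not printed)
    plot_duplicates(image_dir=r"C:\Users\user_\image_test",
                    duplicate_map=duplicates,
                    filename="model-women.jpg")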
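We define the variable phasher and create a PHash() instance, which gives us the perceptual hashing method.
The rest of the script is placed under if __name__ == '__main__': so that it runs only when the file is executed directly as a script. We define the variable encodings and call encode_images() with the argument image_dir=r"C:\Users\user_\image_test". This generates encodings for all images in the directory where the prepared images are stored, and the result is stored in encodings.
Next, we define the variable duplicates and call find_duplicates() with the argument encoding_map=encodings, which searches for duplicate images using the generated encodings.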
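To make the idea behind this step more concrete, here is a minimal sketch of how hashes can be compared by Hamming distance (the number of differing bits). It is not imagededup's implementation: the log output in this article shows that the library retrieves duplicates with a BKTree algorithm, while this sketch uses a simple brute-force comparison. The helper names, the sample hash values, and the threshold of 10 bits are all assumptions made for illustration.

```python
# Illustrative sketch only: not imagededup's internal implementation.

def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the bits that differ between two equal-length hex hash strings."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_duplicates_naive(encoding_map, max_distance=10):
    """Map each filename to the other filenames whose hash is within max_distance bits.

    max_distance=10 is an assumed, illustrative threshold.
    """
    duplicates = {name: [] for name in encoding_map}
    names = list(encoding_map)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            distance = hamming_distance(encoding_map[first], encoding_map[second])
            if distance <= max_distance:
                duplicates[first].append(second)
                duplicates[second].append(first)
    return duplicates


# Hypothetical 64-bit hashes written as 16 hex digits (made-up values).
sample_encodings = {
    "a.jpg": "9f172786e71f1e00",
    "a_copy.jpg": "9f172786e71f1e01",  # differs from a.jpg in a single bit
    "b.jpg": "60e8d87918e0e1ff",       # every bit differs from a.jpg
}

result = find_duplicates_naive(sample_encodings)
print(result)
```

Images whose hashes differ in only a few bits are treated as duplicates, which is why resized or lightly edited copies can still be detected.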
After that, we call plot_duplicates() with image_dir=r"C:\Users\user_\image_test" as the first argument, duplicate_map=duplicates as the second, and filename="model-women.jpg" as the third. This plots the duplicates of the given image (model-women.jpg) using the duplicates dictionary (duplicates).
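Besides plotting, the duplicates dictionary can also be inspected directly. The sketch below assumes the dictionary maps each filename to a list of the filenames considered its duplicates, which is how imagededup's documentation describes the result of find_duplicates(). The dictionary contents are hypothetical sample data (only model-women.jpg appears in this article), and the helper function is introduced here purely for illustration.

```python
# Hypothetical sample data in the assumed shape: filename -> list of duplicate filenames.
duplicates = {
    "model-women.jpg": ["model-women_copy.jpg", "model-women_small.jpg"],
    "model-women_copy.jpg": ["model-women.jpg", "model-women_small.jpg"],
    "model-women_small.jpg": ["model-women.jpg", "model-women_copy.jpg"],
    "flower.jpg": [],
}


def duplicate_groups(duplicate_map):
    """Collapse the mapping into sorted groups of mutually duplicated files."""
    groups = set()
    for name, dups in duplicate_map.items():
        if dups:  # skip images that have no duplicates
            groups.add(tuple(sorted([name, *dups])))
    return sorted(groups)


for group in duplicate_groups(duplicates):
    print("Duplicate group:", ", ".join(group))
```

Grouping the entries this way makes it easier to decide which copy to keep than reading the raw dictionary, where every duplicate pair is listed twice.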
■Running and verifying
Save this script as "image_duplication.py" in the working directory (current directory) from which Python is run, and execute it from the Command Prompt.
2021-11-09 10:00:37,189: INFO Start: Calculating hashes...
100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.55it/s]
2021-11-09 10:00:39,868: INFO End: Calculating hashes!
2021-11-09 10:00:39,869: INFO Start: Evaluating hamming distances for getting duplicates
2021-11-09 10:00:39,869: INFO Start: Retrieving duplicates using BKTree algorithm
100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.54it/s]
2021-11-09 10:00:42,592: INFO End: Retrieving duplicates using BKTree algorithm
2021-11-09 10:00:42,592: INFO End: Evaluating hamming distances for getting duplicates
C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\utils\plotter.py:66: MatplotlibDeprecationWarning: savefig() got unexpected keyword argument "dpi" which is no longer supported as of 3.3 and will become an error two minor releases later
  gs.tight_layout(fig)
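When the script is run, the messages above are printed to the Command Prompt and a window opens. The window shows "model-women.jpg" at the top as the original image (Original Image), and below it the images in the prepared directory that duplicate it. This confirms that the duplicates were found.
Note that running the script without the if __name__ == '__main__' guard produces the following error (https://github.com/idealo/imagededup/issues/94).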
2021-11-09 10:05:18,375: INFO Start: Calculating hashes...
  0%|          | 0/6 [00:00<?, ?it/s]
2021-11-09 10:05:20,871: INFO Start: Calculating hashes...
2021-11-09 10:05:20,872: INFO Start: Calculating hashes...
2021-11-09 10:05:20,878: INFO Start: Calculating hashes...
2021-11-09 10:05:20,908: INFO Start: Calculating hashes...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Program Files\Python36\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Program Files\Python36\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user_\image_duplication.py", line 5, in <module>
    encodings = phasher.encode_images(image_dir=r"C:\Users\user_\image_test")
  File "C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\methods\hashing.py", line 153, in encode_images
    hashes = parallelise(self.encode_image, files, self.verbose)
  File "C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\utils\general_utils.py", line 61, in parallelise
    pool = Pool(processes=cpu_count())
  File "C:\Program Files\Python36\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Program Files\Python36\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Program Files\Python36\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Program Files\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python36\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
The worker processes each print this same traceback, so the copies are interleaved in the raw console output; a single copy is shown above.
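The idiom named in this error message can be illustrated without imagededup. The following is a minimal sketch using only the standard library's multiprocessing module; the function names square and run_pool are hypothetical and exist only for this example. On Windows, child processes are started by re-importing the main script, so any code that creates a process pool has to sit behind the if __name__ == '__main__' guard. Otherwise each child would try to create its own pool while it is still starting up, which is the situation the RuntimeError above describes.

```python
from multiprocessing import Pool


def square(n):
    """Work function executed in the worker processes."""
    return n * n


def run_pool():
    """Create a small pool and map the work function over some numbers."""
    with Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


# The guard keeps the pool from being created again when a child
# process re-imports this file during start-up.
if __name__ == "__main__":
    print(run_pool())
```

Since the traceback shows encode_images() creating a Pool internally, calling it from unguarded top-level code on Windows runs into the same restriction.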