Finding duplicate images among a set of images with imagededup in Python


In this post, I use imagededup in Python to find duplicate images among a set of images.

This time I use the imagededup library. imagededup is not part of the Python standard library, so it has to be installed beforehand (for example, with pip install imagededup).

■Python

This post uses Python 3.6.8 on Windows 10 (version checked with the Python launcher).

■Preparing the images

Since we are going to use imagededup to find duplicate images, we first need some images to work with.

This time I use images from Pixabay (https://pixabay.com/ja/), a site offering high-quality free images. Download the images and save them in the folder "C:\Users\user_\image_test". The images are in JPG format.
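Before running imagededup, it can help to confirm which files are actually in the folder. The short check below is not part of the original article; it uses only the standard library, the folder path is simply the example path used in this post, and the set of extensions treated as images is an assumption you can adjust.

```python
from pathlib import Path

# Example folder from this post; change it to your own folder.
IMAGE_DIR = r"C:\Users\user_\image_test"

# Extensions treated as images (an assumption; adjust as needed).
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(image_dir):
    """Return the sorted names of image files directly inside image_dir."""
    return sorted(
        p.name
        for p in Path(image_dir).iterdir()
        # Skip sub-folders and files whose extension is not in the set above.
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    for name in list_images(IMAGE_DIR):
        print(name)
```

The extension check is case-insensitive, so files saved as .JPG are listed as well.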

■Finding duplicate images with imagededup

With the images ready, let's write a script that uses imagededup to find the duplicates among them.

■Code

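from imagededup.methods import PHash
from imagededup.utils import plot_duplicates

# Encoder that uses perceptual hashing (pHash)
phasher = PHash()

if __name__ == '__main__':
    # Generate a hash (encoding) for every image in the folder
    encodings = phasher.encode_images(image_dir=r"C:\Users\user_\image_test")
    # Look for images whose hashes are close to each other
    duplicates = phasher.find_duplicates(encoding_map=encodings)
    # Show model-women.jpg together with its duplicates in a window
    # (plot_duplicates() returns None, so it is not wrapped in print())
    plot_duplicates(image_dir=r"C:\Users\user_\image_test", duplicate_map=duplicates, filename="model-women.jpg")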

The phasher variable holds a PHash() instance, which encodes images using perceptual hashing (pHash).
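A perceptual hash is designed so that visually similar images get hashes that differ in only a few bits. As a rough illustration of the general pHash idea (not imagededup's own code), the sketch below assumes the image has already been converted to grayscale and resized to 32x32 pixels, applies a 2D DCT, keeps the top-left 8x8 block of low-frequency coefficients, and sets a bit to 1 when a coefficient is at least the median. It uses only the standard library, so the DCT is a slow, naive version.

```python
import math
from statistics import median


def dct_1d(values):
    """Naive, unnormalized DCT-II of a list of numbers."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(matrix):
    """2D DCT-II: apply the 1D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in matrix]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols[c] is column c; transpose back so result[r][c] is row r, column c.
    return [list(row) for row in zip(*cols)]


def phash_sketch(gray_32x32, hash_size=8):
    """Return a 64-bit perceptual hash (as an int) of a 32x32 grayscale image.

    gray_32x32 is assumed to be a list of 32 rows of 32 pixel values.
    """
    coeffs = dct_2d(gray_32x32)
    # Keep only the low-frequency, top-left block of coefficients.
    low = [coeffs[r][c] for r in range(hash_size) for c in range(hash_size)]
    # Median without the DC term low[0], which mostly reflects overall brightness.
    threshold = median(low[1:])
    bits = 0
    for value in low:
        bits = (bits << 1) | (1 if value >= threshold else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of bits that differ between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


if __name__ == "__main__":
    # A synthetic 32x32 pattern stands in for a real, already resized photo.
    image = [[(x * 7 + y * 3) % 256 for x in range(32)] for y in range(32)]
    brighter = [[min(255, v + 10) for v in row] for row in image]
    print(hamming_distance(phash_sketch(image), phash_sketch(brighter)))
```

Because only the coarse, low-frequency structure is compared against a median, small changes such as a brightness shift tend to flip few bits, which is why duplicates can be found by comparing Hamming distances.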

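Next, the processing code is placed inside an if __name__ == '__main__': block. This guard is required because encode_images() computes the hashes in parallel with multiple processes, and on Windows every worker process re-imports the script; without the guard, the workers would run this code again and fail (see the error at the end of this post). Inside the block, the encodings variable is defined by passing image_dir=r"C:\Users\user_\image_test" to encode_images(). This generates encodings (hashes) for all images in the directory where the prepared images are stored and stores them in encodings.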
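The traceback later in this post shows that encode_images() hashes the images with multiprocessing.Pool(processes=cpu_count()). On Windows, multiprocessing starts workers with the spawn method, which re-imports the main script (as "__mp_main__") in each worker, so any pool creation at the top level of the script would be executed again inside the workers. The standard-library sketch below shows the same structure in isolation; square() and parallel_squares() are hypothetical names used only for illustration, not imagededup code.

```python
from multiprocessing import Pool, cpu_count


def square(x):
    """Work done in each worker; it must be importable at module level."""
    return x * x


def parallel_squares(values):
    """Compute square() for every value using a pool of worker processes."""
    with Pool(processes=cpu_count()) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when a spawned
    # worker re-imports it as "__mp_main__".
    print(parallel_squares(range(6)))
```

Moving the print(parallel_squares(...)) call out of the guard to the top level reproduces the same kind of RuntimeError on Windows.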

Then the duplicates variable is defined by passing encoding_map=encodings to find_duplicates(), which uses the generated encodings to search for duplicate images.
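encode_images() returns a dictionary that maps each file name to its hash, stored as a hexadecimal string, and find_duplicates() reports, for every file, the other files whose hashes lie within a Hamming-distance threshold (the max_distance_threshold parameter, assumed here to be 10, the library's default for the hashing methods). imagededup itself searches with a BK-tree, as the log below shows; the brute-force version here is only a sketch of the idea, and the file names and hash values in the usage are made up.

```python
def hex_hamming(hash_a, hash_b):
    """Hamming distance between two hex-encoded hashes of equal length."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def find_duplicates_sketch(encoding_map, max_distance_threshold=10):
    """Brute-force duplicate search over a {filename: hex_hash} dictionary.

    Returns {filename: [other filenames within the threshold]}; every file
    is a key, and an empty list means no duplicates were found for it.
    """
    names = sorted(encoding_map)
    duplicates = {name: [] for name in names}
    # Compare every pair of files once and record matches in both directions.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hex_hamming(encoding_map[a], encoding_map[b]) <= max_distance_threshold:
                duplicates[a].append(b)
                duplicates[b].append(a)
    return duplicates


if __name__ == "__main__":
    # Hypothetical file names and hashes, for illustration only.
    encodings = {
        "model-women.jpg": "9fee256239984d71",
        "model-women_copy.jpg": "9fee256239984d70",
        "landscape.jpg": "60119dc6c6679b8e",
    }
    print(find_duplicates_sketch(encodings))
```

A brute-force search compares every pair, so its cost grows quadratically with the number of images; a BK-tree avoids many of those comparisons.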

Finally, plot_duplicates() is called with image_dir=r"C:\Users\user_\image_test" as the first argument, duplicate_map=duplicates as the second, and filename="model-women.jpg" as the third. This plots the given image (model-women.jpg) together with its duplicates, using the duplicate dictionary (duplicates).
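plot_duplicates() shows the result in a Matplotlib window rather than returning it. If you only need the result as text, you can read the duplicate dictionary directly. The two helpers below are hypothetical additions (not part of imagededup) that work on a dictionary in the shape returned by find_duplicates(): one lists the duplicates of a single file, the other keeps only the files that have at least one duplicate.

```python
def duplicates_of(duplicate_map, filename):
    """Return the files reported as duplicates of filename (empty if none)."""
    return list(duplicate_map.get(filename, []))


def duplicate_groups(duplicate_map):
    """Return {filename: duplicates} for files with at least one duplicate."""
    return {name: dups for name, dups in duplicate_map.items() if dups}


if __name__ == "__main__":
    # A hypothetical duplicate map in the shape returned by find_duplicates().
    duplicates = {
        "model-women.jpg": ["model-women_copy.jpg"],
        "model-women_copy.jpg": ["model-women.jpg"],
        "landscape.jpg": [],
    }
    print(duplicates_of(duplicates, "model-women.jpg"))
    for name, dups in duplicate_groups(duplicates).items():
        print(name, "->", ", ".join(dups))
```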

■Running and verifying

Save this script as "image_duplication.py" in the working directory (current directory) from which Python is run, and execute it from the Command Prompt.

2021-11-09 10:00:37,189: INFO Start: Calculating hashes...
100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.55it/s]
2021-11-09 10:00:39,868: INFO End: Calculating hashes!
2021-11-09 10:00:39,869: INFO Start: Evaluating hamming distances for getting duplicates
2021-11-09 10:00:39,869: INFO Start: Retrieving duplicates using BKTree algorithm
100%|████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:02<00:00, 2.54it/s]
2021-11-09 10:00:42,592: INFO End: Retrieving duplicates using BKTree algorithm
2021-11-09 10:00:42,592: INFO End: Evaluating hamming distances for getting duplicates
C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\utils\plotter.py:66: MatplotlibDeprecationWarning: savefig() got unexpected keyword argument "dpi" which is no longer supported as of 3.3 and will become an error two minor releases later
gs.tight_layout(fig)

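When the script runs, the messages above are printed in the Command Prompt and a window opens. The window shows the original image (labeled Original Image), "model-women.jpg", and below it the images in the prepared directory that were detected as its duplicates, confirming that the duplicates were found.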

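Note that if the script is run without the if __name__ == '__main__': guard, the following error is output (see https://github.com/idealo/imagededup/issues/94). Each spawned worker process prints the same traceback, so on screen the tracebacks are interleaved.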

2021-11-09 10:05:18,375: INFO Start: Calculating hashes...
0%| | 0/6 [00:00<?, ?it/s]
2021-11-09 10:05:20,871: INFO Start: Calculating hashes...
2021-11-09 10:05:20,872: INFO Start: Calculating hashes...
2021-11-09 10:05:20,878: INFO Start: Calculating hashes...
2021-11-09 10:05:20,908: INFO Start: Calculating hashes...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\Program Files\Python36\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\Program Files\Python36\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Program Files\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user_\image_duplication.py", line 5, in <module>
    encodings = phasher.encode_images(image_dir=r"C:\Users\user_\image_test")
  File "C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\methods\hashing.py", line 153, in encode_images
    hashes = parallelise(self.encode_image, files, self.verbose)
  File "C:\Users\user_\AppData\Roaming\Python\Python36\site-packages\imagededup\utils\general_utils.py", line 61, in parallelise
    pool = Pool(processes=cpu_count())
  File "C:\Program Files\Python36\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\Program Files\Python36\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\Program Files\Python36\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\Program Files\Python36\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python36\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python36\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Program Files\Python36\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
