This article explains how to install "pdfplumber", which scrapes PDF files and builds a word index from them.
"pdfplumber (https://github.com/moste00/PDF-Indexer)" is a Python library that can scrape PDF files and build a word index from their contents.
■Python
The Python version used here is 3.8.5 on Windows 10, confirmed with the Python launcher.
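As a small cross-check, the interpreter itself can report which version is running and where it lives. The sketch below uses only the standard library; the helper name `describe_interpreter` is made up for illustration and is not part of this article's original procedure.

```python
import platform
import sys


def describe_interpreter() -> str:
    """Return the running interpreter's version and executable path."""
    # platform.python_version() yields a string such as "3.8.5".
    return f"Python {platform.python_version()} ({sys.executable})"


# Print the details of whichever interpreter runs this script.
print(describe_interpreter())
```

Running this script through the launcher with a specific version selected should report that same version, which helps confirm where a later pip install will land.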
■Installing pdfplumber
pdfplumber is installed through pip, so first open the Windows Command Prompt.
pip install pdfplumber
Once the prompt is open, type the command above and press Enter.
Note that this article uses the Python launcher, so to install into Python 3.8.5 the target version is selected explicitly.
py -3.8 -m pip install pdfplumber
To target that version, type the command above and press Enter.
Defaulting to user installation because normal site-packages is not writeable
Collecting pdfplumber
  Downloading pdfplumber-0.7.1-py3-none-any.whl (39 kB)
Collecting Wand>=0.6.7
  Downloading Wand-0.6.7-py2.py3-none-any.whl (139 kB)
     ---------------------------------------- 139.2/139.2 kB 685.7 kB/s eta 0:00:00
Collecting pdfminer.six==20220524
  Downloading pdfminer.six-20220524-py3-none-any.whl (5.6 MB)
     ---------------------------------------- 5.6/5.6 MB 3.2 MB/s eta 0:00:00
Requirement already satisfied: Pillow>=9.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from pdfplumber) (9.1.1)
Collecting cryptography>=36.0.0
  Using cached cryptography-37.0.2-cp36-abi3-win_amd64.whl (2.4 MB)
Requirement already satisfied: charset-normalizer>=2.0.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from pdfminer.six==20220524->pdfplumber) (2.0.2)
Requirement already satisfied: cffi>=1.12 in c:\users\user_\appdata\roaming\python\python38\site-packages (from cryptography>=36.0.0->pdfminer.six==20220524->pdfplumber) (1.15.0)
Requirement already satisfied: pycparser in c:\users\user_\appdata\roaming\python\python38\site-packages (from cffi>=1.12->cryptography>=36.0.0->pdfminer.six==20220524->pdfplumber) (2.21)
Installing collected packages: Wand, cryptography, pdfminer.six, pdfplumber
  Attempting uninstall: Wand
    Found existing installation: Wand 0.6.6
    Uninstalling Wand-0.6.6:
      Successfully uninstalled Wand-0.6.6
  Attempting uninstall: cryptography
    Found existing installation: cryptography 3.3.2
    Uninstalling cryptography-3.3.2:
      Successfully uninstalled cryptography-3.3.2
  Attempting uninstall: pdfminer.six
    Found existing installation: pdfminer.six 20191110
    Uninstalling pdfminer.six-20191110:
      Successfully uninstalled pdfminer.six-20191110
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
textract 1.6.5 requires pdfminer.six==20191110, but you have pdfminer-six 20220524 which is incompatible.
textract 1.6.5 requires six~=1.12.0, but you have six 1.16.0 which is incompatible.
snowflake-connector-python 2.7.8 requires cryptography<37.0.0,>=3.1.0, but you have cryptography 37.0.2 which is incompatible.
seleniumbase 3.1.0 requires beautifulsoup4==4.11.1; python_version >= "3.6", but you have beautifulsoup4 4.8.2 which is incompatible.
seleniumbase 3.1.0 requires certifi>=2021.10.8, but you have certifi 2021.5.30 which is incompatible.
seleniumbase 3.1.0 requires chardet==4.0.0; python_version >= "3.5", but you have chardet 3.0.4 which is incompatible.
seleniumbase 3.1.0 requires charset-normalizer==2.0.12; python_version >= "3.5", but you have charset-normalizer 2.0.2 which is incompatible.
seleniumbase 3.1.0 requires h11==0.13.0; python_version >= "3.7", but you have h11 0.12.0 which is incompatible.
seleniumbase 3.1.0 requires idna==3.3; python_version >= "3.6", but you have idna 3.2 which is incompatible.
seleniumbase 3.1.0 requires pdfminer.six==20220319; python_version >= "3.7", but you have pdfminer-six 20220524 which is incompatible.
seleniumbase 3.1.0 requires pyopenssl==22.0.0; python_version >= "3.7", but you have pyopenssl 21.0.0 which is incompatible.
seleniumbase 3.1.0 requires requests==2.27.1; python_version >= "3.6", but you have requests 2.26.0 which is incompatible.
seleniumbase 3.1.0 requires tomli>=2.0.1; python_version >= "3.7", but you have tomli 1.2.3 which is incompatible.
seleniumbase 3.1.0 requires urllib3==1.26.9, but you have urllib3 1.26.6 which is incompatible.
mindsdb 22.6.1.2 requires cryptography<3.4,>=2.9.2, but you have cryptography 37.0.2 which is incompatible.
Successfully installed Wand-0.6.7 cryptography-37.0.2 pdfminer.six-20220524 pdfplumber-0.7.1
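After you press Enter, the installation starts and "Successfully installed" is displayed, as shown above. This message means that pdfplumber 0.7.1 was installed successfully. In this run, however, pip also printed "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts." The newly installed versions of pdfminer.six and cryptography do not match the versions required by packages that were already present (textract, snowflake-connector-python, seleniumbase, and mindsdb). For that reason, we recommend creating a virtual environment and installing pdfplumber there.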
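One way to follow that recommendation is sketched below with the standard-library `venv` module. This is a minimal sketch under stated assumptions, not the procedure used in this article: the function names `venv_python` and `install_in_venv` and the directory name `pdfplumber-env` are hypothetical, and the install step needs network access.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the path of the interpreter inside a virtual environment."""
    # Windows keeps executables in "Scripts"; other platforms use "bin".
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_in_venv(env_dir: Path, package: str = "pdfplumber") -> None:
    """Create a fresh virtual environment and install the package into it."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=True)
    python = venv_python(env_dir)
    # Install with the environment's own interpreter, not the global one.
    subprocess.run([str(python), "-m", "pip", "install", package], check=True)
    # "pip check" exits non-zero (raising CalledProcessError here)
    # if the environment still has conflicting requirements.
    subprocess.run([str(python), "-m", "pip", "check"], check=True)
```

Calling `install_in_venv(Path("pdfplumber-env"))` would create the environment in the current directory. Because the new environment starts without textract, seleniumbase, and the other packages listed in the log, their pinned requirements should not clash with the versions pdfplumber pulls in.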