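This article explains how to install goose3, a Python library that extracts body text, metadata, and image candidates from news articles.
goose3 (https://github.com/goose3/goose3) is a library that extracts not only the main text of news articles and article-style web pages, but also all of their metadata and the most probable image candidates.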
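To give a rough idea of what this extraction looks like once the library is installed, here is a minimal sketch. The function name `extract_article` and the keys of the returned dictionary are made up for illustration; only `Goose`, `extract`, and the article attributes come from goose3 itself.

```python
def extract_article(raw_html):
    """Return the title, body text and meta description extracted by goose3."""
    # Imported lazily so this function can be defined before goose3 is installed.
    from goose3 import Goose

    # Using Goose as a context manager releases its resources when done.
    with Goose() as g:
        article = g.extract(raw_html=raw_html)
        return {
            "title": article.title,
            "text": article.cleaned_text,
            "description": article.meta_description,
        }
```

Passing an HTML string through `raw_html` keeps the example offline. Calling `extract(url=...)` instead should let goose3 download the page itself.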
■Python
This article uses Python 3.8.5 on Windows 10 (version confirmed with the Python launcher).
■Installing goose3
goose3 is installed via pip, so start by opening the Windows Command Prompt.
pip install goose3
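Once the Command Prompt is open, type the command above and press Enter.
Note that this environment uses the Python launcher and goose3 needs to be installed for Python 3.8.5, so the target version is selected in the pip command itself.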
py -3.8 -m pip install goose3
To target that version, type the command above and press Enter.
Defaulting to user installation because normal site-packages is not writeable
Collecting goose3
  Downloading goose3-3.1.11-py3-none-any.whl (87 kB)
     |████████████████████████████████| 87 kB 857 kB/s
Requirement already satisfied: python-dateutil in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (2.8.1)
Requirement already satisfied: Pillow in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (8.2.0)
Requirement already satisfied: lxml in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (4.6.3)
Requirement already satisfied: beautifulsoup4 in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (4.9.3)
Collecting langdetect
  Using cached langdetect-1.0.9.tar.gz (981 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: requests in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (2.24.0)
Requirement already satisfied: nltk in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (3.6.5)
Requirement already satisfied: cssselect in c:\users\user_\appdata\roaming\python\python38\site-packages (from goose3) (1.1.0)
Collecting jieba
  Downloading jieba-0.42.1.tar.gz (19.2 MB)
     |████████████████████████████████| 19.2 MB 6.4 MB/s
  Preparing metadata (setup.py) ... done
Requirement already satisfied: soupsieve>1.2 in c:\users\user_\appdata\roaming\python\python38\site-packages (from beautifulsoup4->goose3) (2.2.1)
Requirement already satisfied: six in c:\users\user_\appdata\roaming\python\python38\site-packages (from langdetect->goose3) (1.15.0)
Requirement already satisfied: joblib in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk->goose3) (1.0.1)
Requirement already satisfied: click in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk->goose3) (8.0.4)
Requirement already satisfied: regex>=2021.8.3 in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk->goose3) (2021.11.10)
Requirement already satisfied: tqdm in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk->goose3) (4.60.0)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests->goose3) (1.25.11)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests->goose3) (2021.5.30)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests->goose3) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests->goose3) (2.10)
Requirement already satisfied: colorama in c:\users\user_\appdata\roaming\python\python38\site-packages (from click->nltk->goose3) (0.4.4)
Building wheels for collected packages: jieba, langdetect
  Building wheel for jieba (setup.py) ... done
  Created wheel for jieba: filename=jieba-0.42.1-py3-none-any.whl size=19314476 sha256=d381a6bf80676d09167c5f45a3b16c9367f88e414fd01b16bb113ced93101be1
  Stored in directory: c:\users\user_\appdata\local\pip\cache\wheels\ca\d8\dfdfe73bec1d12026b30cb7ce8da650f3f0ea2cf155ea018ae
  Building wheel for langdetect (setup.py) ... done
  Created wheel for langdetect: filename=langdetect-1.0.9-py3-none-any.whl size=993242 sha256=8069c5d6c9c58b979374ee54ef0b76ea46fdd530ffced99b55b622f19818ecd3
  Stored in directory: c:\users\user_\appdata\local\pip\cache\wheels\c7\b0f66658626032e78fc1a83103690ef6797d551cb22e56e734
Successfully built jieba langdetect
Installing collected packages: langdetect, jieba, goose3
Successfully installed goose3-3.1.11 jieba-0.42.1 langdetect-1.0.9
Pressing Enter starts the installation, and when it finishes, "Successfully installed" appears as in the output above. If this message is displayed, goose3 has been installed correctly.
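The launcher command works because "-m pip" runs pip as a module of the selected interpreter, so the package lands in that interpreter's environment. The same idea can be sketched from inside Python by using sys.executable, the path of the interpreter currently running. The helper names `build_pip_command` and `install` are hypothetical and not part of pip or goose3.

```python
import subprocess
import sys


def build_pip_command(package, python=sys.executable):
    """Build a pip install command bound to one specific interpreter."""
    # "-m pip" runs pip as a module of the given interpreter, so the package
    # is installed for that interpreter rather than whichever pip is on PATH.
    return [python, "-m", "pip", "install", package]


def install(package):
    """Run the install command and raise CalledProcessError if pip fails."""
    subprocess.run(build_pip_command(package), check=True)


# Only prints the command; call install("goose3") to actually run it.
print(build_pip_command("goose3"))
```

Building the command as a list instead of a single string avoids shell quoting problems when the interpreter path contains spaces, which is common on Windows.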
In this case, goose3 version 3.1.11 was installed.
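The installed version can also be checked from Python rather than read from the pip log. Below is a minimal sketch using importlib.metadata from the standard library (available since Python 3.8). The helper name `installed_version` is made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed for the interpreter running this code.
        return None


# Prints the goose3 version for this interpreter, or None if it is missing.
print(installed_version("goose3"))
```

Run it with the same interpreter that pip installed into, for example through "py -3.8", because each Python installation keeps its own set of packages.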