Installing newspaper3k, a Library That Automatically Collects News Articles

This article explains how to install newspaper3k, a Python library that can automatically collect news articles.

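newspaper3k (https://github.com/codelucas/newspaper/) performs web scraping: it can fetch articles from news sites (extracting the body text, among other things) and extract keywords from that text. It supports more than 10 languages, including English, Chinese, and German, and Japanese is supported as well. Because newspaper3k scrapes websites, sending many requests to the same site at once may get you blocked by that site, so keep this in mind when you use it.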
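As a rough illustration of the caution about simultaneous requests, the following is a minimal sketch that fetches articles one at a time with a pause between them. The helper names `fetch_politely` and `fetch_article_text` and the 2-second default delay are assumptions made for this example, not part of newspaper3k; only `Article`, `download()`, `parse()`, `title`, and `text` come from the library.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=2.0):
    """Call fetch(url) for each URL in order, sleeping between requests.

    The delay value is an arbitrary assumption; choose one that suits
    the site you are accessing.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first one.
            time.sleep(delay_seconds)
        results.append(fetch(url))
    return results


def fetch_article_text(url):
    """Download one article with newspaper3k and return (title, text)."""
    # Imported here so the helper above can be used without newspaper3k.
    from newspaper import Article

    article = Article(url, language="ja")
    article.download()  # fetch the HTML
    article.parse()     # extract the title, body text, and so on
    return article.title, article.text
```

With newspaper3k installed, a call such as `fetch_politely(list_of_article_urls, fetch_article_text)` would process the URLs sequentially rather than in parallel, where `list_of_article_urls` is a placeholder for your own list.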

■Python

This article uses Python 3.8.5 on Windows 10 (the version was confirmed with the Python launcher).

■Installing newspaper3k

newspaper3k will be installed via pip, so first open the Windows Command Prompt.

pip install newspaper3k

Once it is open, type the command above and press Enter.

Note that this environment uses the Python launcher. To install into Python 3.8.5 specifically, the target version is selected on the command line when running pip.

py -3.8 -m pip install newspaper3k

To select that version, type the command above and press Enter.
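The `-m pip` form works because pip is run as a module of whichever interpreter is launched, so the package is installed into that interpreter's environment. The same idea can be sketched from inside Python with `sys.executable`. The helper names `pip_install_command` and `install` are hypothetical and exist only for this illustration.

```python
import subprocess
import sys


def pip_install_command(package_name):
    """Build a pip command bound to the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", package_name]


def install(package_name):
    """Run the command; raises CalledProcessError if pip reports a failure."""
    subprocess.check_call(pip_install_command(package_name))


if __name__ == "__main__":
    # Only print the command here; call install("newspaper3k") to run it.
    print(" ".join(pip_install_command("newspaper3k")))
```

Running this script with `py -3.8` would print a command that starts with the path of the Python 3.8 interpreter, which shows which installation pip would modify.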

Defaulting to user installation because normal site-packages is not writeable
Collecting newspaper3k
Downloading newspaper3k-0.2.8-py3-none-any.whl (211 kB)
|████████████████████████████████| 211 kB 819 kB/s
Collecting cssselect>=0.9.2
Downloading cssselect-1.1.0-py2.py3-none-any.whl (16 kB)
Requirement already satisfied: beautifulsoup4>=4.4.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (4.9.3)
Collecting tldextract>=2.0.1
Downloading tldextract-3.1.2-py2.py3-none-any.whl (87 kB)
|████████████████████████████████| 87 kB 1.5 MB/s
Collecting jieba3k>=0.35.1
Downloading jieba3k-0.35.1.zip (7.4 MB)
|████████████████████████████████| 7.4 MB 3.3 MB/s
Preparing metadata (setup.py) ... done
Collecting feedfinder2>=0.0.4
Downloading feedfinder2-0.0.4.tar.gz (3.3 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: feedparser>=5.2.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (5.2.1)
Collecting tinysegmenter==0.3
Downloading tinysegmenter-0.3.tar.gz (16 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: PyYAML>=3.11 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (5.4.1)
Requirement already satisfied: lxml>=3.6.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (4.6.3)
Requirement already satisfied: Pillow>=3.3.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (8.2.0)
Collecting nltk>=3.2.1
Downloading nltk-3.6.5-py3-none-any.whl (1.5 MB)
|████████████████████████████████| 1.5 MB 2.2 MB/s
Requirement already satisfied: requests>=2.10.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (2.25.1)
Requirement already satisfied: python-dateutil>=2.5.3 in c:\users\user_\appdata\roaming\python\python38\site-packages (from newspaper3k) (2.8.1)
Requirement already satisfied: soupsieve>1.2 in c:\users\user_\appdata\roaming\python\python38\site-packages (from beautifulsoup4>=4.4.1->newspaper3k) (2.2.1)
Requirement already satisfied: six in c:\users\user_\appdata\roaming\python\python38\site-packages (from feedfinder2>=0.0.4->newspaper3k) (1.15.0)
Requirement already satisfied: joblib in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk>=3.2.1->newspaper3k) (1.0.1)
Collecting regex>=2021.8.3
Downloading regex-2021.11.10-cp38-cp38-win_amd64.whl (273 kB)
|████████████████████████████████| 273 kB 3.2 MB/s
Requirement already satisfied: click in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk>=3.2.1->newspaper3k) (7.1.2)
Requirement already satisfied: tqdm in c:\users\user_\appdata\roaming\python\python38\site-packages (from nltk>=3.2.1->newspaper3k) (4.60.0)
Requirement already satisfied: chardet<5,>=3.0.2 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.10.0->newspaper3k) (4.0.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.10.0->newspaper3k) (1.26.5)
Requirement already satisfied: idna<3,>=2.5 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.10.0->newspaper3k) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.10.0->newspaper3k) (2021.5.30)
Collecting requests-file>=1.4
Downloading requests_file-1.5.1-py2.py3-none-any.whl (3.7 kB)
Requirement already satisfied: filelock>=3.0.8 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract>=2.0.1->newspaper3k) (3.3.1)
Building wheels for collected packages: tinysegmenter, feedfinder2, jieba3k
Building wheel for tinysegmenter (setup.py) ... done
Created wheel for tinysegmenter: filename=tinysegmenter-0.3-py3-none-any.whl size=13552 sha256=5731157b65d0e30ddc7f2a931218bb9194911c5f5d42fd85c480b67f64e86f6a
Stored in directory: c:\users\user_\appdata\local\pip\cache\wheels\99\74\83\8fac1c8d9c648cfabebbbffe97a889f6624817f3aa0bbe6c09
Building wheel for feedfinder2 (setup.py) ... done
Created wheel for feedfinder2: filename=feedfinder2-0.0.4-py3-none-any.whl size=3356 sha256=f945d0b813c1be9179ff9076b31dbbfd897873921b5d85ec3f9659b25e382d28
Stored in directory: c:\users\user_\appdata\local\pip\cache\wheels\b6\09\68\a9f15498ac02c23dde29f18745bc6a6f574ba4ab41861a3575
Building wheel for jieba3k (setup.py) ... done
Created wheel for jieba3k: filename=jieba3k-0.35.1-py3-none-any.whl size=7398405 sha256=f8da32e98d5417ca22a45cac6c55a112739decf8bd160297b8a44e1e04c8174e
Stored in directory: c:\users\user_\appdata\local\pip\cache\wheels\1f\7e\0c\54f3b0f5164278677899f2db08f2b07943ce2d024a3c862afb
Successfully built tinysegmenter feedfinder2 jieba3k
Installing collected packages: requests-file, regex, tldextract, tinysegmenter, nltk, jieba3k, feedfinder2, cssselect, newspaper3k
Attempting uninstall: regex
Found existing installation: regex 2021.4.4
Uninstalling regex-2021.4.4:
Successfully uninstalled regex-2021.4.4
Successfully installed cssselect-1.1.0 feedfinder2-0.0.4 jieba3k-0.35.1 newspaper3k-0.2.8 nltk-3.6.5 regex-2021.11.10 requests-file-1.5.1 tinysegmenter-0.3 tldextract-3.1.2

When you press Enter, the installation starts, and "Successfully installed" is displayed as shown above. Once this message appears, newspaper3k has been installed successfully.

In this case, version 0.2.8 of newspaper3k was installed.
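To confirm the installed version later without rerunning pip, one option is to ask the standard library's `importlib.metadata` (available from Python 3.8). This is a minimal sketch, and the helper name `installed_version` is an assumption made for this example.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("newspaper3k")
    if version is None:
        print("newspaper3k is not installed for this interpreter")
    else:
        print("newspaper3k", version)
```

Run it with the same interpreter that pip installed into (for example via `py -3.8`); on the setup described in this article it would be expected to report 0.2.8.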
