Installing "scrapy", a fast, high-level web crawling and web scraping framework

This article explains how to install "scrapy", a fast, high-level web crawling and web scraping framework.

"scrapy" (https://scrapy.org/, https://github.com/scrapy/scrapy) crawls websites and extracts structured data from their pages. It can be used for a wide range of purposes, such as website monitoring and automated testing.

■Python

This article uses Python 3.8.5 on Windows 10. The version was confirmed with the Python launcher.
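The version can also be confirmed from inside Python itself rather than through the launcher. The following is a minimal sketch; the helper name `format_version` is our own and used only for illustration.

```python
import sys


def format_version(info):
    """Return 'major.minor.micro' for a version tuple such as sys.version_info."""
    major, minor, micro = info[:3]
    return f"{major}.{minor}.{micro}"


# Show the version of the interpreter that is running this script.
print(format_version(sys.version_info))
```

Run through the launcher with a version switch (for example `py -3.8 check_version.py`, where the file name is hypothetical), this should report the interpreter that the launcher selected.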

■Installing scrapy

We install scrapy with pip, so first open the Windows Command Prompt.

pip install scrapy

Once it is open, type the command above and press Enter.

Note that this environment uses the Python launcher. To install into Python 3.8.5 specifically, we select that interpreter on the command line when running pip.

py -3.8 -m pip install scrapy

To target that version, type the command above and press Enter.

Defaulting to user installation because normal site-packages is not writeable
Collecting Scrapy
Downloading Scrapy-2.6.1-py2.py3-none-any.whl (264 kB)
|████████████████████████████████| 264 kB 726 kB/s
Collecting zope.interface>=4.1.3
Downloading zope.interface-5.4.0-cp38-cp38-win_amd64.whl (210 kB)
|████████████████████████████████| 210 kB 6.4 MB/s
Collecting itemadapter>=0.1.0
Downloading itemadapter-0.4.0-py3-none-any.whl (10 kB)
Requirement already satisfied: setuptools in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (58.3.0)
Collecting w3lib>=1.17.0
Downloading w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Collecting pyOpenSSL>=16.2.0
Downloading pyOpenSSL-22.0.0-py2.py3-none-any.whl (55 kB)
|████████████████████████████████| 55 kB 1.9 MB/s
Collecting parsel>=1.5.0
Downloading parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: lxml>=3.5.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (4.6.3)
Collecting protego>=0.1.15
Downloading Protego-0.2.1-py2.py3-none-any.whl (8.2 kB)
Collecting queuelib>=1.4.2
Downloading queuelib-1.6.2-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: cryptography>=2.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (3.4.8)
Requirement already satisfied: cssselect>=0.9.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (1.1.0)
Requirement already satisfied: tldextract in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (3.1.2)
Collecting PyDispatcher>=2.0.5
Downloading PyDispatcher-2.0.5.zip (47 kB)
|████████████████████████████████| 47 kB 1.3 MB/s
Preparing metadata (setup.py) ... done
Collecting itemloaders>=1.0.1
Downloading itemloaders-1.0.4-py3-none-any.whl (11 kB)
Collecting Twisted>=17.9.0
Downloading Twisted-22.2.0-py3-none-any.whl (3.1 MB)
|████████████████████████████████| 3.1 MB 3.3 MB/s
Collecting service-identity>=16.0.0
Downloading service_identity-21.1.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: cffi>=1.12 in c:\users\user_\appdata\roaming\python\python38\site-packages (from cryptography>=2.0->Scrapy) (1.14.5)
Collecting jmespath>=0.9.5
Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: six>=1.6.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from parsel>=1.5.0->Scrapy) (1.15.0)
Collecting cryptography>=2.0
Downloading cryptography-36.0.1-cp36-abi3-win_amd64.whl (2.2 MB)
|████████████████████████████████| 2.2 MB 3.3 MB/s
Requirement already satisfied: pyasn1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (0.4.8)
Requirement already satisfied: attrs>=19.1.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (21.2.0)
Requirement already satisfied: pyasn1-modules in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (0.2.8)
Collecting incremental>=21.3.0
Downloading incremental-21.3.0-py2.py3-none-any.whl (15 kB)
Collecting twisted-iocpsupport<2,>=1.0.2
Downloading twisted_iocpsupport-1.0.2-cp38-cp38-win_amd64.whl (45 kB)
|████████████████████████████████| 45 kB 3.2 MB/s
Collecting constantly>=15.1
Downloading constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Requirement already satisfied: typing-extensions>=3.6.5 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Twisted>=17.9.0->Scrapy) (3.7.4.3)
Collecting Automat>=0.8.0
Downloading Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting hyperlink>=17.1.1
Downloading hyperlink-21.0.0-py2.py3-none-any.whl (74 kB)
|████████████████████████████████| 74 kB 1.6 MB/s
Requirement already satisfied: idna in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (2.10)
Requirement already satisfied: requests>=2.1.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (2.27.1)
Requirement already satisfied: requests-file>=1.4 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (1.5.1)
Requirement already satisfied: filelock>=3.0.8 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (3.3.1)
Requirement already satisfied: pycparser in c:\users\user_\appdata\roaming\python\python38\site-packages (from cffi>=1.12->cryptography>=2.0->Scrapy) (2.20)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (2021.5.30)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (1.26.5)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (2.0.12)
Building wheels for collected packages: PyDispatcher
Building wheel for PyDispatcher (setup.py) ... done
Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=12559 sha256=19fe95aadffaa929ac1443a90d9dfe2895c5b8a989bd85e549d6f3a6c92d6ec7
Stored in directory: c:\users\user_\appdata\local\pip\cache\wheelscf\d7d7b5f0b9bad841ed856138ff0c5ee2bf2e04dbeb413097c8
Successfully built PyDispatcher
Installing collected packages: w3lib, zope.interface, twisted-iocpsupport, parsel, jmespath, itemadapter, incremental, hyperlink, cryptography, constantly, Automat, Twisted, service-identity, queuelib, pyOpenSSL, PyDispatcher, protego, itemloaders, Scrapy
Attempting uninstall: cryptography
Found existing installation: cryptography 3.4.8
Uninstalling cryptography-3.4.8:
Successfully uninstalled cryptography-3.4.8
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 Scrapy-2.6.1 Twisted-22.2.0 constantly-15.1.0 cryptography-36.0.1 hyperlink-21.0.0 incremental-21.3.0 itemadapter-0.4.0 itemloaders-1.0.4 jmespath-0.10.0 parsel-1.6.0 protego-0.2.1 pyOpenSSL-22.0.0 queuelib-1.6.2 service-identity-21.1.0 twisted-iocpsupport-1.0.2 w3lib-1.22.0 zope.interface-5.4.0

When you press Enter, the installation starts and, as shown above, finishes with "Successfully installed". Once that message appears, scrapy has been installed successfully.
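The reason for running pip as `-m pip` under a chosen interpreter is that the package ends up in that interpreter's environment. The same idea can be sketched from inside Python with `sys.executable`. The helper names `build_pip_command` and `install` are hypothetical, introduced here only for illustration.

```python
import subprocess
import sys


def build_pip_command(package, python=sys.executable):
    """Build a 'python -m pip install <package>' command for one interpreter."""
    return [python, "-m", "pip", "install", package]


def install(package):
    """Run pip under the current interpreter; raises CalledProcessError on failure."""
    subprocess.run(build_pip_command(package), check=True)


# Only print the command here; call install("scrapy") to actually run it.
print(build_pip_command("scrapy"))
```

Because the command starts with the path of the running interpreter, the package is installed for that interpreter and not for whichever `pip` happens to come first on the PATH.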

In this case, scrapy version 2.6.1 was installed.
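To double-check which version ended up installed, the package metadata can be queried from Python. This is a minimal sketch using the standard library's `importlib.metadata` (available from Python 3.8); the helper name `installed_version` is our own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the installed scrapy version, or None if scrapy is not installed
# for the interpreter running this script.
print(installed_version("scrapy"))
```

Run it with the same interpreter that pip installed into (here, via `py -3.8`), since each Python installation has its own set of packages.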
