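This article explains how to install scrapy, a fast, high-level web crawling and web scraping framework.
scrapy (https://scrapy.org/, https://github.com/scrapy/scrapy) crawls websites and extracts structured data from their pages. It can be used for a wide range of purposes, such as monitoring websites and automated testing.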
■Python
This article uses Python 3.8.5 on Windows 10 (version confirmed with the Python launcher).
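When the Python launcher manages several Python versions, it helps to confirm which interpreter a script actually runs under. The minimal sketch below uses only the standard `sys` module; the helper name `python_version` is hypothetical and chosen just for illustration.

```python
import sys
from typing import Tuple


def python_version() -> Tuple[int, int, int]:
    """Return the (major, minor, micro) version of the running interpreter."""
    major, minor, micro = sys.version_info[:3]
    return (major, minor, micro)


if __name__ == "__main__":
    # Path of the interpreter running this script (e.g. the one chosen by `py -3.8`)
    print(sys.executable)
    # Dotted version string of that interpreter
    print(".".join(str(n) for n in python_version()))
```

Running the script as `py -3.8 version_check.py` (the file name is hypothetical) should print the path and version of the Python 3.8 interpreter selected by the launcher.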
■Installing scrapy
We will install scrapy via pip, so first open the Windows Command Prompt.
pip install scrapy
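Once the Command Prompt is open, type the command above and press Enter.
In this environment, however, the Python launcher (py) is used, so to install into Python 3.8.5 specifically, the target version has to be selected on the command line when running pip.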
py -3.8 -m pip install scrapy
To select that version, type the command above and press Enter.
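The `py -3.8 -m pip install scrapy` form works because `-m pip` runs pip as a module of one specific interpreter, so the package lands in that interpreter's site-packages. The same idea can be expressed from Python by invoking pip through `sys.executable`. This is a minimal sketch: the helper names `pip_install_command` and `pip_install` are hypothetical, and by default the command is only built, not executed.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str, python: str = sys.executable) -> List[str]:
    """Build a `<python> -m pip install <package>` command for a given interpreter."""
    # Running pip as a module ties the install to this interpreter,
    # just as `py -3.8 -m pip install scrapy` ties it to Python 3.8.
    return [python, "-m", "pip", "install", package]


def pip_install(package: str, python: str = sys.executable, dry_run: bool = True) -> List[str]:
    """Return the install command; actually run it only when dry_run is False."""
    cmd = pip_install_command(package, python)
    if not dry_run:
        # check=True raises CalledProcessError if pip exits with a non-zero status
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    print(pip_install("scrapy"))  # dry run: only prints the command
```

Calling `pip_install("scrapy", dry_run=False)` would perform the actual installation into whichever interpreter is running the script.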
Defaulting to user installation because normal site-packages is not writeable
Collecting Scrapy
  Downloading Scrapy-2.6.1-py2.py3-none-any.whl (264 kB)
     |████████████████████████████████| 264 kB 726 kB/s
Collecting zope.interface>=4.1.3
  Downloading zope.interface-5.4.0-cp38-cp38-win_amd64.whl (210 kB)
     |████████████████████████████████| 210 kB 6.4 MB/s
Collecting itemadapter>=0.1.0
  Downloading itemadapter-0.4.0-py3-none-any.whl (10 kB)
Requirement already satisfied: setuptools in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (58.3.0)
Collecting w3lib>=1.17.0
  Downloading w3lib-1.22.0-py2.py3-none-any.whl (20 kB)
Collecting pyOpenSSL>=16.2.0
  Downloading pyOpenSSL-22.0.0-py2.py3-none-any.whl (55 kB)
     |████████████████████████████████| 55 kB 1.9 MB/s
Collecting parsel>=1.5.0
  Downloading parsel-1.6.0-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: lxml>=3.5.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (4.6.3)
Collecting protego>=0.1.15
  Downloading Protego-0.2.1-py2.py3-none-any.whl (8.2 kB)
Collecting queuelib>=1.4.2
  Downloading queuelib-1.6.2-py2.py3-none-any.whl (13 kB)
Requirement already satisfied: cryptography>=2.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (3.4.8)
Requirement already satisfied: cssselect>=0.9.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (1.1.0)
Requirement already satisfied: tldextract in c:\users\user_\appdata\roaming\python\python38\site-packages (from Scrapy) (3.1.2)
Collecting PyDispatcher>=2.0.5
  Downloading PyDispatcher-2.0.5.zip (47 kB)
     |████████████████████████████████| 47 kB 1.3 MB/s
  Preparing metadata (setup.py) ... done
Collecting itemloaders>=1.0.1
  Downloading itemloaders-1.0.4-py3-none-any.whl (11 kB)
Collecting Twisted>=17.9.0
  Downloading Twisted-22.2.0-py3-none-any.whl (3.1 MB)
     |████████████████████████████████| 3.1 MB 3.3 MB/s
Collecting service-identity>=16.0.0
  Downloading service_identity-21.1.0-py2.py3-none-any.whl (12 kB)
Requirement already satisfied: cffi>=1.12 in c:\users\user_\appdata\roaming\python\python38\site-packages (from cryptography>=2.0->Scrapy) (1.14.5)
Collecting jmespath>=0.9.5
  Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Requirement already satisfied: six>=1.6.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from parsel>=1.5.0->Scrapy) (1.15.0)
Collecting cryptography>=2.0
  Downloading cryptography-36.0.1-cp36-abi3-win_amd64.whl (2.2 MB)
     |████████████████████████████████| 2.2 MB 3.3 MB/s
Requirement already satisfied: pyasn1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (0.4.8)
Requirement already satisfied: attrs>=19.1.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (21.2.0)
Requirement already satisfied: pyasn1-modules in c:\users\user_\appdata\roaming\python\python38\site-packages (from service-identity>=16.0.0->Scrapy) (0.2.8)
Collecting incremental>=21.3.0
  Downloading incremental-21.3.0-py2.py3-none-any.whl (15 kB)
Collecting twisted-iocpsupport<2,>=1.0.2
  Downloading twisted_iocpsupport-1.0.2-cp38-cp38-win_amd64.whl (45 kB)
     |████████████████████████████████| 45 kB 3.2 MB/s
Collecting constantly>=15.1
  Downloading constantly-15.1.0-py2.py3-none-any.whl (7.9 kB)
Requirement already satisfied: typing-extensions>=3.6.5 in c:\users\user_\appdata\roaming\python\python38\site-packages (from Twisted>=17.9.0->Scrapy) (3.7.4.3)
Collecting Automat>=0.8.0
  Downloading Automat-20.2.0-py2.py3-none-any.whl (31 kB)
Collecting hyperlink>=17.1.1
  Downloading hyperlink-21.0.0-py2.py3-none-any.whl (74 kB)
     |████████████████████████████████| 74 kB 1.6 MB/s
Requirement already satisfied: idna in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (2.10)
Requirement already satisfied: requests>=2.1.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (2.27.1)
Requirement already satisfied: requests-file>=1.4 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (1.5.1)
Requirement already satisfied: filelock>=3.0.8 in c:\users\user_\appdata\roaming\python\python38\site-packages (from tldextract->Scrapy) (3.3.1)
Requirement already satisfied: pycparser in c:\users\user_\appdata\roaming\python\python38\site-packages (from cffi>=1.12->cryptography>=2.0->Scrapy) (2.20)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (2021.5.30)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (1.26.5)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\user_\appdata\roaming\python\python38\site-packages (from requests>=2.1.0->tldextract->Scrapy) (2.0.12)
Building wheels for collected packages: PyDispatcher
  Building wheel for PyDispatcher (setup.py) ... done
  Created wheel for PyDispatcher: filename=PyDispatcher-2.0.5-py3-none-any.whl size=12559 sha256=19fe95aadffaa929ac1443a90d9dfe2895c5b8a989bd85e549d6f3a6c92d6ec7
  Stored in directory: c:\users\user_\appdata\local\pip\cache\wheelscf\d7d7b5f0b9bad841ed856138ff0c5ee2bf2e04dbeb413097c8
Successfully built PyDispatcher
Installing collected packages: w3lib, zope.interface, twisted-iocpsupport, parsel, jmespath, itemadapter, incremental, hyperlink, cryptography, constantly, Automat, Twisted, service-identity, queuelib, pyOpenSSL, PyDispatcher, protego, itemloaders, Scrapy
  Attempting uninstall: cryptography
    Found existing installation: cryptography 3.4.8
    Uninstalling cryptography-3.4.8:
      Successfully uninstalled cryptography-3.4.8
Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 Scrapy-2.6.1 Twisted-22.2.0 constantly-15.1.0 cryptography-36.0.1 hyperlink-21.0.0 incremental-21.3.0 itemadapter-0.4.0 itemloaders-1.0.4 jmespath-0.10.0 parsel-1.6.0 protego-0.2.1 pyOpenSSL-22.0.0 queuelib-1.6.2 service-identity-21.1.0 twisted-iocpsupport-1.0.2 w3lib-1.22.0 zope.interface-5.4.0
When you press Enter, the installation starts, and once it finishes, "Successfully installed" is displayed as shown above. If you see this message, scrapy has been installed successfully.
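The final "Successfully installed" line lists every package pip installed in `name-version` form. As a small illustration (this is not a pip API; the function name `parse_installed` is hypothetical), that line can be turned into a name-to-version mapping to confirm which scrapy version was installed. Splitting on the last hyphen is an assumption that holds for this log, since package names such as `twisted-iocpsupport` contain hyphens while the versions shown here do not.

```python
from typing import Dict


def parse_installed(line: str) -> Dict[str, str]:
    """Map package names to versions from pip's 'Successfully installed' line."""
    prefix = "Successfully installed "
    if not line.startswith(prefix):
        raise ValueError("not a 'Successfully installed' line")
    result = {}
    for item in line[len(prefix):].split():
        # Split on the LAST hyphen: names may contain hyphens, these versions do not
        name, version = item.rsplit("-", 1)
        result[name] = version
    return result


if __name__ == "__main__":
    line = ("Successfully installed Automat-20.2.0 PyDispatcher-2.0.5 Scrapy-2.6.1 "
            "twisted-iocpsupport-1.0.2 zope.interface-5.4.0")
    installed = parse_installed(line)
    print(installed["Scrapy"])  # 2.6.1
```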
In this case, scrapy version 2.6.1 was installed.
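To check the installed version from Python instead of reading the pip log, the standard-library `importlib.metadata` module (available since Python 3.8) can report the version of an installed distribution. This is a minimal sketch; `installed_version` is a hypothetical helper name, and it returns `None` when the package is not installed for the interpreter running the script.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run with the same interpreter used for the install, e.g. `py -3.8 check_scrapy.py`
    v = installed_version("scrapy")
    print(v if v is not None else "scrapy is not installed")
```

Because each interpreter has its own site-packages, running this under a different Python version than the one used with pip may report that scrapy is not installed.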