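[Python] Installing "Bark", a Text-Prompt Audio Generation Model, and Generating Speech

This article explains how to install "Bark", a model that generates audio from text prompts, and how to run it to generate speech.

"Bark (https://github.com/suno-ai/bark)" is a transformer-based text-to-audio model created by Suno AI (https://www.suno.ai/). It can generate highly realistic multilingual speech, as well as music, background noise, and simple sound effects. Because it is multilingual, it can also generate Japanese speech.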

■Environment (Python)
■Creating a New Notebook in Google Colab
■Installing Bark
■Generating Speech from a Text Prompt
■Running and Verifying

■Environment (Python)

This article uses Python 3.10.11, running on Google Colaboratory (Google Colab).
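Colab's default Python version can change over time, so your runtime may not match 3.10.11 exactly. A quick check like the one below, run in a code cell, shows what you actually have. This is a minimal sketch using only the standard library; `describe_environment` is a hypothetical helper, and the Colab detection assumes that the `google.colab` module can be imported only inside a Colab runtime.

```python
import platform
import sys


def describe_environment() -> dict:
    """Return basic facts about the current Python runtime (hypothetical helper)."""
    try:
        # Assumption: google.colab is preinstalled in Colab and absent elsewhere.
        import google.colab  # noqa: F401
        in_colab = True
    except ImportError:
        in_colab = False
    return {
        "python": platform.python_version(),            # e.g. "3.10.11"
        "at_least_3_10": sys.version_info >= (3, 10),
        "in_colab": in_colab,
    }


if __name__ == "__main__":
    print(describe_environment())
```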

■Creating a New Notebook in Google Colab

First, go to Google Colab (https://colab.research.google.com/). Once there, check that you are signed in with your Google account.

Then click "File" at the top of Google Colab and select "New notebook".

A new notebook is created.

■Installing Bark

Next, install Bark. We install it via pip, so enter the following command in a code cell of the notebook.

! pip install git+https://github.com/suno-ai/bark.git

Enter the command above and click the run button.

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/suno-ai/bark.git
Cloning https://github.com/suno-ai/bark.git to /tmp/pip-req-build-sy93golc
Running command git clone --filter=blob:none --quiet https://github.com/suno-ai/bark.git /tmp/pip-req-build-sy93golc
Resolved https://github.com/suno-ai/bark.git to commit f6f2db527b13c4a3e52ed6fbac587aadc3723eb6
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting boto3 (from suno-bark==0.0.1a0)
Downloading boto3-1.26.144-py3-none-any.whl (135 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 135.6/135.6 kB 4.2 MB/s eta 0:00:00
Collecting encodec (from suno-bark==0.0.1a0)
Downloading encodec-0.1.1.tar.gz (3.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.7/3.7 MB 55.2 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting funcy (from suno-bark==0.0.1a0)
Downloading funcy-2.0-py2.py3-none-any.whl (30 kB)
Collecting huggingface-hub>=0.14.1 (from suno-bark==0.0.1a0)
Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 224.5/224.5 kB 30.3 MB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from suno-bark==0.0.1a0) (1.22.4)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from suno-bark==0.0.1a0) (1.10.1)
Collecting tokenizers (from suno-bark==0.0.1a0)
Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 96.2 MB/s eta 0:00:00
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from suno-bark==0.0.1a0) (2.0.1+cu118)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from suno-bark==0.0.1a0) (4.65.0)
Collecting transformers (from suno-bark==0.0.1a0)
Downloading transformers-4.29.2-py3-none-any.whl (7.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.1/7.1 MB 95.6 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (3.12.0)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (2023.4.0)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (2.27.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (4.5.0)
Requirement already satisfied: packaging>=20.9 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (23.1)
Collecting botocore<1.30.0,>=1.29.144 (from boto3->suno-bark==0.0.1a0)
Downloading botocore-1.29.144-py3-none-any.whl (10.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.8/10.8 MB 131.6 MB/s eta 0:00:00
Collecting jmespath<2.0.0,>=0.7.1 (from boto3->suno-bark==0.0.1a0)
Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Collecting s3transfer<0.7.0,>=0.6.0 (from boto3->suno-bark==0.0.1a0)
Downloading s3transfer-0.6.1-py3-none-any.whl (79 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 79.8/79.8 kB 10.9 MB/s eta 0:00:00
Requirement already satisfied: torchaudio in /usr/local/lib/python3.10/dist-packages (from encodec->suno-bark==0.0.1a0) (2.0.2+cu118)
Collecting einops (from encodec->suno-bark==0.0.1a0)
Downloading einops-0.6.1-py3-none-any.whl (42 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.2/42.2 kB 5.9 MB/s eta 0:00:00
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->suno-bark==0.0.1a0) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->suno-bark==0.0.1a0) (3.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->suno-bark==0.0.1a0) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->suno-bark==0.0.1a0) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->suno-bark==0.0.1a0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->suno-bark==0.0.1a0) (16.0.5)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->suno-bark==0.0.1a0) (2022.10.31)
Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in /usr/local/lib/python3.10/dist-packages (from botocore<1.30.0,>=1.29.144->boto3->suno-bark==0.0.1a0) (2.8.2)
Requirement already satisfied: urllib3<1.27,>=1.25.4 in /usr/local/lib/python3.10/dist-packages (from botocore<1.30.0,>=1.29.144->boto3->suno-bark==0.0.1a0) (1.26.15)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->suno-bark==0.0.1a0) (2.1.2)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (2022.12.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.14.1->suno-bark==0.0.1a0) (3.4)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->suno-bark==0.0.1a0) (1.3.0)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.30.0,>=1.29.144->boto3->suno-bark==0.0.1a0) (1.16.0)
Building wheels for collected packages: suno-bark, encodec
Building wheel for suno-bark (pyproject.toml) ... done
Created wheel for suno-bark: filename=suno_bark-0.0.1a0-py3-none-any.whl size=2566930 sha256=c4113eca5b9f29f0eccbfb298452d70e2c8039e9b95fa521bf30b93cccf403a2
Stored in directory: /tmp/pip-ephem-wheel-cache-1cw39l_d/wheels/e6/6d/c2/107ed849afe600f905bb4049a026df3c7c5aa75d86c2721ec7
Building wheel for encodec (setup.py) ... done
Created wheel for encodec: filename=encodec-0.1.1-py3-none-any.whl size=45760 sha256=01c36212f96ddbcb16e58aa8ab6174cd56f52613418d0d8f37b45720a62d65a6
Stored in directory: /root/.cache/pip/wheels/fc/36/cb/81af8b985a5f5e0815312d5e52b41263237af07b977e6bcbf3
Successfully built suno-bark encodec
Installing collected packages: tokenizers, funcy, jmespath, einops, huggingface-hub, botocore, transformers, s3transfer, boto3, encodec, suno-bark
Successfully installed boto3-1.26.144 botocore-1.29.144 einops-0.6.1 encodec-0.1.1 funcy-2.0 huggingface-hub-0.14.1 jmespath-1.0.1 s3transfer-0.6.1 suno-bark-0.0.1a0 tokenizers-0.13.3 transformers-4.29.2

When the cell runs, the installation starts, and when it finishes, "Successfully installed" is displayed as shown above. If you see this message, the installation succeeded.
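If the long log has scrolled away, you can also confirm the installation programmatically. The sketch below uses the standard `importlib.metadata` module to look up installed distribution versions; `installed_versions` is a hypothetical helper, and the distribution names are taken from the pip log above. If the install succeeded, `suno-bark` should map to a version string (0.0.1a0 in the log above) rather than `None`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            result[name] = None
    return result


# Distribution names taken from the pip log above.
print(installed_versions(["suno-bark", "encodec", "transformers", "huggingface-hub"]))
```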

■Generating Speech from a Text Prompt

After installation, we use the model to generate speech from a text prompt. Add a new code cell and write the following code.

from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav
from IPython.display import Audio

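# Download (if not already cached) and load the models
preload_models()

# Text prompt: the text that Bark reads aloud (it is not an instruction to be answered)
# "大阪について解説してください" means "Please explain about Osaka"
text_prompt = """
    大阪について解説してください
"""
audio_array = generate_audio(text_prompt)

# Write the generated audio to a WAV file
write_wav("test.wav", SAMPLE_RATE, audio_array)

# Play the audio in the notebook
Audio(audio_array, rate=SAMPLE_RATE)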
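`scipy.io.wavfile.write` stores a float32 array as a 32-bit floating-point WAV, which some players may not support. The sketch below assumes `audio_array` holds float samples in the range -1.0 to 1.0 and shows how to compute the clip length from the sample count and convert the samples to 16-bit PCM with the standard `wave` module. `duration_seconds` and `to_pcm16_wav_bytes` are hypothetical helpers, and a synthetic sine wave stands in for Bark's output so the example runs without Bark; with Bark you would pass `audio_array.tolist()` and `SAMPLE_RATE`.

```python
import io
import math
import struct
import wave


def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Length of a mono clip in seconds."""
    return num_samples / sample_rate


def to_pcm16_wav_bytes(samples, sample_rate: int) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file in memory."""
    # Clip to [-1, 1], then scale to the signed 16-bit integer range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)  # WAV uses little-endian samples
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
    return buf.getvalue()


if __name__ == "__main__":
    rate = 24_000  # assumption: stand-in for bark.SAMPLE_RATE
    # 0.5 s of a 440 Hz tone as a stand-in for Bark's audio_array.
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate // 2)]
    print(duration_seconds(len(tone), rate))
    with open("test_pcm16.wav", "wb") as f:
        f.write(to_pcm16_wav_bytes(tone, rate))
```

Building the file in memory keeps the encoding step separate from where the bytes are saved, so the same helper works for a local file or any other destination.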

■Running and Verifying

After writing the code in the code cell, click the run button.

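Clicking it first downloads and loads the model weights, then generates speech from the text prompt. In this example, the generated audio is written to a WAV file named test.wav, after which it can be played back in the notebook. The whole process takes some time to complete.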
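Because the run both loads the models and generates audio, it can help to time the two phases separately to see where the wait comes from. The wrapper below is a hypothetical helper built on the standard `time.perf_counter`; the commented usage lines assume the `preload_models`, `generate_audio`, and `text_prompt` names from the code above.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Usage in the notebook (assumes bark is installed and imported as above):
# _, load_s = timed(preload_models)
# audio_array, gen_s = timed(generate_audio, text_prompt)
# print(f"model load: {load_s:.1f}s, generation: {gen_s:.1f}s")
```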

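Playing the generated audio file confirmed that the content of the text prompt is read aloud in Japanese. Note that the file is not saved to Google Drive: test.wav is written to the Colab runtime's working directory (/content by default), so it is lost when the runtime is reset unless you download it or copy it elsewhere.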
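In Colab, a relative path such as test.wav refers to the runtime's working directory, whose files are removed when the runtime is reset. One way to keep the file is to copy it into a mounted Google Drive folder or download it to your computer. The sketch below assumes the standard Colab mount point `/content/drive` with the `MyDrive` folder; `copy_to_dir` and the `bark` subfolder are hypothetical, and the Colab-specific calls (`google.colab.drive.mount`, `google.colab.files.download`) only work inside Colab, so they are guarded.

```python
import shutil
from pathlib import Path


def copy_to_dir(src: str, dest_dir: str) -> Path:
    """Copy src into dest_dir (created if needed) and return the new file's path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dest / Path(src).name))


if __name__ == "__main__":
    try:
        from google.colab import drive, files  # available only in Colab
    except ImportError:
        print("Not running in Colab; skipping Drive mount and download.")
    else:
        drive.mount("/content/drive")  # asks you to authorize Drive access
        # Hypothetical destination folder inside your Drive.
        print(copy_to_dir("test.wav", "/content/drive/MyDrive/bark"))
        files.download("test.wav")  # alternatively, download to your computer
```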