Hugging Face Inference API returns {'error': 'Model **** is currently loading', 'estimated_time': **.*}

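When you run inference on a language model through the Hugging Face Inference API, the response is sometimes {'error': 'Model **** is currently loading', 'estimated_time': **.*}. This means the requested model is still being loaded and cannot be used yet. You have to wait until loading finishes, but in some cases the model never finishes loading no matter how long you wait.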
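The waiting step can be sketched as a small retry helper. This is only a sketch under a few assumptions: the names `is_loading_error` and `query_with_wait` are hypothetical, `send` is any callable you supply that performs the request and returns the decoded JSON response, and the 300-second budget and the 1 to 60 second pause limits are arbitrary choices, not values from the API.

```python
import time
from typing import Any, Callable


def is_loading_error(payload: Any) -> bool:
    """Return True if the payload looks like the 'currently loading' error."""
    return (
        isinstance(payload, dict)
        and "currently loading" in str(payload.get("error", ""))
    )


def query_with_wait(
    send: Callable[[], Any],
    max_wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it stops returning the loading error.

    Raises TimeoutError once the next pause would exceed max_wait_seconds,
    so a model that never loads does not block the caller forever.
    """
    waited = 0.0
    while True:
        payload = send()
        if not is_loading_error(payload):
            return payload
        # Use the server's estimate as the pause; fall back to 10 s
        # (an arbitrary default) if it is missing or not numeric.
        try:
            pause = float(payload.get("estimated_time", 10.0))
        except (TypeError, ValueError):
            pause = 10.0
        # Keep each pause between 1 s and 60 s (arbitrary limits).
        pause = max(1.0, min(pause, 60.0))
        if waited + pause > max_wait_seconds:
            raise TimeoutError(
                f"model still loading after {waited:.0f} s: {payload}"
            )
        sleep(pause)
        waited += pause
```

In practice `send` could be a lambda that wraps your own HTTP call to the API and returns the parsed JSON. If the helper ends in `TimeoutError`, waiting longer is unlikely to help.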

Looking into the cause, it appears that the API cannot handle a language model that is too large. The Hugging Face Inference API page (https://huggingface.co/inference-api) says as much: "Large models (>10gb) require dedicated infrastructure and maintenance to work reliably, we can support this via an enterprise plan with yearly commitment."
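Before waiting on a model, it may help to compare its size against that 10 GB figure. The following is a rough sketch, not an official check: the helper names are hypothetical, the limit is assumed to mean decimal gigabytes, and the file sizes are assumed to be collected by you (for example from the model's file listing on the Hub). A repository may hold the same weights in several formats, so the sum can overestimate what is actually loaded.

```python
from typing import Iterable, Optional

# Assumption: "10gb" on the Inference API page means 10 * 10^9 bytes.
SIZE_LIMIT_BYTES = 10 * 10**9


def total_size_bytes(file_sizes: Iterable[Optional[int]]) -> int:
    """Sum file sizes in bytes, skipping entries whose size is unknown (None)."""
    return sum(size for size in file_sizes if size is not None)


def exceeds_limit(
    file_sizes: Iterable[Optional[int]],
    limit: int = SIZE_LIMIT_BYTES,
) -> bool:
    """Return True if the combined file size is above the limit."""
    return total_size_bytes(file_sizes) > limit
```

For example, `exceeds_limit([6 * 10**9, 5 * 10**9])` flags a repository with two weight shards of 6 GB and 5 GB as over the limit.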
