Azure OpenAI Serviceのリージョン間での応答速度を比較する

概要

Japan East、East USの2つのリージョンでAzure OpenAI Serviceの応答速度を比較
リージョン間の違いに加え、1文でのレスポンス及びストリーミングでの長文レスポンスの2x2の4パターンを検証
結果、East USリージョンではレスポンス時間がJapan Eastリージョンの1.4~2.5倍程度になった（注：ネットワーク条件に依存）

背景

2023年7月に、Azure OpenAI Serviceが日本リージョンでも作成可能になりました。それ以前はEast USを含む一部地域でのみ使用可能だった為、まずはそれらのリージョンで作成してパフォーマンスを見ることが多かったかと思います。

そこで、生成する文章の他に、応答速度が気になることが多くありました。チャットボットに質問をした際に少し待たされることはしばしばありますが、それが5秒10秒と長くなればそれだけユーザーがアプリケーションから離脱する確率も高まります。

応答速度を上げる方法はいくつか検討されていますが、CDNが各リージョンに配置されるように、一般的には地理的に近いリージョンでサービスにアクセスした方が応答速度が速くなると考えられます。そこで、今回はリージョン間で応答速度に差が出るかどうか

learn.microsoft.com

検証

実験設定

今回は以下のJapan East及びEast USの二つのリージョンでそれぞれリクエストを送信し、応答速度を比較します。

モデルにはgpt-35-turboを使用します。

リージョンの条件に加え、レスポンスに関する条件として、1文でのレスポンス及び長いレスポンスが想定されるメッセージを送信します。長いレスポンスについては、ストリーミングで受信した場合と一度に全てのメッセージを受信した場合の結果を比較します。

単文：Azure OpenAI Serviceへのリクエストのクエリ中に1文で回答する指示を末尾に付加する。
長文：Azure OpenAI Serviceへのリクエストのクエリ中に10文で回答する指示を末尾に付加する。
長文（stream）：Azure OpenAI Serviceへのリクエストのクエリ中に10文で回答する指示を末尾に付加する。また、Python SDKの機能でリクエスト送信時にstreamをTrueにすることでレスポンスを分割して受信する。

いずれもレスポンス長に差があると送信回数に差が出る為、温度パラメータを0にし、同じレスポンスが返ることを保証します。また、今回送信するリクエストではストリーミングとそうでない場合で同一の結果が返ってくるよう10文で結果を返すよう指定し、実際に同一の結果が返ることを確認済みです。

実験手順として、各条件で最初に1回簡単なリクエストを送信し、Azure OpenAI Serviceがリクエストを受け付けていることを確認します。その後、１．リクエストを送信、２．レスポンスを待機、の手順を10回ずつ実施し、各回の時間を記録します。ストリーミングについては、リクエストから初回レスポンス及び最後のレスポンスまでの時間をそれぞれ計測します。

仮説

各条件（レスポンス長、ストリーミング）において、Japan Eastリージョンの方がEast USリージョンよりレスポンス時間が短い。

結果

以下の表に結果を示します（単位はそれぞれ平均(s)/標準偏差(s)）。

通信速度が正規分布に従うか？という疑問はあったのですが、通信が安定している前提のもとでt検定をかけています。

.	Japan East	East US	t検定結果
単文	0.412, 0.157	0.994, 0.278	p=1.83181E-5(<.05)
長文	3.337, 0.379	5.747, 0.603	p=3.1195E-9(<.05)
長文（stream、初回）	0.061, 0.007	0.302, 0.192	p=9.04529E-4(<.05)
長文（stream、全部）	3.546, 0.791	5.661, 1.256	p=2.7342E-4(<.05)

各仮説が支持されました。

考察

ただ、実用上重要になるのはユースケースごとのユーザー体験の為、どのような差が生じるか考察していきます。

前提として、クライアント側のネットワーク環境に大きく左右されるため、以上の結果はあくまで参考値として、実際の秒数については自分の環境で測定する必要があります。

単文の場合時間が2倍以上、長文の場合1.4倍程度の時間がかかっています。特に長文で結果を返す場合には秒単位で待ち時間が変わることも想定される為、Japan Eastリージョンで使用することが望ましいように感じます。特にバッチ処理では時間単位で処理時間が変わることもある為、リージョンの選択も時間短縮の一つの重要な要素となります。具体例として、ユーザーが蓄積した文章や動画データを自動的に要約するようなケースでは常に高負荷になることが想定されます。ただし、今回は入力をローカルのアプリケーションで実施しているため、Azureのネットワーク上に配置されたFunctionsやBlob Srorageを使用することで速度が改善される可能性があります。

また、ストリーミングではEast USリージョンでも0.30秒と遅延をほとんど感じさせない結果となっており、リアルタイムコミュニケーションへの活用においては有効だと考えられます。しかし、他の処理と統合したエンドツーエンドの処理では遅延を感じる可能性があります。例えば音声会話のチャットボットでは音声認識・会話文生成・音声合成の手順を踏む必要があり、音声認識と音声合成で合計0.20秒の遅延が想定される場合、全体ではそれぞれ0.26秒及び0.50秒程度かかることが想定されます。一般的に音声通話においては遅延が0.4秒を超過すると違和感を感じると言われており、違和感を感じる時間のオーダーを考慮すると、100ミリ秒単位であっても遅延が削減されるメリットがあると考えられます。

補足情報

実験に使用したソースコードは以下のリポジトリにアップロードしています。

github.com

リクエストには以下のクエリを使用し、全てrole=userで送信しています。

QUERY_FIRST = "Hello. answer in 1 word." # 最初に送るクエリ
USER_QUERY = "Please tell me about Azure OpenAI Service." # ベースとなるクエリ
SUFFIX_SHORT = "Answer in 1 sentence." # 短いレスポンスを指定するクエリ
SUFFIX_LONG = "Answer in 10 sentences." # 長いレスポンスを指定するクエリ

レスポンスとしては、単文及び長文の場合で以下のレスポンスを受信しました（メッセージのみ抜粋）。

Azure OpenAI Service is a cloud-based platform that provides access to OpenAI's powerful language models for developers to build applications with natural language processing capabilities.

Azure OpenAI Service is a cloud-based platform that provides access to OpenAI's powerful artificial intelligence models. It allows developers to integrate natural language processing capabilities into their applications and services. The service offers pre-trained models for various tasks, such as text generation, language translation, sentiment analysis, and more.

Developers can use the Azure OpenAI Service to generate human-like text, create conversational agents, build chatbots, and automate customer support. The service supports multiple programming languages, including Python, JavaScript, and .NET, making it accessible to a wide range of developers.

Azure OpenAI Service leverages OpenAI's state-of-the-art models, such as GPT-3, to deliver high-quality results. These models have been trained on vast amounts of data and can understand and generate text in a contextually relevant manner.

The service provides an easy-to-use API that developers can use to interact with the models. It offers both synchronous and asynchronous methods for making API calls, allowing developers to choose the most suitable approach for their applications.

Azure OpenAI Service is highly scalable and can handle large volumes of requests. It offers flexible pricing options, including pay-as-you-go and subscription plans, making it cost-effective for both small-scale and enterprise-level deployments.

The service also provides features like rate limiting, authentication, and monitoring to ensure the security and reliability of the AI models. It integrates seamlessly with other Azure services, such as Azure Cognitive Services and Azure Machine Learning, enabling developers to build end-to-end AI solutions.

Azure OpenAI Service is continuously updated with the latest advancements from OpenAI, ensuring that developers have access to cutting-edge AI capabilities. It also offers extensive documentation, tutorials, and sample code to help developers get started quickly and easily.

Overall, Azure OpenAI Service empowers developers to leverage OpenAI's advanced AI models and build intelligent applications that can understand and generate natural language text. It provides a robust and scalable platform for integrating AI capabilities into various domains, from customer support to content generation.
Test 1:
Azure OpenAI Service is a cloud-based platform that provides access to OpenAI's powerful artificial intelligence models. It allows developers to integrate natural language processing capabilities into their applications and services. The service offers pre-trained models for various tasks, such as text generation, language translation, sentiment analysis, and more.

Developers can use the Azure OpenAI Service to generate human-like text, create conversational agents, build chatbots, and automate customer support. The service supports multiple programming languages, including Python, JavaScript, and .NET, making it accessible to a wide range of developers.

Azure OpenAI Service leverages OpenAI's state-of-the-art models, such as GPT-3, to deliver high-quality results. These models have been trained on vast amounts of data and can understand and generate text in a contextually relevant manner.

The service provides an easy-to-use API that developers can use to interact with the models. It offers both synchronous and asynchronous methods for making API calls, allowing developers to choose the most suitable approach for their applications.

Azure OpenAI Service is highly scalable and can handle large volumes of requests. It offers flexible pricing options, including pay-as-you-go and subscription plans, making it cost-effective for both small-scale and enterprise-level deployments.

The service also provides features like rate limiting, authentication, and monitoring to ensure the security and reliability of the AI models. It integrates seamlessly with other Azure services, such as Azure Cognitive Services and Azure Machine Learning, enabling developers to build end-to-end AI solutions.

Azure OpenAI Service is continuously updated with the latest advancements from OpenAI, ensuring that developers have access to cutting-edge AI capabilities. It also offers extensive documentation, tutorials, and sample code to help developers get started quickly and easily.

Overall, Azure OpenAI Service empowers developers to leverage OpenAI's advanced AI models and build intelligent applications that can understand and generate natural language text. It provides a robust and scalable platform for integrating AI capabilities into various domains, from customer support to content generation.