Lessons learned using Arize Phoenix
LLM observability is a fast-moving field, and there are a lot of tools out there masquerading as the state-of-the-art LLM observability tool. Most of them are peddling the typical “single pane of glass” approach, which any software engineer knows is just an outright lie. The tool that caught my eye with its ease of implementation and immediate value add is the open-source version of Arize’s Phoenix service.
I recommend running this service in a Docker container alongside the rest of your application, with your application acting as the client and sending its traces to the Phoenix server. Having attempted to build an observability tool from scratch, I have to say that my own effort came nowhere close to the sophistication of Arize Phoenix.
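To make the client-server setup concrete, here is a minimal sketch of how the application side might locate the Phoenix container before sending traces to it. It uses only the Python standard library. The default address (http://localhost:6006), the PHOENIX_COLLECTOR_ENDPOINT environment variable, and the /v1/traces path are assumptions based on Phoenix's usual defaults, and the function names are my own, so check all of them against the version you deploy.

```python
import os
import socket
from urllib.parse import urlparse

# Assumption: Phoenix serves its UI and HTTP trace collector on port 6006.
DEFAULT_ENDPOINT = "http://localhost:6006"


def collector_endpoint(env=None):
    """Return the Phoenix base URL, preferring the environment variable."""
    env = os.environ if env is None else env
    return env.get("PHOENIX_COLLECTOR_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")


def traces_url(env=None):
    """Return the URL the client would send OTLP traces to (assumed path)."""
    return collector_endpoint(env) + "/v1/traces"


def is_reachable(url, timeout=1.0):
    """Check that something is listening on the host and port of the URL."""
    parsed = urlparse(url)
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = traces_url()
    status = "reachable" if is_reachable(url) else "not reachable"
    print(f"Phoenix collector at {url} is {status}")
```

Running a check like this at application startup tells you early whether the container is up, instead of discovering later that traces were silently dropped. In a Docker Compose setup, the environment variable would typically point at the Phoenix service name rather than localhost.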
I did an in-depth analysis to assess the feasibility of LLM observability via Dynatrace, Langfuse and Arize Phoenix. Dynatrace’s recent LLM observability offering, though promising on the surface, is missing a lot of the key features that both Phoenix and Langfuse offer: it offers neither cross-LLM evaluation nor model comparison. Langfuse’s instability across its dependent packages creates an implementation nightmare in production, which is exactly where you need a stable LLM observability tool to get from user feedback to diagnosing a bug.
After spending about a week dissecting all three tools, I found that, from an implementation standpoint, Arize Phoenix was far and away the best open-source tool for LLM observability. Whether it stays that way in six months is anyone’s guess. This is one of the fastest-growing areas of AI engineering in general, with new tools coming out every day. If you’re already using AWS Bedrock and are looking for lower overhead, then AgentCore is a suitable alternative; however, it does not have a local implementation option. The benefit of Arize Phoenix here is its continuously updated Docker containers, which let you deploy the same way across environments.

