Building Scalable Agentic AI using LangGraph and GCP Cloud Run

Introduction

If you’re building with Generative AI, you’ve likely been exploring the power of AI agents – applications that can perform complex reasoning, plan sequences of actions, and interact more dynamically. LangGraph (from LangChain-ai) is a popular open-source framework ideal for mapping out how LLM calls and tools connect to bring your AI agents to life. But what happens when you’re ready to move your LangGraph agents from your local machine to the cloud?

While LangChain-ai offers its own paid platform, “LangGraph Platform”, for deploying these agents, you’re not limited to it. With a bit of setup, you can deploy your LangGraph agents on robust and scalable platforms like Google Cloud Platform (GCP) Cloud Run. Personally, LangGraph and Cloud Run are two of my favorite products because of how easy they are to get started with, yet how powerful they are under the hood.

This post will guide you through deploying your LangGraph agent to Cloud Run using FastAPI, or more specifically, another LangChain-ai project called Langserve, which handily wraps FastAPI and Pydantic for a smoother experience.
We’ll walk through the development path, from introducing a sample LangGraph agent to deploying it as a containerized application on GCP Cloud Run.

NOTE: In this post, we’ll explore how to build Agentic AI workflows without using Anthropic’s Model Context Protocol (MCP) or Google’s Agent-to-Agent (A2A) protocol. While these protocols are popular, they aren’t required for our approach.

1. A Quick Look: LangGraph, GCP Cloud Run, and Our Approach

So, what are these key pieces we’re working with?

LangGraph: From LangChain-ai, this is an open-source framework for building those complex AI agent workflows we talked about. Think of it as a way to map out how your agent thinks and acts, using a graph-based approach to manage all the moving parts. It’s fantastic for building AI that can reason, plan, and execute multi-step tasks.

Cloud Run: This is Google Cloud’s serverless platform for running containerized applications. The beauty of it is that you just give it your container, and it handles the rest – scaling up or down (even to zero, so you only pay for what you use) and making your app available via an HTTP request. It’s a really straightforward way to get web apps and APIs live.

For deploying our LangGraph agent, we’re taking a shortcut. We could use FastAPI directly to build an API around our LangGraph agent, but LangChain-ai offers another helpful tool called Langserve. It’s built on FastAPI and adds some nice conveniences specifically for serving LangChain components, which makes it a great fit for our needs in this guide.

2. Our Star: A Simple LangGraph Agent

To illustrate the deployment process, we’ll work with a simple LangGraph agent.

To get started with LangGraph itself, we’ll install it using the command:

pip install -U langgraph
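The example below also uses LangChain’s init_chat_model helper with an OpenAI model, so you will most likely need the langchain and langchain-openai packages as well (the exact packages depend on which model provider you choose):

pip install -U langchain langchain-openai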

Now we’ll create a simple LangGraph chatbot agent. The graph has three nodes and four edges. The primary chatbot node calls the LLM, while the inp and outp nodes handle input and output processing, respectively.

import os
from typing import Annotated, Sequence

from langchain_core.messages import BaseMessage, HumanMessage
from langchain.chat_models import init_chat_model
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages


os.environ["OPENAI_API_KEY"] = "your-api-key"
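# NOTE: hardcoding the key like this is only for local experimentation; in production,
# supply it via an environment variable or a secret manager instead.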

class State(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

class InputState(TypedDict):
    input: str

class OutputState(TypedDict):
    output: str

llm = init_chat_model("openai:gpt-4.1")

# Core node: send the accumulated messages to the LLM and append its reply.
def chatbot(state: State):
    return {"messages": [llm.invoke(state["messages"])]}

# Input node: turn the raw input string into a HumanMessage.
def inp(state: InputState):
    question = state["input"]
    return {"messages": [HumanMessage(content=question)]}

# Output node: extract the text of the last message as the final output.
def outp(state: State) -> OutputState:
    msg = state["messages"][-1].content
    return {"output": msg}
    

graph_builder = StateGraph(State, input=InputState, output=OutputState)

graph_builder.add_node("inp", inp)
graph_builder.add_node("outp", outp)
graph_builder.add_node("chatbot", chatbot)

graph_builder.add_edge(START, "inp")
graph_builder.add_edge("inp", "chatbot")
graph_builder.add_edge("chatbot", "outp")
graph_builder.add_edge("outp", END)

graph = graph_builder.compile()

We can test it by calling its invoke function directly.

result = graph.invoke({"input": "What is the Capital of Canada?"})
print(result["output"])
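If everything is wired up correctly, this should print an answer along the lines of “The capital of Canada is Ottawa.”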

3. Enter Langserve: Ready to Serve

Langserve is designed to make it easy to deploy LangChain Runnables and Chains (and by extension, LangGraph agents) as REST APIs. It handles input and output parsing, provides a nice web UI for interacting with your deployed agent, and more.

The Langserve application template (created below with the LangChain CLI) uses Poetry for project and dependency management, so the initial setup involves a few command-line steps:

pipx install poetry
pip install "langserve[all]"
pip install -U langchain-cli

Finally, the LangChain CLI provides a handy command to bootstrap a new Langserve application:

langchain app new my-agent

This sets up a skeleton Langserve application, ready for us to integrate our LangGraph agent.

Note

According to LangChain-ai, Langserve will not be getting any new features, but it already provides everything we need for this guide.

4. Weaving It Together: LangGraph Agent Meets Langserve

Now, we need to incorporate our previously developed LangGraph agent code into the Langserve application structure we just created. A common approach is to place our agent’s code within a local Python package inside the Langserve project. For instance, we could create a directory, say packages/my_graph, within our my-agent Langserve project and put our LangGraph agent’s Python files there. We’ll also add __init__.py and pyproject.toml files, so that Poetry can install it easily.

my-agent
├── app
│   ├── __init__.py
│   └── server.py
├── packages
│   └── my_graph
│       ├── __init__.py
│       ├── my_graph.py
│       └── pyproject.toml
├── poetry.lock
├── pyproject.toml
├── Dockerfile
└── README.md
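Here my_graph.py holds the agent code from section 2. Since server.py (shown in the next step) imports the compiled graph with from my_graph import graph, the package’s __init__.py should re-export it. A minimal sketch, assuming the agent code lives in my_graph.py as shown above:

# packages/my_graph/__init__.py
from .my_graph import graph  # re-export the compiled graph for consumers of this package

__all__ = ["graph"]

The pyproject.toml alongside it simply declares the package name and its dependencies so that Poetry can install the package into the main project.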

We’d run a Poetry command to add this local package,

poetry add packages/my_graph

The final piece of this integration puzzle is to connect our LangGraph agent to Langserve’s server.py file. This typically involves importing our agent from the my_graph package and using Langserve’s functions (like add_routes) to expose it via API endpoints.

from fastapi import FastAPI
from fastapi.responses import RedirectResponse
from langserve import add_routes

from my_graph import graph as langgraph_app


app = FastAPI()

@app.get("/")
async def redirect_root_to_docs():
    return RedirectResponse("/docs")

add_routes(app, langgraph_app, path="/chat", disabled_endpoints=["playground", "batch"])


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
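Once add_routes is wired up, Langserve exposes the agent under the /chat path via endpoints such as /chat/invoke and /chat/stream (the playground and batch endpoints are disabled above), and FastAPI serves interactive API docs at /docs, which is where the root path redirects.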

5. Local Test Drive: Ensuring Everything Clicks

With our LangGraph agent integrated into the Langserve application, it’s crucial to test it locally before containerizing and deploying. Poetry again makes this straightforward. From within our my-agent project directory, we can start the development server using a command:

poetry run langchain serve --port=8080

This will fire up a local web server, typically with a UI (thanks to Langserve) where you can interact with your agent’s endpoints, send requests, and verify that it’s working as expected. Port 8080 is just an example; you can configure it to your preference.
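As a quick smoke test, you can also call the invoke endpoint directly; the payload shape mirrors the InputState schema defined earlier (adjust the port if you chose a different one):

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"input": {"input": "What is the capital of Canada?"}}' \
  http://localhost:8080/chat/invoke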

6. Dockerizing: Packaging Our Agent for Portability

Once we’re confident that our Langserve app with the integrated LangGraph agent is running smoothly locally, the next step is to containerize it using Docker. This involves creating a Dockerfile in the root of our my-agent project.

FROM python:3.11-slim

RUN pip install poetry==1.6.1

# Install dependencies into the container's Python directly, not into a virtualenv.
RUN poetry config virtualenvs.create false

WORKDIR /code

COPY ./pyproject.toml ./README.md ./poetry.lock* ./

# The [s] glob makes this copy tolerant of a missing packages directory.
COPY ./package[s] ./packages

# Install only the dependencies first (--no-root) so this layer is cached across code changes.
RUN poetry install --no-interaction --no-ansi --no-root

COPY ./app ./app

# Install the project itself now that the application code is in place.
RUN poetry install --no-interaction --no-ansi

EXPOSE 8080

CMD ["uvicorn", "app.server:app", "--host", "0.0.0.0", "--port", "8080"]

After crafting the Dockerfile, we build the Docker image and then run it using the following commands:

docker build -t my-agent .
docker run -p 8080:8080 my-agent

The docker run command maps port 8080 on our host to port 8080 in the container, where our Langserve app is listening.
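If you later move the OpenAI key out of the code and into an environment variable, as the production notes at the end of this post recommend, pass it to the container at run time, for example:

docker run -p 8080:8080 -e OPENAI_API_KEY=your-api-key my-agent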

7. To the Cloud: Deploying on GCP Cloud Run

With a working Docker image in hand, we’re ready for the final step: deploying to GCP Cloud Run. This typically involves:

Pushing our Docker image to Google Artifact Registry.

docker tag my-agent us-central1-docker.pkg.dev/[MY-PROJECT-ID]/test/my-agent:latest
docker push us-central1-docker.pkg.dev/[MY-PROJECT-ID]/test/my-agent:latest
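These commands assume an Artifact Registry Docker repository named test already exists in us-central1 and that your local Docker client is authenticated against it. If not, a one-time setup along these lines is needed first:

gcloud artifacts repositories create test --repository-format=docker --location=us-central1
gcloud auth configure-docker us-central1-docker.pkg.dev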

Deploying the image to Cloud Run.

gcloud run deploy my-agent \
 --image us-central1-docker.pkg.dev/[MY-PROJECT-ID]/test/my-agent \
 --platform managed \
 --region us-central1 \
 --allow-unauthenticated
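If the OpenAI key is supplied via an environment variable rather than hardcoded, it can be set on the service at deploy time, for example with --set-env-vars OPENAI_API_KEY=your-api-key or, better, with --set-secrets backed by Secret Manager.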

And there you have it! Our AI agent is running on Cloud Run, ready to accept as many requests as you want without worrying about scaling.

We can test it using cURL from the command line.

curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"input": {"input": "What is the capital of France?"}}' \
https://my-agent-yourcloudruntag.us-central1.run.app/chat/invoke

Output

{
  "output": {
    "output": "The capital of France is **Paris**."
  },
  "metadata": {
    "run_id": "977be7d7-417d-be41-33a99a54c34e",
    "feedback_tokens": []
  }
}

Or we can write a simple Python program that uses Langserve’s RemoteRunnable client.

from langserve.client import RemoteRunnable


runnable = RemoteRunnable("https://my-agent-yourcloudruntag.us-central1.run.app/chat")

question = "What is the capital of Canada?"
result = runnable.invoke(input={"input": question})

print("----------------------------------------")
print(f"Question: {question}")
print(f"Answer: {result['output']}")

Output

 python langchain_remote_client.py
----------------------------------------
Question: What is the capital of Canada?
Answer: The capital of Canada is **Ottawa**.

Conclusion

We’ve successfully navigated the path from developing a local LangGraph agent to deploying a robust, scalable AI service on GCP Cloud Run. By leveraging Langserve, we effectively bridged the gap, creating an accessible API endpoint. This setup not only brings your agentic AI to a wider audience but also provides a solid foundation for handling production-level traffic. The journey demonstrates how these powerful tools can be combined to build and scale sophisticated AI applications. Happy agenting! 

Production deployment considerations

The deployment example provided is intentionally simplified for demonstration purposes and is not secure for production environments. For a production deployment, you should implement several additional security measures, including:
– Proper authentication and authorization (consider Firebase Auth, Google IAM, or other identity solutions)
– HTTPS enforcement with proper certificate management
– Environment-specific configuration management (don’t hardcode sensitive values)
– Rate limiting to prevent abuse
– Input sanitization beyond basic type validation
– Logging and monitoring for security events

Further Reading and Resources

LangGraph: https://www.langchain.com/langgraph
Cloud Run: https://cloud.google.com/run
Langserve: https://github.com/langchain-ai/langserve
FastAPI: https://fastapi.tiangolo.com/
OpenAPI: https://www.openapis.org/
Pydantic: https://pydantic-docs.helpmanual.io/
