Building Dev Platforms in the AI Era - Challenges and Lessons Learned

Six months into a Backstage and Crossplane rollout, a client’s platform team was spending more time fighting their tooling than building with it. Adoption was below 30%. Developers were still raising Jira tickets to ops instead of using the self-service portal the team had spent months building. The platform was technically sound. It just didn’t fit how the organization actually worked.

That experience pushed me to revisit a question I thought had been settled: when does it actually make sense to build a developer platform from scratch rather than adopting off-the-shelf solutions? In 2026, AI has changed the answer in ways that are worth thinking through carefully.

What is a Developer Platform?

A developer platform is a self-service layer that abstracts infrastructure complexity away from development teams. Instead of raising tickets to ops or wrestling with cloud consoles, developers interact with a curated set of workflows designed for their organization. Provision a database, spin up an environment, deploy a service — all without leaving their normal toolchain.

The goal is to encode your organization’s operational knowledge into a product that developers actually want to use.

The Old Build-vs-Buy Calculus

Before AI coding tools were good enough to take on real implementation work, the decision was almost always “buy.” Building a custom developer platform meant:

A dedicated platform engineering team of 3–5 people minimum
Months of work before anything was usable
Ongoing maintenance load that grew with every new feature
Documentation that was always out of date

Off-the-shelf solutions like Backstage and Crossplane exist precisely to address this. They provide a foundation — plugin ecosystems, established patterns, community support — that lets a small team ship something functional without starting from zero.

For most organizations, that trade-off made sense. You accepted that the platform would be generic in exchange for getting it out the door.

How AI Shifts the Equation

AI coding assistants have materially changed what a small team can ship in a given amount of time. The tasks that used to eat months of platform engineering effort (writing Terraform modules, building CRUD interfaces, generating Kubernetes manifests, scaffolding API layers) now take hours or days.

This matters because most of the effort in building a custom dev platform was never the hard architectural decisions. It was the volume of implementation work around those decisions: boilerplate, repetition, documentation, glue code. AI handles most of that well.

What AI doesn’t change: organizational alignment, understanding your developers’ actual workflows, designing UX that drives adoption, and making the right architecture calls. Those still require human judgment and time.

The practical result: a team of two platform engineers with good AI tooling can now produce what used to require a team of five or six. Custom development has become viable at an organizational scale where it simply wasn’t before.

The Client Story: Where Off-the-Shelf Hit Its Limits

Back to the client. Their setup was genuinely sophisticated — Crossplane for infrastructure abstraction, Backstage as the developer portal, ArgoCD for GitOps. The architecture was sound. The problem was in the details.

Crossplane Compositions became a maintenance burden. Crossplane’s abstraction model is powerful, but expressing your organization’s specific infrastructure patterns as Compositions requires deep knowledge of both the Crossplane API and your cloud provider’s resources. Here is a simplified version of what a database provisioning Composition looked like:

apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgresql-aws
spec:
  # The developer-facing composite type this Composition implements
  compositeTypeRef:
    apiVersion: platform.example.com/v1alpha1
    kind: PostgreSQLInstance
  resources:
    - name: rdsinstance
      # Defaults applied to every instance unless a patch overrides them
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: eu-west-1
            instanceClass: db.t3.medium
            engine: postgres
            engineVersion: "15"
            skipFinalSnapshot: true
            storageEncrypted: true
            allocatedStorage: 20
      patches:
        # Copy the requested storage size onto the RDS instance
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
        # Translate a t-shirt size into a concrete instance class
        - type: FromCompositeFieldPath
          fromFieldPath: spec.parameters.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.medium
                medium: db.m5.large
                large: db.m5.2xlarge
        # Report the database endpoint back to the developer-facing resource
        - type: ToCompositeFieldPath
          fromFieldPath: status.atProvider.endpoint
          toFieldPath: status.endpoint

This is manageable for one resource type. The client had twelve. Each Composition required patches for every environment-specific variation, transforms for every field that differed between teams, and status propagation logic so developers could actually see what was happening. Every time the cloud provider added a new required field or changed an API, someone had to update the Composition.
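To make the moving parts concrete, here is a minimal Python sketch of what the two FromCompositeFieldPath patches in the Composition above do to the base spec. It is a simplified model for illustration, not Crossplane's implementation; the names BASE, PATCHES, and render are hypothetical, and the handling of missing parameters and unknown sizes is an assumption about how such patches behave.

```python
from copy import deepcopy
from typing import Any, Dict

# The forProvider defaults from the Composition's base resource.
BASE: Dict[str, Any] = {
    "region": "eu-west-1",
    "instanceClass": "db.t3.medium",
    "engine": "postgres",
    "engineVersion": "15",
    "skipFinalSnapshot": True,
    "storageEncrypted": True,
    "allocatedStorage": 20,
}

# Each patch: (parameter on the composite, target field, optional map transform).
PATCHES = [
    ("storageGB", "allocatedStorage", None),
    (
        "size",
        "instanceClass",
        {"small": "db.t3.medium", "medium": "db.m5.large", "large": "db.m5.2xlarge"},
    ),
]


def render(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the patches to a copy of BASE and return the resulting provider spec."""
    spec = deepcopy(BASE)
    for source, target, mapping in PATCHES:
        if source not in parameters:
            continue  # parameter not set: keep the base value (assumed behavior)
        value = parameters[source]
        if mapping is not None:
            if value not in mapping:
                raise ValueError(f"unsupported {source}: {value!r}")
            value = mapping[value]
        spec[target] = value
    return spec


if __name__ == "__main__":
    # A developer asks for a large database with 100 GB of storage.
    print(render({"size": "large", "storageGB": 100}))
```

Every environment-specific or team-specific variation adds another entry to the patch list, which is where the maintenance cost described above comes from once it is multiplied across twelve resource types.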

Backstage plugin integration was its own project. Connecting Backstage to the client’s specific CI/CD system, their internal secrets manager, and their custom deployment pipeline each required writing new plugins or forking existing ones. The plugin ecosystem is rich but assumes common patterns. Where the client diverged from those patterns, they were writing TypeScript against Backstage’s internal APIs, and those APIs changed between minor versions.

The result: the platform team spent roughly 60% of their time on maintenance and compatibility work. Only 40% went toward new features or improving developer experience. Developers felt it — the platform lagged behind their actual needs.

Rebuilding Custom: What We Learned

We spent eight weeks rebuilding the core of their platform as a custom application. Two engineers, heavy use of AI tooling. Here is what the experience taught us:

Start with the two or three workflows that cause the most friction. For this client it was database provisioning and environment creation. Those two workflows accounted for the majority of ops tickets. We built those first and got them in front of developers immediately. Everything else waited.
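One way to find those workflows is to rank ops tickets by category and take the few categories that cover most of the volume. The sketch below assumes tickets have already been labeled by workflow. The function name top_workflows, the coverage threshold, and the sample categories and counts are all hypothetical and are not the client's data.

```python
from collections import Counter
from typing import Iterable, List


def top_workflows(ticket_categories: Iterable[str], coverage: float = 0.5) -> List[str]:
    """Return the most frequent categories that together cover `coverage` of all tickets."""
    counts = Counter(ticket_categories)
    total = sum(counts.values())
    selected: List[str] = []
    covered = 0
    for category, count in counts.most_common():
        if covered >= coverage * total:
            break  # the categories picked so far already cover enough tickets
        selected.append(category)
        covered += count
    return selected


if __name__ == "__main__":
    # Made-up sample: ten tickets across four workflow categories.
    tickets = ["database"] * 5 + ["environment"] * 3 + ["dns", "access"]
    print(top_workflows(tickets, coverage=0.75))
```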

Resource status visibility is non-negotiable. Developers need to know what is happening when they trigger an infrastructure action. A database takes five minutes to provision. An environment takes longer. We used Kubernetes operators to drive the reconciliation loop, surfacing real status back to the UI at every step. Without this, developers assume the system is broken and raise a ticket anyway.
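The idea can be sketched as a small reconciliation function that maps whatever the cloud provider reports into a status the UI can show. This is a simplified, self-contained model rather than the operator code we shipped. The phase names, the DatabaseRequest type, and the provider state strings ("creating", "available") are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Phase(str, Enum):
    PENDING = "Pending"
    PROVISIONING = "Provisioning"
    READY = "Ready"
    FAILED = "Failed"


@dataclass
class DatabaseRequest:
    name: str
    phase: Phase = Phase.PENDING
    message: str = "Request accepted"
    # Every status change, in order, so the UI can show a timeline.
    history: List[Tuple[str, str]] = field(default_factory=list)


def reconcile(request: DatabaseRequest, observed: Optional[str]) -> bool:
    """Run one reconciliation pass and return True if another pass is needed."""
    if observed is None:
        # Nothing exists yet; a real operator would call the cloud API here.
        phase, message = Phase.PROVISIONING, "Creating database instance"
    elif observed == "creating":
        phase, message = Phase.PROVISIONING, "Waiting for the instance to become available"
    elif observed == "available":
        phase, message = Phase.READY, "Database is ready to use"
    else:
        phase, message = Phase.FAILED, f"Provider reported state {observed!r}"

    # Record only real changes so the timeline stays readable.
    if (phase, message) != (request.phase, request.message):
        request.phase, request.message = phase, message
        request.history.append((phase.value, message))
    return phase is Phase.PROVISIONING


if __name__ == "__main__":
    req = DatabaseRequest(name="orders-db")
    for state in [None, "creating", "creating", "available"]:
        reconcile(req, state)
    for phase, message in req.history:
        print(f"{phase}: {message}")
```

The point of the design is that the developer always sees a current phase and a human-readable message, even while nothing visible is happening on the provider side.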

Involve developers before you think you’re ready. We ran demos at the end of every week, starting from week one when the UI was barely functional. The feedback from those sessions changed our priorities completely. Two features we had planned never got built because developers told us they didn’t care. Three things we hadn’t planned became urgent.

AI accelerates implementation, not decisions. We used AI to generate boilerplate, write Kubernetes operator scaffolding, build the API layer, and draft documentation. That work was genuinely fast. The time we spent debating architecture, talking to developers, and mapping workflows to infrastructure patterns — that took exactly as long as it would have without AI. Do not expect AI to shortcut the thinking work.

Custom still costs more upfront. Eight weeks for two engineers is not nothing. Off-the-shelf would have been running in week two. The bet is that higher adoption and lower long-term maintenance make the investment worthwhile. For this client, six months later, adoption is above 80% and the platform team is spending most of their time on new features.

When to Build, When to Buy

Off-the-shelf platforms are still the right answer for many organizations. The question is whether your workflows fit closely enough that the generic solution will actually get adopted.

Start with off-the-shelf when:

Your infrastructure patterns are standard (common cloud providers, typical Kubernetes setup)
You need something working in weeks, not months
Your platform engineering capacity is limited and you need the community to carry maintenance
Backstage’s plugin ecosystem already covers your integrations

Consider building custom when:

You have tried an off-the-shelf solution and adoption is low despite genuine effort
Your workflows diverge significantly from what the ecosystem assumes
The maintenance burden of customizing existing tools is comparable to building something purpose-built
You have platform engineering capacity and AI tooling to accelerate delivery

The two paths are not mutually exclusive. A reasonable approach is to start with Backstage as the developer portal while building custom tooling for the workflows where it matters most. Let the off-the-shelf solution handle service catalogs and documentation. Build custom where adoption and fit actually matter.

Conclusion

The build-vs-buy decision for developer platforms has not reversed: off-the-shelf solutions are still the right starting point for most teams. What has changed is the cost of the alternative. Custom development is no longer reserved for organizations with large platform engineering departments. When workflows diverge far enough from what the ecosystem assumes, a small team with good AI tooling can ship a purpose-built platform that fits their organization better than a heavily customized Backstage, and they can do it in a timeframe that was previously unrealistic.

The best developer platform is the one developers use. AI has made it more achievable than ever to build exactly that.
