Full-Stack AI Application Startups: An approach to durability?
By Vic Singh
With OpenAI’s recent announcements, investors and startup founders alike are rightfully getting paranoid about application layer AI startups being killed. Much like Facebook and Apple killed many startups — during the mobile era — by building native functionality into their products and ecosystems, the same is happening at the application layer in the AI era. That begs the question: how should founders and startups approach building a durable AI application company? Maybe one of the answers is going full stack — a framework and approach I’ll outline in this post.
Full Stack AI Application Companies
I’ve come to the belief that AI application startups should pursue the full stack paradigm to build durable companies. In this post I’ll outline a simplistic yet potentially powerful approach to building a full stack AI application startup. And I pose the question: is this an approach to building a durable AI application startup?
Put simply, a full-stack AI application company is one that owns the model, the data, and the application layer. This led me to a blueprint for how and why the application layer may not be commoditized in AI. Value accrues when the application layer is tightly coupled with the foundational layer — so with more usage, the data becomes more diverse but targeted and proprietary, and the strength of the model improves as it’s fine tuned making inference serving more accurate and performant with less hallucination — as the application layer grows in utility feeding valuable user data back to the model. This is the classic data flywheel effect for AI.
At its heart OpenAI is itself a full stack AI startup — they own the model, the data and the application layer and they are now building an ecosystem for developers and users. In the case of a full-stack AI application startup, the application is not just for reference; it’s not just the product being sold, but the engine that improves the data and the model.
*Note — this post may not be super relevant for startups building at the infrastructure and tooling layer. Those companies may not need to be full stack for durability — they have to build unique technology architectures with deeper and wider surface area with coverage across models and data. We can jam on that in a subsequent post.
Fast Track Full-Stack with Capital Efficiency
Say the first 4 words of that that statement 10x faster lol. When I wrote my post on foundational [open] ecosystems earlier this year, I imagined that it would be challenging to continue to find truly seed stage companies building full stack AI — at the application layer and the foundation layer — given the amount of capital and compute required to build and train a foundation model from the onset coupled with the product chops it takes to build an application users love. With the proliferation of open source models, open source data, open source frameworks for fine tuning and training, reduced compute costs for smaller models and the relative ease of building AI applications, I think there is now a potentially winning formula for building capital efficient full stack AI startups.
Here is a simplified paradigm of how to go about this.
Identify a Narrow but Novel Use Case
First, founders need to do deep primary and secondary research to get a full understanding of the AI ecosystem and the end user needs before embarking on building an AI application. The key here is to identity a narrow use case that’s truly novel and offers value to end users while also unique enough to create even more white space. This isn’t easy. The narrowness of the use case needs to be traded off for the eventuality of widening the product functionality. So founders need to thread this needle carefully to pick a use case and build an initial product with a path to broadening the reach and functionality of the eventual offering. Think of this as a layer cake that grows over time but starts with a unique and easy to tackle problem.
One can argue that OpenAI will always be a threat if you build a narrow application. While possibly true, you have to thread the needle with narrowness, novelty and expansion. If you’re diligent, thoughtful and careful enough you should be fine especially with the full stack paradigm. Pick a use case where the depth of the solution will be defensible and the ultimate full stack nature of the offering compounds that defensibility to go to market with a durable product and company.
Generally, vertical specific applications and use cases are ripe for the taking and can later be leveraged to go full stack and even horizontal.
Hypothesis Testing Thin-ish Wrappers
Once you’ve identified a use case, I believe fast moving startups should test their hypothesis by building an application as a thin-ish wrapper on top of the large models that OpenAI or other closed source providers offer. This is counter to the prevailing VC narrative but it allows startups to quickly test if their use case creates value before investing precious time and capital in sophisticated scaffolding and deep technical research and development. This will also provide unique insight and learnings as you go down to the model layer. You will also be able to quickly test if your idea has legs.
*Note: The thin-ish wrapper approach may not be the right starting point for companies building outside of natural language — computer vision, media, science, bio, robotics etc. You may have to skip to forking an open source model from the outset but you can still do so in a thin-ish and cost effective manner.
Fine Tune an Open Source Model with Open Source Data and Frameworks
Once you have built a unique application as a thin-ish wrapper and have some strong usage (either with developers or enterprise design users), you can then fork and train one of the many readily available open source models like Llama 2, Mistral or domain focused smaller models with vertical specific open source data sets based on your use case. You don’t have to use a large model, actually smaller open models can be more performant and the better trained they are, the less they will hallucinate. For instance, we’ve heard that smaller models can be trained with 10,000 pieces of narrow use case data for very cost effective compute (think thousands not millions of dollars) and you will beat GPT4 from a performance and accuracy perspective because of the narrower use case and specific data. Additionally, you can also leverage open source frameworks like Openpipe to train your model.
*An important point about data: You don’t need to start with a huge data set, especially when it’s domain specific. In the beginning, quality matters more than quantity. In addition to open source data, you can also train on synthetic data for your domain. And you can begin with what I’m calling a vertical small X model (SXM), then over time expand it to a large X model (LXM), with the X standing for any type of model.
Build Scaffolding Around the Application
Once you’ve trained an open source model with vertical specific data, you can then point the application you’ve built to your new custom proprietary model. Once the stack is complete you can then innovate at the application layer, build more functionality, integrations and product scaffolding (my partner Hadley talks about our application layer AI thesis here) and continue to accumulate usage data as you reinforce the foundation layer you now own. You will have greater control over your destiny, create defensibility and have true differentiation in the vertical use case you chose by owning the full stack. This creates lasting value and durability. Over time you can expose your stack to other developers through APIs, SDKs and open source frameworks to create an ecosystem and go horizontal.
I am making this out to be more simplistic than it is in practice, but I truly believe full stack AI application startups can be more durable and create lasting value. And with the prevalence of open source models as well as open source data sets and frameworks + cheaper compute, this path is now tractable in a capital efficient way for seed stage startups.
Is this a path to durability? I think so! If you’re building a full stack AI application startup I’d love to hear from you! @vicsingh