Cracking the AI Open Source Code: Will History Repeat Itself in the Next Platform Shift? Part I
Part 1 of a 3-Part Series: Mapping the Open Source AI Stack
tl;dr: Earlier this year, we published an article on foundational open ecosystems, where we discussed a framework for companies building open, fundamental layers of technology that serve as platforms for broader interconnected ecosystems of companies and developers. As discussed, we believe this approach drives long-term innovation across sectors like AI, bio, compute, data, energy, finance, genomics, healthcare, identity, gaming, infra, IoT, media, music, privacy, robotics, science, social, storage and more.
In this 3-part series, we dive into a new open ecosystem that has emerged: open-source AI. In this post, we unpack the open-source AI tech stack, illustrate the debate currently playing out in the industry, and map the market landscape. Stay tuned for parts 2 and 3, where we share perspectives from industry experts (part 2) and summarize how we at Eniac invest in the space (part 3)!
At Eniac, we’re constantly looking to invest in companies building at the forefront of the next paradigm shift. Throughout our decades of investing, we’ve noticed time and time again that open collaboration accelerates the adoption of new technologies by empowering innovation across companies and geographies. We saw this with Linux and MySQL catalyzing the Internet revolution in the ’90s. We saw it with Kubernetes and Docker powering enterprise cloud adoption over the past decade. And we see it today in Android’s dominance among mobile operating systems, thanks to its open platform DNA.
It’s become clear that artificial intelligence (AI) represents the next major wave of technological change, on par with the rise of the Internet, cloud, and mobile. As such, we wanted to understand whether the precedent set by prior platform shifts will hold: will open-source AI succeed the way other open ecosystems have? If so, how should we think about investing in these types of companies, and where does the opportunity lie?
To better understand the market, we conducted a research study combining primary and secondary research to form our perspective on the category. We interviewed many experts (builders, founders, investors, operators, and researchers in the space) and surveyed dozens of practitioners to better inform our thesis. In this first of three posts, we dive into the emerging landscape and build a market map of the open-source AI stack.
Defining the Market
It’s important to clearly define the topography of open-source AI, as it’s constantly changing. At the moment, the key players in the foundation large language model space are OpenAI (GPT), Anthropic (Claude), Cohere (Command), Google (BERT, T5, PaLM), Meta (LLaMA), Technology Innovation Institute (Falcon), and Mistral (Mistral-7B). Of the current flagship models, only Meta’s LLaMA, TII’s Falcon, and Mistral’s Mistral-7B are generally considered open-source foundation models. Google’s earlier BERT and T5 models were also released openly, but PaLM was not.
Recently, a debate has emerged in the ecosystem over open- vs. closed-source foundation large language models (LLMs). Experts in both the research and VC communities generally believe that an open ecosystem will bring enormous downstream benefits to companies and consumers alike; however, policymakers in Washington are concerned about the potential negative externalities of new, unchecked technology. As discussed in our previous article, we at Eniac wholeheartedly believe that open ecosystems will thrive in the long run, and we are investing accordingly.
In the world of AI, though, the growing landscape contains more than just foundation models. It spans everything from hardware (e.g., GPUs and TPUs) to training data, frameworks, observability, security, agents, and more. To illustrate the market, we developed a high-level overview of what we consider to be the current stack of open-source AI companies.
Eniac’s Open-Source AI Market Map
Whenever VCs build market maps, many startups are inevitably omitted, whether on purpose or not, which tends to cause some backlash. The market map above is no exception. The AI space changes almost daily, so as soon as we publish one version, it’s already outdated. That said, we wanted to capture a general illustration of the market as we see it today.
Aside from the obvious big-tech players (Nvidia, Meta, etc.), each startup on this list has released an open-source version of its offering(s). Our map is organized into vertical categories, each broken down into sub-categories. Starting from the bottom:
- Compute: The raw computing power needed to train and run AI models at scale. More advanced hardware enables faster training and inference.
- Chip Manufacturers: Companies like Nvidia that design chips optimized for AI workloads, offering greater efficiency and performance than general-purpose hardware. These chips greatly accelerate the development and deployment of AI models.
- Model Developers: Creators of large language models, like Meta (LLaMA) or TII (Falcon). These companies develop the core AI models that provide the basic capabilities and form the foundation of the stack.
- Model Deployment: Startups focused on taking models from research to production-ready deployment so they can be reliably integrated into applications.
- AI Development Platforms: Companies like Hugging Face that provide a library of tools and services that allow other companies to build, deploy, and manage AI models themselves.
- Vector DBs: Startups developing databases optimized for storing, indexing, and searching the vector embeddings that AI models produce, enabling fast similarity search for applications such as semantic search and retrieval-augmented generation.
- Synthetic Training Data: Companies generating large artificial training datasets to overcome the limitations of human-labeled data, since data quality, diversity, and volume are critical for performant models.
- Chaining & Fine-tuning Frameworks: Tools like OpenPipe that enable customizing, combining, and fine-tuning models for specific use cases and tasks.
Deployment, Management, & Orchestration
- General Purpose Libraries: Reusable code libraries like PyTorch that simplify building, training, and deploying AI models.
- Version Control & Experiment Tracking: Startups that provide MLOps tools for managing iterations, experiments, and model versions to improve development workflows.
- Prompt Engineering: Tools for optimizing prompts to get the best performance from models.
Security & Observability
- Model Validation & Monitoring: Tools for testing models for bias and errors and for monitoring their performance, especially as they interact with real-world data.
- Open-Source Telemetry: Standards for monitoring and analyzing open-source projects to improve transparency, iterate, and measure performance.
- Coding: AI-assisted coding tools and IDE plugins that augment developer experiences and increase productivity.
- Agents: Intelligent applications built on conversational models that can handle customer service and other tasks.
In summary, the open-source AI landscape is rapidly evolving across the full tech stack, from hardware up through application frameworks. While the debate over open- versus closed-source foundation models continues, many experts agree that open collaboration has historically driven greater innovation and faster adoption of new technologies, leading to the creation of massive companies. As the AI wave continues to build, an open ecosystem on which developers can rapidly deploy new applications and tools will unlock enormous value across industries.
We at Eniac look forward to partnering with the visionary founders building the open-source AI stack of the future. If you’re one of those builders, please reach out at firstname.lastname@example.org or to @vicsingh on Twitter. And stay tuned for our next post, Part 2 of 3, which documents expert insights from Eniac’s open-source AI research study!