Why Startups Should Build Their Own Models

March 31, 2024 • ☕️ 4 min read

In this recent YC video, the YC partners spoke about HOW YC startups are able to build their own models with limited resources. Being resourceful is a great quality of a startup founder in general. But today I wanted to talk about WHY it is important for startups to build and train their own models. The following is based on my experience building Play over the past few years, so YMMV!

“People who are really serious about software should make their own hardware” - Alan Kay

I believe that people who are really serious about their products should train their own models. To build the greatest and most differentiated products, you need to have control over as many components of your system or supply chain as possible.

Startups should understand which components are critical early on so they can concentrate their efforts there, but over time they should branch out toward controlling their entire stack end-to-end. The more control you have over the system, the more control you have over your customers' experience and the value you deliver.

This point is nothing new; you can see it in many great existing products:

  1. The iPhone maintains end-to-end control over both the software and the hardware. Now, 17 years in, we see how making their own chips is giving Apple even more differentiation. In contrast, the only decent Android phone I’ve seen was the one built by Google, where they try to control the hardware as well. Other companies, like Xiaomi, ended up creating their own Android fork.

  2. Facebook’s infra: FB built their own data centers and server chips. When I was at WhatsApp, the amount of infrastructure built globally to support the app’s reliability was stunning. That level of control gave them unique value compared to alternatives hosted on the cloud.

  3. Tesla manufactures many of the components that are usually supplied by third parties.

The same principles apply to AI companies. Startups shouldn’t shy away from training or fine-tuning their own models. This will not only give them an edge and differentiate them, but also allow them to offer much higher value to customers over time.

Start Early

Building the training muscle in your team takes time and deliberate effort. It requires experimentation to understand the right data for your modality (text, audio, images, video, etc.), hiring the right team, building the infra for your models, sourcing the GPUs, and running effective distributed training.

For these reasons, and assuming training a model will provide higher value to your customers, by the time you are 100% sure you need to start training your own models, it may already be too late. It is better to start early, even if just by fine-tuning some models or training smaller ones, to get a feel for the process and understand the gaps in your team, infra, and data.

Another major benefit of starting early with smaller models and experimentation is reducing future risk. Training large models is time-consuming, and many mistakes, whether in tokenization, the dataset, the model architecture, or elsewhere, cannot be rooted out of the model without starting again from scratch. That costs more money and time, and you are more likely to get it wrong the second time due to time pressure.

Scaling Too Early

Startups usually get only a few shots at building the right product before running out of money and time. So before you spend all your funding on GPUs and researchers, keep two points in mind when you start training larger, time- and money-intensive models:

  1. Be resourceful. Don’t scale until you have the necessary resources (GPUs). In most cases, you can engineer your way to the behavior you want by fine-tuning open-source models, clever prompting, multiple agents, training smaller task-specific models, or all of the above. You can even get SOTA results with that approach; there are countless papers and projects demonstrating this.

  2. Model-market fit (I made this up). Only scale your training and team once you have enough signs of product-market fit to know that training a model will make a big difference in your user experience, product value, and growth.

If I were to start again today, I would begin with point 1 above to quickly deliver value to users and validate it. Once the value is clear, I can raise funding (or ideally use customer revenue, which is what we initially did at Play) and train/scale my own models to greatly improve the experience.

How Training Our Own Models Helped Us at Play

Building truly conversational, human-like voices is what we (and our users) care about most. Last year, we wanted to optimize the latency of our models, and the only way was to invest in the inference and training architecture of the model itself. Because we own our voice models end-to-end, we were able to train a new model and improve its inference at scale, offering users near-instant speech generation (around 200ms) at lower cost.

We released our Turbo model in October, and it significantly helped our growth. It brought us closer to our vision of creating the most realistic, human-like conversational experiences. If the model takes seconds to respond to every query, that won’t feel like a human conversation.

You don’t need to control your entire system from the beginning. Understand your product’s core value and control that first. Then expand into more value and features over time, which for an AI company will mostly come from training more advanced models for your specific use cases.