Data is the only moat

Nevermark · 2026-01-16T00:49:01 1768524541

Data.

Vertical integration.

Horizontal integration.

Cross- and/or mass-relationship integration.

Individual relationship investment/artifacts.

Reputation for reliability, stability, or any other desired dimension.

Constant visibility in the news (good, neutral, sometimes even bad!)

A consistent attractive story or narrative around the brand.

A consistent selective story or narrative around the brand. People prefer products designed for "them".

On the dark side: intimidation. Ruthless competition, acquisitions, lawsuits, reputation for dominance, famously deep pockets.

To keep someone is easier. Tiny things hold onto people: An underlying model that delivers results with less irritation/glitches/hoops. Low to no-configuration installs and operation. Windows that open, and other actions that happen, instantly. Simple attention to good design can create fierce loyalty, for those for whom design or friction downgrades feel like torture.

Obviously, many more moats in the physical world.

PaulHoule · 2026-01-16T02:37:55 1768531075

AI-based product that slips past the defenses of people who think they hate AI, get turned off by branding like Copilot + PC, etc. A lot of people are really hoping it all dries up and blows away the way NFTs did.

Or maybe the honest to God non-dull tool that has nothing to do with AI. Like a Photoshop clone that does everything in linear light, makes gorgeous images, and doesn't crash when you open the font chooser.

bloppe · 2026-01-16T04:02:21 1768536141

Developers! Developers! Developers! Developers!

visarga · 2026-01-16T03:22:55 1768533775

Context is the moat. You can't eat so that I feel satiated: my context, my benefits. It is nonfungible.

jondwillis · 2026-01-16T05:34:55 1768541695

n to the limit

jackfranklyn · 2026-01-16T13:37:24 1768570644

The tricky thing about "data is the only moat" is that it depends heavily on what kind of data you're talking about.

Proprietary training data for foundation models? Sure, that's a real moat - until someone figures out how to generate synthetic equivalents or a new architecture makes your dataset less relevant.

But the more interesting moat is often contextual data - the stuff that accumulates from actual usage. User preferences, correction patterns, workflow-specific edge cases. That's much harder to replicate because it requires the product to be useful enough that people keep using it.

The catch is you need to survive long enough to accumulate it, which usually means having some other differentiation first. Data as a moat is less of a starting position and more of a compounding advantage once you've already won the "get people to use this thing" battle.

bezusfaphoon · 2026-01-16T16:04:39 1768579479

well said

jackfranklyn · 2026-01-16T09:42:50 1768556570

Building in a niche B2B space and this resonates. The data moat isn't just volume though - it's the accumulated understanding of edge cases.

In my domain, every user correction teaches the system something new about how actual businesses operate vs how you assumed they did when you wrote the first version. Six months of real usage with real corrections creates something a competitor can't just replicate by having more compute or a bigger training set.

The tricky part is that this kind of moat is invisible until you try to build the same thing. From the outside it looks simple. From the inside you're sitting on thousands of learned exceptions that make the difference between "works on demos" and "works on real data."

stevesimmons · 2026-01-16T13:17:05 1768569425

We totally found this doing financial document analysis. It's so quick to do an LLM-based "put this document into this schema" proof-of-concept.

Then you run it on 100,000 real documents.

And so you find there actually are so, so many exceptions and special cases. And so begins the journey of constructing layers of heuristics and codified special cases needed to turn ~80% raw accuracy into something asymptotically close to 100%.

That's the moat. At least where high accuracy is the key requirement.
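
To make "layers of heuristics and codified special cases" concrete, here is a minimal sketch of one way such a layer might sit on top of raw LLM output. Everything in it is an assumption for illustration: the `amount` field, the two rules, and the `apply_rules` helper are hypothetical, not anything from a real pipeline.

```python
from typing import Callable, Optional

# A record is the schema-shaped dict an LLM extraction step might return (assumed shape).
Record = dict[str, str]
# A rule returns a corrected copy of the record, or None if it does not apply.
Rule = Callable[[Record], Optional[Record]]


def fix_parenthesized_negative(rec: Record) -> Optional[Record]:
    """Hypothetical rule: accounting style "(1,234.50)" means -1234.50."""
    amount = rec.get("amount", "")
    if amount.startswith("(") and amount.endswith(")"):
        return {**rec, "amount": "-" + amount[1:-1].replace(",", "")}
    return None


def fix_thousands_separator(rec: Record) -> Optional[Record]:
    """Hypothetical rule: strip thousands separators such as "1,234.50"."""
    amount = rec.get("amount", "")
    if "," in amount:
        return {**rec, "amount": amount.replace(",", "")}
    return None


# Rules run in order; each new special case found in real data becomes another entry.
RULES: list[Rule] = [fix_parenthesized_negative, fix_thousands_separator]


def apply_rules(rec: Record, rules: list[Rule] = RULES) -> tuple[Record, list[str]]:
    """Apply every matching rule and report which ones fired, for auditing."""
    applied: list[str] = []
    for rule in rules:
        fixed = rule(rec)
        if fixed is not None:
            rec = fixed
            applied.append(rule.__name__)
    return rec, applied
```

Returning the names of the rules that fired is a deliberate choice in this sketch: when accuracy is the requirement, you usually want to know which special case touched which document.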

estearum · 2026-01-16T13:02:15 1768568535

In case you haven't come across the idea yet, this concept is all the rage among the VC thoughtbois/gorls. Not sure if Jaya Gupta at Foundation coined or just popularized it but: context graph.

Could be a good fundraising environment for you if you find the zealots of this idea.

light_triad · 2026-01-15T22:48:28 1768517308

Distribution, brand, network effects, regulatory positioning, and execution speed all create defensibility; "data helps" doesn't imply "data is everything"

Also as foundation models improve, today's "hard to solve" problems become tomorrow's "easy to solve" problems

weinzierl · 2026-01-16T05:07:01 1768540021

> Why is it that we have agents that can prospect for sales leads and answer support tickets accurately, but we don’t seem to be able to consistently generate high quality slides?

I don't know about prospecting, but "answer support tickets accurately"? Seriously, this must be ironic, right?

hmry · 2026-01-16T08:37:44 1768552664

It's great to hear you've already tried X twice. But have you tried reading our FAQ section on X? Also, try using this setting that doesn't exist or this dialog that was removed in 2022

netdevphoenix · 2026-01-16T09:19:24 1768555164

Efficiency will ultimately decide if LLMs become feasible long-term. Right now, the LLM industry is not sustainable. Investors were promised literally the future in the present and it is now undeniable that ASI, AGI or even moderately competent general purpose quasi-autonomous systems won't happen anytime soon. The reality is that there is not space for all these players in the market in the long term. LLMs won't go away but the vast majority of mainstream providers definitely will.

whatever1 · 2026-01-15T21:20:01 1768512001

Information was always the moat for everything. We literally have spies who risk their lives to try to gain access to information.

eloisant · 2026-01-15T22:21:44 1768515704

Yes, during the 2000s there was the "mashup" fad. People creating companies around mashing data from one service to another. Like putting Craigslist listings on a Google Map.

And guess what, all those mashup companies didn't last a couple of years. Because they didn't have direct access to the data.

gopher_space · 2026-01-16T02:01:53 1768528913

Ideas that didn't scale past a comfortable income for the three people originally involved.

calvinmorrison · 2026-01-16T01:39:31 1768527571

Yet sites like GasBuddy and builtwith.com do seem to have a strong and valuable presence.

tehjoker · 2026-01-15T22:02:09 1768514529

This is heavily context dependent... There are plenty of situations where everyone knows the relevant factors, it's who has possession of land, resources, people, etc.

burntcaramel · 2026-01-15T23:45:44 1768520744

Don’t forget people’s minds.

- Which brands do people trust?
- Which people do people of power trust?

You can have all the information in the world but if no one listens to you then it’s worthless.

behnamoh · 2026-01-16T00:34:27 1768523667

> Which brands do people trust? - Which people do people of power trust?

These are often at odds with each other. So many times engineers (people) prefer the tool that actually does the job, but the PMs (people of power) prefer shiny tools that are the "best practice" in the industry.

Example: Claude Code is great and I use it with Codex models, but people of power would rather use "Codex with ChatGPT Pro subscription" or "CC with Claude subscription" because those are what their colleagues have chosen.

NiloCK · 2026-01-16T03:03:38 1768532618

Data has historically been a moat, but I think now more than ever it's a moat of bounded size / utility.

The biggest data hoarders now compress their data into oracles whose job is to say whatever to whoever - leaking an ever-improving approximation of the data back out.

DeepSeek was a big early example of adversarial distillation, but it seems inevitable to me that frontier models can and will always be siphoned off in order to produce reasonably strong fast-follow grey market competition.

Hrun0 · 2026-01-17T00:09:48 1768608588

I find the premise that coding is one of the hardest problems for LLMs flawed. Isn't coding the easiest area for AI, with lots of data to train on and easily verifiable results?

andy99 · 2026-01-16T00:43:19 1768524199

What if the only moat is domains where it’s hard to judge (non superficial) quality?

With code generation, you don’t see what’s wrong right away; it’s only later in the project lifecycle that you pay for it. Writing looks good to skim, but is embarrassingly bad once you start reading it.

Some things (slides apparently) you notice right away how crappy they are.

I don’t think it’s just better training data, I think LLMs apply largely the same kind of zeal to different tasks. It’s the places where coherent nonsense ends up being acceptable.

I’m actually a big LLM proponent and see a bright future, but believe a critical assessment of how they work and what they do is important.

aero142 · 2026-01-16T01:20:40 1768526440

If I had to answer this question 2 years ago, I wouldn't have said software was a "don't see it's bad until later" category, with compilers and it needing to actually do something very specific. However, business slides are full of exacting facts and definitely never contain generic business speak masquerading as real insight /s.

This feels like telling a story after the fact to make it fit.

crabmusket · 2026-01-16T08:36:01 1768552561

I agree, and by all accounts the success of coding agents is due to code being amenable to very fast feedback (tests, screenshots) so you can immediately detect bad code.

That's in terms of functionality, not necessarily quality though. But linters can provide some quick feedback on that in limited ways.

ralusek · 2026-01-15T22:13:32 1768515212

I feel like algorithmic/architectural breakthroughs are still the area that will show the most wins. The thing is that insights/breakthroughs of that sort tend to be highly portable. As Meta showed, you can just pay people 10 million to come tell you what they're doing over there at that other place.

inb4 "then why do Meta's models still suck?"

nomel · 2026-01-15T23:59:35 1768521575

Hasn't this been proven true, many times now? Just look at the difference between ChatGPT 3 and 3.5, for example (which used the same dataset). That, and all the top performing models have large gains from thinking, using the exact same weights.

And all the new research around self-learning architectures has nothing to do with the datasets.

PeterStuer · 2026-01-16T18:58:20 1768589900

Anything scarce can be a moat. At the moment, getting the compute hardware is a pretty decent moat as well.

dangoodmanUT · 2026-01-16T02:56:51 1768532211

saying they swear by the cursor composer model doesn't give me a ton of confidence

adverbly · 2026-01-16T13:22:49 1768569769

Is this really where we are at now for analysis?

You get some anecdotal evidence and immediately post a hot take claiming to have discovered a new invariant?

I guess a bunch of us, myself included, have taken the engagement bait here, but does it really take somebody saying something stupid to start a conversation on something?

CuriouslyC · 2026-01-16T13:02:50 1768568570

Marketing/relationships is the only moat, not data. You can have amazing data and make an amazing product, and some asshat with a product that barely works and really tight marketing will crush you. Then people will ask why there isn't a product like yours on the market, all while ignoring all your marketing material.

richard___ · 2026-01-16T14:02:11 1768572131

Evidence?

niemandhier · 2026-01-16T14:29:06 1768573746

User data is a leaky moat, since you can convince people to get their data via a GDPR request and hand it over to you.

The law even demands that the data be machine-readable.

The only real moat is your own, observational data.

jongjong · 2026-01-15T22:24:36 1768515876

Attention is the only moat.

Companies always try to make it seem like data is valuable. Attention is valuable. With attention, you get the data for free. What they monetize is attention. Data is a small part to optimize the sale of ads but attention is the important commodity.

Why else are celebrities so well paid?

wan23 · 2026-01-16T16:26:10 1768580770

Attention is not a moat, it's the thing that's in the castle's treasure room. Without something that makes your service sticky, attention may well just walk right out the door.

ndr · 2026-01-15T22:54:07 1768517647

This surely works with consumer products. Does it equally apply to B2B?

CuriouslyC · 2026-01-16T13:06:44 1768568804

Try to launch a B2B without marketing skills in 2026 and find out.

brodouevencode · 2026-01-16T18:21:30 1768587690

FAFO - forgo advertising and find out

wolttam · 2026-01-15T22:33:20 1768516400

User attention to get user data?

I feel like the data to drive the really interesting capabilities (biological, chemical, material, etc.) is not going to come in large part from end users.

OkayPhysicist · 2026-01-15T22:56:45 1768517805

It's the other way around. You gather user data so that you can better capture the user's attention. Attention is the valuable resource here: with attention you can shift opinions, alter behaviors, establish norms. Attention is influence.

wolttam · 2026-01-16T17:09:16 1768583356

Yeah, I understood that, but I don’t think we need influence over the masses to train better models with novel data.

iwontberude · 2026-01-15T23:30:27 1768519827

Corruption is the only moat. Oligarchs can buy anything and funnel attention and money into it, creating financial success for shareholders despite poor leadership, zero social responsibility, suboptimal ideas and execution (see: Tesla)

Just commit fraud repeatedly while owning the people who run DoJ, easy peasy, no amount of attention or cash flow can displace that.

ares623 · 2026-01-16T08:29:57 1768552197

Go further. Violence (or the threat of it) is the only moat.

guelo · 2026-01-16T00:53:03 1768524783

What's annoying is that companies capture user data and then lock it into their platforms, transform it, and resell it. But it is really the user's data that they're selling back to us. I would like regulation here: if you capture my data, then I get to pick who you must and must not share it with.

cudgy · 2026-01-16T10:33:10 1768559590

And they simply ignore your choices anyway.

Pickingobot · 2026-01-16T15:56:28 1768578988

"Let me help you out of the water, or you'll drown!", the friendly monkey said placing the fish carefully on the tree...