Speaking at the TiE Delhi NCR India Innovation Day – 2026, on Friday, Raghavan categorised data into several types, including “data at rest”, synthetic data, and “data in motion”, while responding to a question on the current state of resource constraints in India.
“Data at rest”, he said, refers to information already available on the internet or through traditional sources, which has effectively been used by everyone in the industry. This category of data is widely accessible and no longer offers a significant competitive advantage.
The second category is synthetic data, generated by AI models themselves. Raghavan noted that companies with strong technical capabilities can now create high-quality datasets using sophisticated pipelines, particularly for domain-specific applications.
However, he identified usage data as the highest-quality form of data in AI. Describing it as “data in motion”, he said it is generated when people actively use AI applications and systems.
This form of data is especially valuable because it is created in the process of building and operating AI products. Unlike publicly available or synthetic datasets, usage data reflects real-world interactions, behaviour, and workflows.
Raghavan added that the biggest constraint in accessing such data is distribution and adoption. Startups need market share to generate and access “data in motion”, which in turn gives them a competitive edge in the evolving AI ecosystem.
“Data of usage is the most valuable data. How did Claude get so good at coding? It’s because people used it, and that data is really the highest-quality data. Data created in the service of building an AI application is, by itself, extremely valuable. For that, you need market share; without it, you don’t get that kind of data,” he said.