Building for data providers - Amplify Data

Solving the challenges of DaaS businesses through software

Intro

We had the opportunity to sit down with Karthik Kumar, Co-Founder & CEO of Amplify Data (https://www.amplifydata.io/), to discuss data liquidity, building a DaaS company, and how to take advantage of an evolving world where almost every company is, in one way or another, a data company.

Background

Before we dive into the details, could you please tell us a little about your background and what led you to start Amplify?

Sure - I’ve spent my entire career around B2B tech, and specifically data startups. I’ve been on both the VC investing side, where at NEA I invested in early-stage data startups, and on the operating side, where I helped launch Aera Technology, a predictive/prescriptive analytics company.

Throughout my experience I kept seeing how the real challenge wasn’t the algorithm, the application, or the analytics, but the underlying data itself - so much so that companies were willing to pay big money for that data. Data is the real product (and increasingly so), but unfortunately there aren’t many service providers building for the emerging DaaS market. My co-founder, who previously led an analytics team at Mastercard, experienced similar challenges on the technical side with data sharing, so we decided to team up and start Amplify!

Who do you come up against in this space, and how does Amplify differentiate itself from them?

There actually aren’t very many service providers building for Data Providers, but from what I see, there are three major categories. First, the “connectors” - the marketplaces, brokers, and communities. They’re great at making data discoverable and connecting buyers and sellers, effectively generating leads for marketing teams, but they don’t solve the operational needs of the Sales, CS, and Data teams.

Then there are the data tools - the companies that are really good at processing or moving data from point A to B. These are tools that solve the technical & infrastructure challenges of cross-cloud data management but don’t necessarily solve for the operational workflows of the business users. The platform providers like Snowflake and Databricks fit in here as well.

Finally there’s DIY - where a lot of Data Providers have hacked together internal tools using something like Retool, or rely on manual people-and-process workarounds. This is what we see and compete with most often, and it’s definitely not a good use of their time.

We want to give Data Providers an integrated platform to actually run their business and its common workflows. For example, as a sales rep, I want to quickly create a custom data sample for a prospect without needing an engineer or writing SQL. As a CSM, I want to onboard and monitor the health of a customer without logging into 10 different systems. And as a data engineer, I want to focus on creating the data products themselves, not on fielding a million requests from the GTM team.

By integrating data-layer processing and orchestration with application-layer workflow software, we allow Data Providers to focus on their unique value proposition - the data itself, and offload the undifferentiated mechanics to us.  

How do you respond to the overachieving technical folks who say, “Hey, I can just set this up on my own, hack together some SQL queries, create a few views, and be on my merry way”?

The answer is simple: you absolutely could. But is building everything out and maintaining it the best use of your time? I think it’s important to lean into what you do well, and a data provider’s unique value proposition is their data, right? They exist because of the data assets they are monetizing. So every minute anyone on the team spends on anything other than collecting, cleaning, harmonizing, indexing, and packaging your data products for consumption is a minute wasted. Let us handle that and give you a best-in-class experience so that you can focus on what really matters.

Additionally, you have to realize that most companies are a mix of technical and non-technical individuals - the sales and CS folks may not be as SQL-savvy. We empower them to be autonomous instead of relying on an engineering team that is underwater, 90 Jira tickets deep in technical debt.
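
To make that concrete, here’s a minimal sketch (with hypothetical table and column names, purely for illustration) of what the DIY route tends to look like in practice - one hand-maintained SQL view per customer contract:

    -- Hypothetical DIY setup: one hand-edited view per customer contract.
    -- Every new customer, entitlement change, or schema tweak means a data
    -- engineer editing SQL by hand and redeploying.
    CREATE VIEW acme_corp_feed AS
    SELECT txn_date, ticker, industry, region, spend_usd
    FROM consumer_transactions
    WHERE region IN ('US', 'CA')              -- contracted geographies
      AND industry IN ('Retail', 'Dining')    -- contracted sectors
      AND txn_date >= DATE '2023-01-01';      -- contracted history window

Multiply that by dozens of customers, each with different entitlements and delivery destinations, and the quick hack becomes a standing maintenance burden.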

You’re building vertical software for the DaaS community - how do the needs of DaaS differ from wider SaaS? In general, what are the biggest differentiators?

Honestly, a DaaS business is still, at the end of the day, a business - so the core needs are the same: a good way to find and attract customers, an efficient delivery mechanism and time-to-value, a way to ensure customer satisfaction, and constant innovation.

Marketplace or not, you still need a sales team. On the product side, since the product here is the data itself and not an application, things manifest differently. For example, onboarding is less about solving the cold-start problem that most application SaaS companies face and more about integrating datasets quickly. Customer engagement is measured not in DAUs or MAUs but in data consumption, and innovation comes in the form of new attributes and increased data coverage instead of new features.

Solving the Problem

What are the main challenges businesses face when trying to monetize their data assets?

Again, I personally think it’s best to think of data monetization just like you would any other business venture and start with the customer demand first: 

  • What data do my customers want?

  • Where are they searching for it?

  • Why do they want it / What is it worth to them?

  • How do they want to receive it?

I see too many businesses starting with their dataset first and then searching for customers that want it. That’s akin to a solution looking for a problem. 

There are numerous data marketplaces and data discovery platforms available today. Where do you think they’re succeeding, and where do they fall short?

In one sense, I think the marketplaces are great in that they level the playing field for smaller data providers to get their name out there. Where they might be falling short is in the expectations they set. It’s not like you list your data on a marketplace and get money overnight. Instead, what you get are leads that you then have to nurture and close like any other sales process.

Within marketplaces, so long as they’re all competing on just the number of providers/datasets listed, it’s effectively an Uber vs. Lyft situation, where a buyer is indifferent to which marketplace they use. I think marketplaces need to move lower down the funnel and compete on additional value-add services like sampling, customization, contracting, and delivery. The good news is that, from my conversations with most of the independent marketplace providers, this is the direction they’re heading! Meanwhile, the major platform marketplaces will do this by integrating deeper into their own stacks.

Your platform offers white labeling options - was that a response to market demands or part of the original design?

Actually, it was part of the original design - we’ve offered white labeling from the start. Coming from the data world, we were used to every tool being focused on internal use cases - everything from your data warehouse to your BI tool is an internally facing tool. We thought there was a gap in the market for an externally facing data-sharing tool.

And this made sense when we spoke to Data Providers - when it comes to technical talent, most data providers are staffed with data engineers, not software engineers, so creating a “frontend” for their data is a daunting task. It’s much easier to white-label Amplify than to build an external-facing application from scratch, especially since this is our core business.

Can you discuss any success stories where Amplify has significantly improved a company's data sharing efforts?

Oh yeah - one of our recent customers is a major consumer transaction data provider. Previously they were delivering their data by granting access to an S3 bucket, and any customization required the data engineering team to create brand new versions of the files. With Amplify, they not only have a frontend that their sales teams can use to demo data, but their CS teams can also autonomously customize data products by ticker, industry, geography, and so forth without needing to bother the data engineering team.

On the data side, they can now offer delivery to over 10 destinations at a fraction of the infrastructure cost. We’re talking a 12+ month compression of the product roadmap, addressable-market expansion to corporates and smaller funds, and a multiple-X reduction in infrastructure cost.

On Failure

If you were starting Amplify again today, what would you do differently? Is there anything that’s been either an unexpected success or a spectacular failure? 

Focus! We built Amplify with data providers as our ICP and data monetization as the use case from the start. But about a year into our journey, some other opportunities came in focused on embedded ETL. We mistakenly took the bait and spent 3-4 months building and selling outside of our core focus zone. We got a LOT of meetings but 0 signed contracts… Lesson learned! As a startup we can’t do everything, and we need to stay laser-focused on what we do well.

On the unexpected success side: community! The DaaS community is awesome. I wish I had invested my time in it sooner. I’ve met so many amazing, helpful people through communities like World of DaaS, Neudata, Eagle Alpha, and all the data happy hours. In fact, that’s where most of our early customers, prospects, and product feedback have come from!

Looking Forward

What trends do you foresee in the data sharing economy over the next 5-10 years?

This is both something I see and a bit of a wishlist: I envision a world where a Data Product and the underlying technology layer are decoupled. People will transact on “data access,” and delivery will become an implementation detail. Data Products will be personalized for every customer and monetized on relevance and consumption, as opposed to technical constraints like GBs and files.

When that happens, you can see data commerce becoming very personalized and almost distributed in a way - imagine a world where you can purchase X attributes from one vendor and Y attributes from another vendor and then intelligently stitch them together to create your own personal Data Product, with things like pricing attribution and infrastructure seamlessly flowing through.      

There are very few publicly traded DaaS companies, and even fewer unicorns. Is there a reason you think data companies are less likely to go public? 

I think if done right, they just don’t need to. Going public, other than being a signal of business maturity, is ultimately a tool to diversify risk or provide liquidity to early shareholders (aka VCs). I think this stems from the asset-light nature of data: in theory it is infinitely scalable, very high margin, and requires little upfront capex or build time. This means new data providers can get up and running quickly (testing the market via marketplaces) and don’t need much upfront investment from VCs. That means there is less shareholder pressure to go public, and instead I’d expect most DaaS companies to shoot for profitability as their end state rather than an IPO. That makes them ideal PE candidates, which is ultimately what we see, as with Nielsen’s take-private in 2022.

What role do you think AI and machine learning will play in the future of data liquidity?

Like our original thesis - the algorithms, applications, and analytics will all either become commoditized or become trivial to build. It’s the underlying data that will decide the outcome, and so to that extent I hope it serves as a call to action for new DaaS providers to rise to the occasion. 

Realistically, I expect that AI/ML will be increasingly incorporated into the data commerce transaction to help create truly personalized Data Products, speed up data diligence, and cut out manual integration.  

On the topic of AI, there’s been a lot of talk about data scarcity limiting the rate of development for LLMs. Do you think it’s overblown, or do you expect companies with data-exhaust assets to finally monetize their data to meet this demand?

I think it’s a bit of both. Naturally there will be new companies rising to monetize the crazy data demand. Here I’d repeat my earlier caution - data is not oil; it is not fungible. These new companies need to consider what customers want before diving in.

On the flip side, LLMs are already really advanced. Are they true AGI? No - but do they need to be to deliver value? I think we’ve barely scratched the surface of what we can do with the current state of LLMs, so rather than worrying about the pace of LLM development, I’d rather focus on accelerating how companies solve problems using LLMs today.

Closing Remarks

What are you working on next and what is the best way to keep updated about the latest developments? 

This is when I have to play the AI card myself! In the spirit of helping data providers solve their operational challenges, I’m personally exploring ways to incorporate AI that are genuinely helpful rather than gimmicky.

Otherwise, we continue to add new capabilities to the platform that meet the needs of modern data providers - things that only folks in our industry care about, like “automatically creating a 90-day lagged data product that contains multiple related tables.”
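
For the curious, the gist of a 90-day lag is simple even though automating it across related tables is not. Here is a minimal sketch, with hypothetical table and column names (interval syntax varies by warehouse):

    -- Hypothetical 90-day lagged product: expose only records that are
    -- at least 90 days old, applying the same lag to every related table
    -- so the product stays internally consistent.
    CREATE VIEW transactions_lagged_90d AS
    SELECT * FROM transactions
    WHERE txn_date <= CURRENT_DATE - INTERVAL '90' DAY;

    CREATE VIEW merchants_lagged_90d AS
    SELECT * FROM merchants
    WHERE effective_date <= CURRENT_DATE - INTERVAL '90' DAY;

The tedious part is keeping that lag consistent across every table, refreshing it on schedule, and doing it per customer without hand-editing SQL - which is the kind of thing being automated here.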

I post pretty regularly on LinkedIn, so please follow me and our company page, or catch me at the next data event!
