5 Questions with Sarah Catanzaro By Karli Burghoff

The modern data stack is often defined by the type of technologies that exist within it. Cloud-based, open source, low/no code tools, ELT, and reverse ETL. But surely there’s more to it… isn’t there?
The post 5 Questions with Sarah Catanzaro appeared first on data.world.

The new year meant it was time for another trek around the sun. It’s also time to make a few predictions about what might change in data & analytics. Is it time for data practitioners to participate in boardroom conversations? Will we see data teams embedded in business units? What new technologies will become part of the modern data stack? 

Tim and Juan were joined on the Catalog & Cocktails podcast by Sarah Catanzaro from Amplify Partners, to discuss what’s in store in 2022. Below are a few questions excerpted and lightly edited from the show.

 

Juan Sequeda:

Honest, no BS. What does that even mean to have data practitioners in the boardroom?

 

Sarah Catanzaro:

I think for the past several years, we’ve been talking about data-driven decision making. Decision making and data-driven decision making will be an interplay between data, different people’s interpretations of it, and the way in which it’s communicated. But what we’re doing right now is effectively saying, “We’re going to put some charts in a boardroom and expect people, even those without any formal training in data analysis, or statistics, or data viz, or whatever else, to figure it out.”

I think ultimately that’s creating missed opportunities on both sides: on the part of the executive leadership, who can’t really interact with the data, but also on behalf of those who are producing the data, who don’t have the context for what decisions are being made, what questions would be asked, what kind of quality is needed from the data, etc. So, I don’t think it’s enough to put data in the boardroom. I think you need specialists, you need data analysts or other data practitioners who can understand the way in which data is being consumed and modify the way in which data is being produced based on that context, but who can also guide some of the decision-making process with additional information and context.

 

Juan:

What are you seeing that’s missing from an executive strategy process?

 

Sarah:

It seems for the past five years or so, particularly in the most recent one to two years, companies have really focused heavily on their data. But, in our efforts to improve data quality, to shift focus from just the analysis to the data, we’ve lost sight of what you do with the data. This is a question that I often ask data leaders: “You’re investing all of this time, effort, and energy in improving the quality of your data. So, when your data is good, what do you do differently?” And the answer that people often give me is like, “Well, then our analysis is better.” I’m like, “Sure, that’s a start.” But I think we need to be focused more on thinking about not just what we can do better, but what we can do that we could not do before with data of insufficient quality.

I think there is a lot of opportunity related to more advanced experimentation and to observational causal inference, and perhaps, if we push the boundaries even further, in areas like parametric design or even automated hypothesis generation. And I think there are other topics, too, where we can start to engage in more advanced forms of data analysis.

 

Tim Gasper:

Where are these things going to manifest most? More tooling that supports the data scientist? Is it going to kind of manifest itself in advanced features of the BI layer, or maybe all of the above?

 

Sarah:

I’ll start off in the nearest term. I think what will probably happen imminently is that some executive is going to go say to the data team later like, “You’ve been investing all of this time, all of this effort, all of this money into these data models, into these gold data sets. Why? Show me the ROI.” And I think most of us can recognize that until data is acted upon, until it is transformed into some sort of product experience or until it informs a decision, it’s not valuable. Data sitting in your data warehouse is just not valuable until you do something with it. I think the next step that we’re seeing now is kind of the rise of operational data, of some of these reverse ETL platforms, where the data is at least being piped into these surface areas, into the context where people are making decisions.

You’re seeing more and more data apps. But I think the ROI, it may not be enough. And so I guess, the question then becomes, “where are data teams going to need to justify their existence the most?” Who has invested most heavily in these initiatives related to data quality where they’re going to need to really explore the frontiers of what is possible in order to justify those investments? I don’t know yet, but certainly the obvious answer is the companies with the biggest data teams, the companies that hired armies of people saying like, “Okay, as soon as we figure out this data quality thing, goodness is going to happen.”

Juan:

One could argue that we’ve just been spending the last decade making sure we have strong foundations for everything, which is what we need. So what’s next?

 

Sarah:

I think where the more advanced companies are at is experimentation. So you’ve got your data models, you’ve got your metrics, how do you generate ROI from them? Well, you run experiments. And I think many executives can also understand the value of experiments because so many of us were trained in the scientific method in elementary and middle school. You can understand the notion that you run an experiment, you collect data, you analyze that data. And from that data, you generate a conclusion and that conclusion has impact. So experimentation seems to be where many companies are at right now, but then you start kind of dragging innovation into experimentation. You start moving from two-sample t-tests to contextual bandits. And I think that is where you start to see some really exciting things where you’re not just testing hypotheses, you’re testing multiple hypotheses. That’s kind of the area that’s super exciting to me, like moving beyond the A/B test to advanced experimentation.

I think the other thing that is also exciting is observational causal inference; it has been relatively under-explored, perhaps for some good reasons. But I’m hoping that within the next several years, some of the technologies, including, for example, those used to validate causal models, will exist, improve, and unlock more opportunities there.
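
As a rough illustration of the shift Sarah describes, from a two-sample t-test to a contextual bandit, here is a minimal Python sketch. It is not from the episode: the function and class names, the epsilon-greedy strategy, and the sample numbers are all assumptions chosen for brevity, and a production system would likely use a statistics library and a more careful exploration policy.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic: one fixed comparison of two groups."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


class EpsilonGreedyContextualBandit:
    """Hypothetical sketch: a running mean reward per (context, arm) pair.

    Unlike a one-off test, it keeps choosing and learning as data arrives,
    and it can prefer different arms in different contexts.
    """

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon            # probability of exploring at random
        self.rng = random.Random(seed)    # seeded so runs are repeatable
        self.counts = {}
        self.means = {}

    def choose(self, context):
        # Explore with probability epsilon, otherwise pick the best-known arm.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda arm: self.means.get((context, arm), 0.0))

    def update(self, context, arm, reward):
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        key = (context, arm)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        old = self.means.get(key, 0.0)
        self.means[key] = old + (reward - old) / n


# Made-up conversion rates for a classic A/B comparison.
control = [0.10, 0.12, 0.11, 0.09]
variant = [0.14, 0.15, 0.13, 0.16]
print(welch_t(variant, control))

# The bandit learns per context ("mobile" here is a hypothetical segment).
bandit = EpsilonGreedyContextualBandit(arms=["A", "B"], epsilon=0.1)
bandit.update("mobile", "B", 1.0)
print(bandit.choose("mobile"))
```

The t-test answers a single question after the data is collected, while the bandit allocates traffic as it learns, which is one way to read "not just testing hypotheses" in the answer above.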

Juan:

Where does this responsibility of data land?

 

Sarah:

I would take issue with the notion that everybody is going to become engineers. But if I had any gripe about data and the data ecosystem, it would be the number of companies that just act as if analysis is intuition. It’s not. That’s why there’s analysis and there’s intuition. I think my objection to the notion that engineers should be responsible and accountable for data quality and the quality of data products was that engineers have to be responsible and accountable for the quality and delivery of software and features. And we can’t keep adding things to the list of job requirements for any given role. I think we need to start thinking more about teams. And I think the notion that product teams should be responsible for data quality for data products is very reasonable.

Think about it: what if product teams or product engineering teams were not just responsible for delivering features, but the successful delivery of a feature hinged on being able to objectively evaluate if that feature changed user behavior or impacted the company’s other strategic objectives in a meaningful way… I think that definition of software development, of feature delivery, application delivery, is more kind of inclusive of data. I think it’s that sort of behavioral change or that sort of perspective that helps us move closer to data-driven decision making.

 

Key Takeaways:

In recent years, companies have been focused on crafting data models, documenting, and monitoring.
We’ve lost sight of what we do with the data.
Companies and the boardroom should have “data interpreters.”

Visit Catalog & Cocktails to listen to the full episode with Sarah. And check out other episodes you might have missed.
