Interview with Big Data Federation

It’s Pronounced Data + Big Data Federation Discussion

April 2025

You’ve hopefully seen the posts for Big Data Federation and their OdinUltra platform over the past few weeks in my newsletter as well as some of the other industry newsletters. A big thank you to BDF for sponsoring this newsletter and helping to keep it free for readers. I was really intrigued by BDF’s offering and set up time last week for a conversation with Pouya Taaghol, Founder & CEO, and Daniel Goldberg, an Advisor to BDF and long-time investment and data industry exec who originally introduced me.

Pouya outlined how the company leverages vast quantities of public and commercial data to build highly accurate financial forecasts and macroeconomic models. Their proprietary platform, OdinUltra, combines deep preprocessing, ecosystem modeling, and ensembling of 27+ forecasting models to deliver daily-updating insights for 1,300+ companies. The forecasts focus on metrics like revenue, EBITDA, and adjusted gross income, and are used by both quantitative and discretionary investors. The conversation also covered Pouya's background, the accidental founding of BDF, and his goal to expand the model internationally, starting with Japan.

Here are a few key takeaways and I’ve also included a transcript of our full discussion with some screenshots below. Be sure to check out BDF and let me know if you’d like an intro.

Key Takeaways

1. Origin Story and Philosophy

  • Pouya has a PhD in electrical engineering and worked on the first 3G systems. He was then CTO of the Mobile Wireless Group at Intel and CTO of the Home Networking Business Unit at Cisco.

  • He and some colleagues started analyzing satellite imagery and eventually forecasting.

  • The forecasting platform was unplanned, driven by curiosity and experimentation.

  • His outsider perspective challenges Wall Street norms, focusing more on operational signals and ecosystem understanding than on traditional financial modeling.

2. Forecasting Innovation Beyond Traditional Data

  • BDF does not use panel data (credit cards, email receipts) but instead pulls from ~1,400 public and commercial sources, processing 3 trillion data points daily.

  • Their forecasting emphasizes data combinations, not just single-variable correlations. For example, home price prediction improves with lower-correlation variables like age and location, illustrating how complex models outperform simplistic ones.

3. OdinUltra: The Core Forecasting Platform

  • OdinUltra powers company-level forecasts across revenue, gross income, and EBITDA.

  • It uses a suite of 27 models including autoregressive and ecosystem-based models, refined with ensembling techniques.

  • Forecasts are stable and updated daily, often 45 days ahead of earnings, and include probability of beat/miss guidance and driver explanations.

4. Ecosystem-Based Modeling as a Differentiator

  • BDF’s standout model is the ecosystem model, which maps intercompany relationships (supply chains, customer dependencies, capital markets influence).

  • Example: Nvidia’s forecast considers data center financing and competitors’ earnings, not just Nvidia’s own metrics.

5. Transparency, Model Independence, and Complementarity

  • BDF provides transparency into forecast drivers, but not the formulas, as these are dynamic and change daily.

  • BDF encourages clients to treat its models as independent complements to their own, rather than building them into an existing model.

  • The platform is increasingly attractive to quantitative funds and gaining traction among discretionary managers.

6. Roadmap and Vision

  • Exploring international expansion, starting with Japan due to similar market dynamics.

  • Continuous enhancement of models and coverage of additional tickers.

  • Recent launch of earnings guidance forecasting has been well received.

Full Transcript

DE: Dan Entrup

PT: Pouya Taaghol

DG: Daniel Goldberg

PT: We started with forecasting fundamentals from really large sets of data - public data. We do not use any panel data: no credit card data, none of that. Then we also started to develop strategies to make money from it. We thought, why don’t we just use this data ourselves? We realized fundamentals alone aren’t sufficient; there are a bunch of other metrics that move asset prices. So we started developing all these technologies, and then we bundled the forecasting and backtesting capabilities under one tool, OdinUltra. We also started a hedge fund in 2021 - kind of correlated, multi-strategy - that has about $50M in it and is going pretty well.

So fundamentals are very, very critical to the market, especially the US market, which is one of the few where fundamentals actually rule. If you knew the forward revenue of the S&P 500 at any given time, one quarter out, you would see the stock price of the index very much matching the fundamentals. Generally we come back to the fundamentals. Earnings itself is actually pretty important on a single day: for the S&P 500 alone, there is $2.2T of market cap change in the one day after earnings. 51% of stocks move up.

Forecasting in general is really hard. The analysts are already pretty far off. And if you look at the companies themselves, they are guiding less and less. In the S&P 500, about half of them guide something on an annual basis, like EPS. Some guide annual revenue. As you come closer to the quarter, not many guide anything. The quarterly number is much more important, because who cares what happens a year from now - people want to know where we are in the quarter. There is less and less insight from the enterprises themselves.

The way we approach forecasting is, the data itself is kind of important, but at the same time it is not enough. Here’s an example, quite a factual dataset on home prices.

Let’s say you want to predict home prices. You take home size, home age, and home location. These location factors are all available out there. We know that home prices are very much correlated with home size: bigger home, bigger price. But a single high-correlation variable isn’t enough - the same way that having just email receipts on Amazon sales has a high correlation but doesn’t really help much on its own. If you take just home size and try to predict home prices, you get almost 9% error. Here we’re using very simple multiple linear regression; for our system we use much more complex modeling. If you take the size and add the location, all of a sudden the error drops to below 3%, even though home location has a lower correlation than size. So correlation is not a good indicator. If you also include home age, which has a negative correlation, the error drops further, to 2%. This is a very interesting phenomenon: when you’re forecasting, you will see that some data is able to help you in places where an individual correlation cannot.
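The effect Pouya describes is easy to reproduce on synthetic data. A minimal sketch (the variable names, coefficients, and error figures here are illustrative assumptions, not BDF's dataset - but the same qualitative pattern appears: each added feature, even a low- or negatively-correlated one, cuts the error):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
size = rng.uniform(800, 3_000, n)        # square feet (highest-correlation feature)
location = rng.uniform(0.5, 2.0, n)      # location desirability score (lower correlation)
age = rng.uniform(0, 60, n)              # years (negatively correlated with price)

# price depends positively on size and location, negatively on age, plus noise
price = 150 * size + 100_000 * location - 1_000 * age + rng.normal(0, 10_000, n)

def mape(features):
    """Mean absolute percentage error of an ordinary least-squares fit."""
    X = np.column_stack(features + [np.ones(n)])       # features plus intercept
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return float(np.mean(np.abs(X @ coef - price) / price) * 100)

err_size = mape([size])                  # size alone: largest error
err_size_loc = mape([size, location])    # add the lower-correlation feature
err_all = mape([size, location, age])    # add the negatively correlated feature
```

Each call fits the same simple regression on a richer feature set; `err_size > err_size_loc > err_all`, mirroring the 9% / 3% / 2% progression in spirit even though the exact numbers depend on the synthetic coefficients.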

DG: I would say that in the conversations I’ve been involved in with data buyers, this really resonates, because people just think: if we have one dataset - say, credit card data that’s 98% correlated to revenues - that’s all we need. But here it is clear that if you put other factors in and build a model, you get even better results. A lot of the buyside who use data don’t really think of it that way.

DE: I think people talk about it a lot - you need more datasets, i.e. geolocation next to email receipts next to credit card data, but this lets me actually see it in a very clear cut example showing the correlation as independent and then showing the combination.

PT: We have about 1,400 data sources. Things like TSA, drug sales, government, cyber attacks, diseases, import/export, commodity prices, all this published information. All of these we capture from reasonably valid government sources or associations or other companies.

We process about 3 trillion data points per day. Something like this would not have been possible 20 years ago. There are 15 million time series involved in our forecast.

DE: Is it all public or free datasets?

PT: It’s a combination of free datasets and commercially licensed datasets.

We do a lot of preprocessing on the data, like aligning to company fiscal quarters - company quarters do not always match data source frequency. There are a bunch of things we have to do, like adjusting for seasonality. Even though the data comes from really good sources, you still have to do some cleaning, and some proprietary aggregation.
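One such alignment step - mapping calendar-month data onto a company's fiscal quarters - might look like the sketch below. This is a hypothetical illustration, not BDF's pipeline; the January fiscal-year-end is chosen because it is common among US retailers:

```python
from collections import defaultdict

# hypothetical monthly indicator values keyed by (year, month)
monthly = {(2023, m): float(m) for m in range(1, 13)}
monthly[(2024, 1)] = 13.0

def fiscal_quarter(year, month, fy_end_month=1):
    """Map a calendar (year, month) onto (fiscal_year, quarter) for a
    company whose fiscal year ends in fy_end_month (here: January)."""
    offset = (month - fy_end_month - 1) % 12   # months since the fiscal year began
    quarter = offset // 3 + 1
    fiscal_year = year if month <= fy_end_month else year + 1
    return fiscal_year, quarter

# aggregate the monthly series into fiscal quarters
quarterly = defaultdict(float)
for (y, m), value in monthly.items():
    quarterly[fiscal_quarter(y, m)] += value
```

With a January year-end, Feb-Apr 2023 rolls up into Q1 of fiscal 2024, and Nov 2023-Jan 2024 into Q4, so a data source published on calendar months can be compared against the company's reported quarters.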

Then there are 27 models we use to do the forecasting. These models could be based on the ecosystem, or on seasonal autoregression, which basically says: "I don’t know anything about anything; I’m just looking at the trend and adjusting for seasonality." Autoregression works quite well for SaaS enterprise software companies. There’s the alternative-data case, where some data is mapped to certain companies automatically - airlines and TSA data, for example. We also do some forecasts purely on macro: for something like Walmart, or anything really huge, there’s a lot of macro impact. We use the last 2 years of data for validation and the rest for training, so we’re not overfitting. When these numbers are produced, they go through ensembling, which may pick one model or combine several. This is fully automated and really intense - just running the forecasts on their own takes between 6 and 12 hours per day, every day.
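A heavily simplified sketch of that flow: a few toy forecasters are validated walk-forward on recent history (the "last 2 years" idea), then combined by inverse validation error. The three models and the revenue series are made-up stand-ins, nothing like BDF's 27:

```python
import numpy as np

# hypothetical quarterly revenue with seasonality and an upward trend
series = np.array([100, 90, 110, 130, 108, 97, 118, 140, 117, 105, 127, 151.0])

# three toy forecasters: each predicts the next point from history `h`
models = {
    "naive": lambda h: h[-1],                                  # last value
    "seasonal": lambda h: h[-4],                               # same quarter last year
    "drift": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),  # trend extrapolation
}

# walk-forward validation on the last 4 quarters; earlier data is "training"
k = 4
val_error = {
    name: np.mean([abs(f(series[:i]) - series[i])
                   for i in range(len(series) - k, len(series))])
    for name, f in models.items()
}

# ensemble: weight each model by its inverse validation error
weights = {name: 1.0 / e for name, e in val_error.items()}
total = sum(weights.values())
next_quarter = sum(weights[name] / total * f(series) for name, f in models.items())
```

On this seasonal series the seasonal model validates best and therefore dominates the blend; on a flat series the naive model would. That is the sense in which "the ensembling may pick one or may combine them."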

What we produce is a bunch of things: forecasts at the company level on Revenue, Adjusted Gross Income, and Adjusted EBITDA - top line down to bottom line - plus confidence metrics and industry metrics. We cover 1,300 companies, with historical data going back to 2016. We do a bottom-up macro forecast: we can aggregate these up to the macro level.

It’s available through our data feed and our portal. We started selling it last year; we’ve signed a few very large quants already and have a bunch in the pipeline. We also do guidance (only available through the portal): if the company guides, we provide a probability that they will beat or miss, and what the expected value is. We also provide the key drivers that were used to forecast these metrics. The forecasts are stable and available 45 days prior to the print. The accuracy is pretty high, and we publish it every quarter. In this chart, you can see we delivered 1.4% on accuracy for the most efficient asset class in the world - the US stock market. Delivering this kind of edge is very hard.

DE: The midcap numbers are really impressive. They’re all impressive but I’m looking at the midcap line in particular.

PT: It changes quarter to quarter.

One quarter out, we know what the revenue of the S&P 500 is with this level of accuracy: 0.20% aggregated absolute forecast error.

I can give you a bunch of examples, best is to show you on the tool. Let’s look at Five Below. You can actually see a forecast. Analysts are pretty good on this one. You can see a daily revision on how our numbers evolve throughout the quarter.

DG: This is definitely something that the buyside finds valuable. It’s updated daily. It’s not changing significantly on a daily basis but you can see the estimates evolve throughout the quarter, as opposed to sell-side where two weeks before earnings they change their numbers drastically. You can monitor this on a daily basis.

You can do this on revenue, gross income, and EBITDA.

PT: Sometimes we also have some operating metrics. We put a probability on whether they will beat or miss what they are going to guide. Five Below guides on 4 metrics. Our number does not change, but the probability estimate changes.

Something like Nvidia for example, you can actually see how close we are to the actual. So this is again, the guidance comes in, they only guide revenue, we’re 50/50.

This is the restaurant chain Darden - the Olive Garden parent. We also show what metrics were used to forecast. It’s pretty fascinating: this is all generated by machines across hundreds of metrics. We show the top 20 per company. There are different drivers for gross income, EBITDA, FCF, and CapEx.

For revenue, you see it is picking up a lot of apparel drivers - not just apparel revenue, but apparel profit. It’s not very easy to understand, because people are so used to looking at credit card data for this. This is fascinating, because what it’s really saying is: how would apparel have higher gross income? By selling more expensive clothes. Why would people buy more expensive clothes? Because they are going out. What do people do when they go out? They eat. It’s a fascinating connection we can explain. Sometimes we cannot explain it.

DE: you’re following the customer journey.

PT: For example, here on the Nvidia example, it is picking up the net income of POWR, a chip maker. You can also see the competitors - Nvidia reports after everyone else. We are looking at the whole ecosystem. It also picks up quite a few capital-markets players. Why are these picked up? Because data center spending requires financing; it is closely related to the capital markets environment.

DE: I see the industrial REITs on there.

PT: It also picks up THO - Thor, they make RVs. It is correlated to data centers in that RVs are also purchased with financing. And RVs are non-essential.

I have plenty more details on how we handle biases and those scenarios.

Back to our forecasts: there are multiple models, but the model that is all-powerful is the ecosystem model. The reason is that companies are actually very connected to each other - they are interconnected, and nothing happens in a vacuum. If Nvidia is deploying and selling chips to data centers for AI, there is a whole set of players that are affected, and it doesn’t happen overnight. Nvidia is not driving the market; something else is, and you can actually see it. You can look at the supply chain and the customers and see factual data. They all report at different times, so you can pull the ecosystem information together to see where things are headed.

DG: I would just add that the ecosystem data has been very eye-opening for people who haven’t really dug in or even thought about it. A few people have suggested: if I’m meeting with management at a conference, and I know that historically XYZ company’s net income has been very correlated, and I know XYZ just reported a terrible quarter - can I push the company’s management on that and see what they say?

PT: Here, we just did the aggregation at the sector level. You can take a sector like Communication Services, look at the components in the index, and add up their financials. You get a really good view that this is a seasonal business. You can see there was pretty healthy growth, but we’re expecting a slowdown. Let’s look at industrials.

We also do some of the CPI components. We’re pretty good on the headline and the core; the forecast is heading down. Not necessarily good news, because it seems like people are spending less. You can go into the components of CPI, like Energy and Food. We use about 3,000 food items to capture the CPI metrics.

DE: Egg prices may have thrown that off!

Awesome, I really appreciate the overview there. Just to understand a little bit more about the journey and starting the company, I’d love to hear more about your and the team’s backgrounds and how this came about. When did you have that lightbulb moment of starting BDF? You also have a really impressive tech background before starting this, I’d love to hear more about that journey.

PT: My background is an electrical engineering PhD. I did my PhD in the UK and was involved in the first 3G systems. They sent me to Japan in my twenties. I ate the sushi - now I love sushi, but at the time I hated it and was eating McDonald’s for months. There was KFC as well. Then I came to the US because I wanted to be in Silicon Valley. I had no idea how expensive it was, but anyway. I came over with Motorola and then thought, let’s do some startups. The first startup I joined failed 6 months later. We regrouped and built another company, Mobility Networks, and sold it to Ruckus Wireless; Ruckus Wireless was acquired by Brocade, which was acquired by Broadcom. So our stuff ended up in Broadcom somewhere. Then I joined Intel, where I became CTO of the wireless group. Then I joined Cisco for another CTO job. Still, at that time, no one knew what to use data for. Everyone was just doing simple analysis; some people were using more complex analytics. I got really interested in satellite imagery because my PhD was actually in mobile satellites. I had a friend who went to Intelsat, and I called him up and asked him to share some images with us - I just wanted to count cars. This was back in 2012, before many of the companies were productizing satellite imagery data. We counted cars - I worked with a bunch of people who would literally sit there and put the count of cars in a spreadsheet. We focused on Home Depot and Lowe’s. We had no idea what AI or machine learning were going to become. I used that data to figure out the transactions - Lowe’s used to publish the number of transactions per quarter. I didn’t have the pricing, but the pricing was going through predictable seasonality, so we did non-linear modeling. Then a friend of mine went to a hedge fund. We were having a beer, I told him what I was doing, and he said, “Cool, maybe we can use it.” He connected me with the PM, and the PM started trading on what we were giving them.
I didn’t know much about trading. He made a bunch of money - he was hooked - and kept asking me to cover more companies. I didn’t have a company; I had to create one to get paid. It was purely an accident.

We also messed up at times. We licensed some data and found out it was a complete mess that didn’t actually help on some names. Then I started bringing in data scientists, and we were selling our forecasts on a website - we signed up quite a lot of people. We started developing models to trade, so we stopped selling the forecasts. Then some investors out of Asia showed up and said, “We were using your stuff - why aren’t you selling it? Do you need money? Do you want investment?” I thought it was a scam at first, and then a lunch became a dinner, and they gave us a few million 10 days after I met the guy. We then brought in more data scientists and got much more systematic. Our first fund was chaos. We created the second fund in 2021; last year we did 43%, the year before 47%. That was the journey - purely an accident. We got into this and we’ve been doing it for a long time. We didn’t know anything about finance; now we lecture on it.

DG: It is a refreshing perspective on finance that Pouya comes from outside the Wall Street bubble. There is no “I’ve done it this way for 20 years.”

DE: Having worked in financial services and on the corporate side, that resonates with me as well. When you talk to people at corporations who are making decisions and doing some type of forecasting, they are trying to figure out a meaningful shift in consumer behavior and what’s coming. Whereas so much of finance is still focused on calling quarters - everyone says they’re not calling quarters, but they all are - or on forecasting from a purely math-based standpoint off historical data, not necessarily what’s going on in the operations of a business.

PT: Every day I’m learning. The other day I thought, “Wow, the market is so complex.” The market going down may influence people’s sentiment, which could cause a recession - not because we were going to have a recession, but because the fear of recession drives us to it. This whole thing is so complex. On Wall Street, before you know it, everyone says the same thing. I don’t want analyst reports; they don’t really help me. I don’t know where they got their numbers - they don’t really say.

The quant people take our data, run their own models, and build their own strategies. We like them. Discretionary people build their own forecasting models, which is bizarre. They constantly want to take our key drivers and punch them into their models, and we tell them: no, you have your own models; this is an independent one. You get a better outcome with two separate models than by mixing them. Then they want to understand the formula. The formula changes every day, and the numbers inside the formula change every day too. You want to do that? Sure, here’s the formula, but it’s only good for the next 24 hours. We’re trying to get through this with the discretionary people, because it’s as open a model as it can get, but they need to think differently.

DG: People ask all the time, “Oh, we have credit card data, we have email receipt data - is that in your models?” As Pouya said, they’re not using any panel data. But it’s very complementary to panel data. There are very positive synergies in using them together, as opposed to including one inside a particular model. That’s where the funds will get their proprietary advantage.

PT: The US economy is ⅔ consumers, but the US stock market doesn’t map to that at all. Who is at the top? Apple, Nvidia, Microsoft. They are related to the consumer, but we’re not really mapping ⅔ of the economy at all. So credit card data does not really help you figure out what Microsoft’s or Nvidia’s revenue looks like. For what really matters, credit card data cannot help you at all. And if you have the credit card data - which we assume everyone has - and let’s say everyone can see what Walmart is doing, then it gets reflected in the sellside estimates. If everyone has the same data, that is not where the real challenge is. The challenge is filling out these non-consumer, non-direct-spend companies, which are the leadership of this market right now.

DE: What are you thinking next? What’s on the roadmap?

PT: We’re thinking about some overseas markets. The Japanese market for example is very similar to the US market with a similar breakdown of domestic vs. international in their revenue sources. It’s like the Avatar movie - we found another planet that looks like ours! We’re also constantly monitoring our models and adding new models so we can predict more tickers.

DG: Earnings guidance was officially launched the other week, and we’ve gotten some really good feedback on it so far.

DE: Pouya, Daniel, great chatting with you and thanks for sharing more on the platform - it’s really exciting.