How To Use Facebook's Prophet
Nick Cronshaw – Demand Planning, S&OP/IBP, Supply Planning, Business Forecasting Blog
https://demand-planning.com/2018/03/14/forecasting-with-facebooks-prophet/
Wed, 14 Mar 2018

Working in an SME with limited resources, you could be forgiven for thinking that the sophisticated forecasting tools used by the major multinationals are far out of reach. For people in smaller companies like mine, the abundance of free-to-use, open-source, state-of-the-art software like Facebook Prophet offers access to game-changing functionality.

The last couple of years have seen several major internet names open-source powerful predictive analytics APIs, making them free for developers and professionals to use. Google's TensorFlow deep learning library is probably the most widely used and influential of these, but there are plenty of others that provide valuable functionality for demand planners and purchasers without demanding the kind of GPU computing power that deep learning requires.

What Is Facebook Prophet?

Facebook describe the software as “a procedure for forecasting time series data. It is based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. It works best with daily periodicity data with at least one year of historical data. Prophet is robust to missing data, shifts in the trend, and large outliers.”

Prophet was designed to tackle the problem that quality forecasts are required faster than analysts can produce them, while automated forecasting techniques are too inflexible to incorporate useful assumptions or rules gleaned from experience. In a business context we've all seen automatically generated forecasts that don't factor in change points in demand, such as a market breakout for a booming trend or the slump when a major customer moves to a competitor. At the same time, with thousands or tens of thousands of SKUs to monitor, we know that finding highly skilled analysts to complete the workload consistently and rapidly can be a major challenge.

Why Use Facebook Prophet For Forecasting?

Firstly, Prophet is stupidly easy to use and generates reasonable results without having to worry about choosing between models and tuning hyperparameters.

Secondly, Prophet's parameters allow for customisation in ways that make sense to a non-expert in a business context, such as the ability to inject S&OP information about how the forecast is likely to change, the ability to set caps on possible demand based on experience and market knowledge, and the ability to model irregular holidays like Chinese New Year or Easter.

As a keen Pythonista, one of the best things for me about Prophet is that it can be used in Python and is easily installed from either pip or conda. Generally, R has had the edge over Python for time series regression problems. The auto.arima function in R is hard to beat for ease of use and accuracy of results. R also has some recent additions for dealing with time series problems – CausalImpact from Google, which identifies the causal effects of things like marketing campaigns on sales, and AnomalyDetection/BreakoutDetection from Twitter, which help identify anomalies and shifts in trends. Facebook Prophet is therefore a very welcome addition to the Python ecosystem.

What is Facebook Prophet Optimized To Solve?

Prophet was built around the kind of regularly spaced time series Facebook deals with, so it's well suited to evenly spaced observations and works best with at least a year of history to catch seasonal trends. It has a very useful facility for incorporating national holidays, which depending on your business might represent peaks (television ratings during holidays) or dips (stores closed or open half-days on national holidays). It also has a useful 'changepoints' parameter which lets you specify points after which demand is likely to change, such as the launch of a competitor's product or a major television campaign.
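To make that concrete, here's a minimal sketch of passing a holiday table and an explicit changepoint to the model. The holiday names, dates and changepoint date are illustrative assumptions, not values from my dataset:

    import pandas as pd
    from fbprophet import Prophet

    # Prophet expects a holiday table with 'holiday' and 'ds' columns,
    # plus optional windows either side of each date
    holidays = pd.DataFrame({
        'holiday': ['chinese_new_year', 'chinese_new_year', 'easter', 'easter'],
        'ds': pd.to_datetime(['2017-01-28', '2018-02-16', '2017-04-16', '2018-04-01']),
        'lower_window': 0,
        'upper_window': 1,
    })

    # A known shift in demand (say, a competitor launch) can be supplied as an explicit changepoint
    m = Prophet(holidays=holidays, changepoints=['2017-06-01'])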

One important note is that Prophet is an additive regression model built up from trend, annual seasonality, weekly seasonality, and a user-specified holiday list. If you suspect your series is inherently multiplicative in nature, it can be worth log-transforming the data before fitting and then applying the inverse transformation to the predictions.
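A minimal sketch of that approach, assuming a prepared dataframe (here just df) with the 'ds' and 'y' columns described below:

    import numpy as np
    from fbprophet import Prophet

    df['y'] = np.log1p(df['y'])                      # fit on the log scale
    m = Prophet().fit(df)
    forecast = m.predict(m.make_future_dataframe(periods=90))
    for col in ['yhat', 'yhat_lower', 'yhat_upper']:
        forecast[col] = np.expm1(forecast[col])      # back-transform the predictions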

I should also make clear from the start that, as impressive as Prophet is, you can get better results by stacking the model in an ensemble of various techniques if you have the computing power to do so. On my laptop (and my work desktop), it would take several days to fit a very sophisticated ensemble model, whereas Prophet is able to do a reasonable job on all 3,000 SKUs for my current company in a matter of minutes, so there is a trade-off of accuracy against computational/time cost.

Installing Facebook Prophet in Python

Prophet can be installed very easily in Python, either through pip or through conda install. I used the conda installation which also loads all the dependencies and is very convenient: https://anaconda.org/conda-forge/fbprophet
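For reference, either of these commands will do it:

    conda install -c conda-forge fbprophet
    # or, using pip
    pip install fbprophet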


Preparing Facebook’s Prophet Datasets

Prophet accepts a primary dataset of time series data and an optional list of holidays. I read these into Python with the Pandas read_csv function, passing parse_dates=True. If you're unfamiliar with the Pandas commands, you can gain a quick understanding with the excellent 10 minutes to Pandas guide here: https://pandas.pydata.org/pandas-docs/stable/10min.html

The Prophet documentation shows that the columns of the primary dataset should be labelled 'ds' for the dates and 'y' for the variable being forecast. The holiday list should likewise be labelled 'ds' for the dates and 'holiday' for the names of the notable events. The time series should be sorted and formatted as a datetime datatype, which is easily done inside the workflow. I've taken a single line of crystal glass tableware as an example, having already sliced the dataframe down to one SKU:

[Screenshot: the prepared single-SKU dataframe with 'ds' and 'y' columns]
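Here's a sketch of that preparation step; the file names, SKU code and original column names are illustrative assumptions:

    import pandas as pd

    sales = pd.read_csv('sales.csv', parse_dates=['invoice_date'])
    holidays = pd.read_csv('holidays.csv', parse_dates=['ds'])   # columns: 'ds', 'holiday'

    # Slice down to one SKU and rename to the 'ds'/'y' labels Prophet expects
    Ndf = (sales[sales['sku'] == 'CG-1001']
           .rename(columns={'invoice_date': 'ds', 'quantity': 'y'})
           [['ds', 'y']]
           .sort_values('ds'))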

Setting Up The Process

I tend to import a range of packages to perform basic exploratory data analysis, but the only essential packages for this will be pandas and fbprophet.

[Screenshot: importing pandas and fbprophet]

Here’s a quick plot of the time series:

[Figure: three years of order history for the example SKU]
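If you want to reproduce a similar plot, a couple of lines of pandas/matplotlib will do it:

    import matplotlib.pyplot as plt

    Ndf.set_index('ds')['y'].plot(figsize=(10, 4), title='Order quantity over time')
    plt.show()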

Eyeballing the data, you should notice an increase in both frequency and volume of orders in a regular annual cycle with perhaps a faint downward trend over the three years.

Anyone familiar with scikit-learn's fit/predict pattern will find that Prophet follows a very similar one.

[Screenshot: instantiating and fitting the Prophet model]

Here I instantiate the model with an uncertainty interval of 95% (Prophet defaults to 80%, even though 95% is the usual standard in many business fields). I feed it my holiday list as a parameter, and then fit the model to my filtered dataframe (Ndf). I then project a future dataframe of around three months using Prophet's make_future_dataframe function.

Once the model is fit, all that remains is to predict over the future dates and have a look at the dataframe to sanity-check the results.

[Screenshot: the forecast dataframe with yhat and its uncertainty bounds]
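Putting those steps together, a sketch of the whole fit-and-predict flow looks roughly like this:

    from fbprophet import Prophet

    m = Prophet(interval_width=0.95, holidays=holidays)   # widen the default 80% uncertainty interval
    m.fit(Ndf)

    future = m.make_future_dataframe(periods=90)          # roughly three months beyond the history
    forecast = m.predict(future)

    # Sanity-check the point forecasts and their bounds
    print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())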

Then, using the model's plot(forecast) method, we can have a look at the fitted and projected values:

[Figure: Prophet's fitted values and forecast with the 95% uncertainty interval]

As you can see, the model has done an excellent job of finding the seasonal pattern and has correctly identified the downward trend over the last three years. One of the best features of Prophet is that it will return the model components. Here we can see that the overall trend and the holidays have been isolated:

[Figure: model components – overall trend, holiday effects, weekly and yearly seasonality]
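These component plots come from a single call on the fitted model:

    fig = m.plot_components(forecast)   # panels for trend, holidays, weekly and yearly seasonality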

And the weekly and annual seasonality analysis is a very good fit with my experience in the tabletop HORECA trade: annual demand rises slightly at the start of summer and peaks in the lead-up to Christmas, when parties fill up the hotels, restaurants and bars.

As you can also see, we close at weekends, so not many invoices are raised then. The busiest weekday is Wednesday.

Final Thoughts On Facebook Prophet

With very little coding and without setting any of the numerous other hyperparameters, Prophet did an excellent job on the time series, despite the large number of outliers in the data, achieving a coefficient of determination of 0.84.
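If you want to check a similar in-sample fit yourself, one way (assuming scikit-learn is available) is to join the predictions back onto the history:

    from sklearn.metrics import r2_score

    fitted = forecast.merge(Ndf, on='ds', how='inner')   # keep only dates with actuals
    print(r2_score(fitted['y'], fitted['yhat']))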

There are certainly many advantages for an SME purchaser or demand planner considering forecasting with Prophet:

The software is free to use, open source, and ridiculously easy to deploy.

Prophet is extremely quick, taking only a few seconds even on my now badly outdated laptop – more advanced neural networks are notorious for requiring multiple GPUs or tying up a machine's CPUs for days at a time.

The optional hyperparameters are intuitive even to the less technically minded demand planner.

The predictions are returned with a confidence interval around the forecast, which can often be more useful than the predicted value itself when making decisions about stock levels.

All in all, R still has the edge when it comes to comprehensively tackling time series regression tasks, but if you’re a Pythonista working in demand planning and you want to upgrade your forecasting accuracy, then I’d strongly recommend Prophet as a tool to consider.

How To Use Microsoft Azure
https://demand-planning.com/2018/01/29/how-to-make-your-own-powerful-machine-learning-forecasting-models-for-free-without-coding/
Mon, 29 Jan 2018

If, like me, you work in a small to medium-sized enterprise where forecasting is still done with pen and paper, you'd be forgiven for thinking that machine learning is the exclusive preserve of big-budget corporations. If you thought that, then get ready for a surprise. Not only are advanced data science tools largely accessible to the average user, you can also access them without paying a bean.

If this sounds too good to be true, let me prove it to you with a quick tutorial that will show you just how easy it is to make and deploy a predictive webservice using Microsoft’s Azure Machine Learning (ML) Studio, using real-world (anonymised) data.

What is Azure ML?

To most people the words 'Microsoft Azure' conjure up vague ideas of cloud computing and TV adverts with bearded hipsters working in designer industrial lofts, and yet, in my opinion, the Azure Machine Learning Studio is one of the more powerful predictive modelling tools available on the market. And again, it's free. What's more, because it has a graphical user interface, you don't need any advanced coding or mathematical skills to use it. It's all click and drag. In fact, it is entirely possible to build a machine learning model from beginning to end without typing a single line of code. How's that for a piece of gold?

You can make a free account or sign in as a guest here: https://studio.azureml.net. The free account or guest sign-in to the Microsoft Azure Machine Learning Studio gives you complete access to the easy-to-use drag-and-drop graphical user interface that allows you to build, test, and deploy predictive analytics solutions. You don't need much more.

Microsoft Azure Tutorial Time!

I promised you a quick tutorial on how to make a forecast that drives purchasing and other planning decisions in Azure ML, and a quick tutorial you shall have.

If you’re still with me, here are a couple of resources to help you get rolling:

A great hands on lab: https://github.com/Azure-Readiness/hol-azure-machine-learning

Edx courses you can access for free: https://www.edx.org/course/principles-machine-learning-microsoft-dat203-2x-6

https://www.edx.org/course/data-science-essentials-microsoft-dat203-1x-6

Having pointed you in the direction of more expansive and detailed resources, it’s time to get into this quick demo. Here are the basic steps we’ll go through:

  • Uploading datasets
  • Exploring and visualising data
  • Pre-processing and transforming
  • Predictive modelling
  • Publishing a model and using it in Excel

Uploading Datasets To Microsoft Azure

So, you’ve signed up. Once you’re in, you’re going to want to upload some data. I’m loading up the weekly sales data of a crystal glass product for the years 2016 and 2017 which I’m going to try and forecast.  You can read in a flat file csv. format by clicking on the ‘Datasets’ icon and clicking the big ‘+ New’:

   Then you’re going to want to load up your data from the file location and give it a name you can find easily later. Clicking on the ‘flask’ icon and hitting the same ‘+ New’ button will open a new experiment. You can drag your uploaded dataset from the ‘my datasets’ list on to the blank workflow:

Exploring and Visualizing

Right clicking on the workflow module number (1) will give you access to exploratory data analysis tools either through ‘Visualise’, or by opening a Jupyter notebook (Jupyter is an open source web application) in which to explore the data in either Python or R code. If you want to learn how to use and apply Python to your forecasting, practical insights will also be revealed at IBF’s upcoming New Orleans conference on Predictive Business Analytics & Forecasting.

Clicking on the ‘Visualise’ option calls up a view of the data, summary statistics and graphs. A quick look at the histogram of sales quantity shows that the data has some very large outliers. I’ll have to do something about those during the transformation step. You also get some handy summary statistics for each feature. Let’s have a look at the sales quantity column.

I’m guessing that zero will be Christmas week, when the office is closed. The max is likely to be a promotional offer. I can also see that the standard deviation is nearly 12,000 pieces, which is high compared to the mean. You can also compare columns/features to each other to see if there is any correlation:

Looking at a scatter plot of sales quantity against the consumer confidence index value, that feature really doesn't seem to be adding anything to the data, so I'll want to get rid of it. I've also included a quick Python line plot of sales over the two-year period.

As you can see, there is a lot of variability in the data and perhaps a slight downward trend. Without some powerful explanatory variables, this is going to be a challenge to accurately forecast. A lot of tutorials use rich datasets which the Machine Learning systems can predict well to give you a glossy version. I wanted to keep this real. I work in an SME and getting even basic sales data is an epic battle involving about fifty lines of code.

Pre-processing and Transforming

Now it’s time to transform the data. For simplicity, I’ve loaded a dataset with no missing or invalid entries by cleaning up and resampling sales by week with Python, but you can use the ‘scrub missing values’ module or execute a Python/R script in the Azure ML workspace to take care of this kind of problem.

In this case, all I need to do is change the ‘week’ column into a datetime feature (it loaded as a string object) and drop that OECD consumer confidence index feature as it wasn’t helping. I could equally have excluded the column without code using the select columns module:
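A sketch of what that script could look like; Azure ML Studio passes the connected dataset in as dataframe1 and expects a dataframe back, and the column names here are from my dataset, so yours will differ:

    import pandas as pd

    def azureml_main(dataframe1=None, dataframe2=None):
        dataframe1['week'] = pd.to_datetime(dataframe1['week'])            # string -> datetime feature
        dataframe1 = dataframe1.drop('consumer_confidence_index', axis=1)  # drop the unhelpful feature
        return dataframe1,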

The other thing I'm going to do is trim outliers from the dataset, using another 'Execute Python Script' module to identify and remove extreme values from the sales quantity column so the results are not skewed by rare sales events.

Again, I could have accomplished a similar effect by using Azure’s inbuilt ‘Clip Values’ module. You genuinely do not have to be able to write code to use Azure (but it helps.)
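Here's a sketch of that trimming step; the three-standard-deviation cutoff and the column name are my illustrative choices rather than a prescription:

    def azureml_main(dataframe1=None, dataframe2=None):
        qty = dataframe1['sales_quantity']
        cutoff = qty.mean() + 3 * qty.std()        # simple three-sigma rule
        dataframe1 = dataframe1[qty <= cutoff]     # drop the rare extreme weeks
        return dataframe1,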

There are too many possible options within the transformation step to cover in a single article, but I will mention one more important step: you should normalise the data to stop differences in the scale of the features leading to certain features dominating over others. 90% of the work in forecasting is getting and cleaning the data so that it is usable for analysis (Adobe, take note: PDFs are evil and everyone who works with data hates them). Luckily, you can do all your wrangling inside the machine learning workflow, so that when you use the service it will do all the wrangling automatically based on your modules and code.

The 'Normalize Data' module allows you to select columns and choose a normalisation method, including Z-score and min-max.
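For intuition, the two methods amount to something like this (a sketch, not the module's internals):

    def z_score(col):
        return (col - col.mean()) / col.std()

    def min_max(col):
        return (col - col.min()) / (col.max() - col.min())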

Predictive Modelling In Microsoft Azure

Having completed the data transformation stage, you're now ready to move on to the fun part – making a Machine Learning model. The first step is to split the data into a training set and a testing set. This should be a familiar practice for anyone working in forecasting. Before you let your forecast out into the wild you want to test how well it performs against the sales history. It's that or face a screaming sales manager wanting to know where his stock is. I like my life as stress-free as possible.

As with nearly everything in Azure ML, data splitting can be achieved by selecting a module. Just click on the search pane and type in what you want to do. I'm going to split my data 70-30.

The next step is to connect the left output of the 'Split Data' module to the right input of a 'Train Model' module, the right output of the 'Split Data' to a 'Score Model' module, and an untrained learning algorithm to the left input of the 'Train Model'.

At first this might seem a little complicated, but as you can see, the left output of the 'Split Data' module is the training dataset. It passes through the 'Train Model' module, which outputs the resulting learned function to the 'Score Model' module, where it is tested against the testing dataset coming in through the right data input node. In the 'Train Model' module you must select a single column of interest – in this case it is the quantity of product sold that I want to predict.

Microsoft offer a couple of guides to help you choose the right machine learning algorithm. Here’s a broad discussion and if short on time, check this lightning quick guidance. In the above I’ve opted for a simple Linear Regression module and for comparison purposes I’ve included a Decision Forest Regression by adding connectors to the same ‘Split Data’ module. One of the great things about Azure ML is you can very quickly add and compare lots of models during your building and testing phase, and then clear them down before launching your web service.

Azure ML offers a wide array of machine learning algorithms from linear and polynomial regression to powerful adaptive boosted ensemble methods and neural networks. I think the best way to get to know these is to build your own models and try them out. As I have two competing models at work, I’ve added in an ‘Evaluate Model’ module and linked in the two ‘Score Model’ modules so that I can compare the results. I’ve also put in a quick Python script to graph the residuals and plot the forecasts against the results.

Here’s the Decision Forest algorithm predictions against the actual sales quantity:

Clearly something happened around May 2016 that the Decision Forest model is unable to explain, but it seems to do quite well in finding the peaks over the rest of the period and through 2017. Looking at the Linear Regression model, one can see that it does a better job of finding the peak around May 2016 but consistently overestimates in the latter half of 2017.

Clicking on the ‘Evaluate Model’ module enables a more detailed statistical view of the comparative accuracy of the two models. The linear regression model is the top row and the decision forest model is the bottom row.

Coefficients of determination of 0.60 and 0.72: the models are explaining between half and three-quarters of the variance in sales, and overall the Decision Forest scored significantly better. As results go, neither brilliant nor terrible. A perfect coefficient of determination of 1 would suggest the model was overfitted and therefore unlikely to perform well on new data. The range of sales was from 0 to nearly 80,000 pieces, so I'll take a mean absolute error of 4,421 pieces without complaint.

It would be ideal if we had a little more information at the feature engineering stage. Features such as the ending inventory position for each week, or customer forecasts from the S&OP process, would help accuracy.

One of the benefits of forecasting in this way is you can incorporate features without having to worry about how accurate they are as the model will figure that out for you. I’d recommend having as many as possible and then pruning. I think the next step for this model would be to try incorporating inventory and S&OP pipeline customer forecasts as a feature. Building a model is an iterative process and one can and should keep improving it over time.

Publishing A Model And Consuming It In Excel

Azure ML makes setting up a model as a webservice and using it in Excel very easy. To deploy the model, simply click on the ‘Setup Web Service’ icon at the bottom of the screen.

Once you’ve deployed the webservice, you’ll get an API (Application Programming Interface) key and a Request Response URL link. You’ll need these to access your app in Excel and start predicting beyond your training and testing set. Finally, you’re ready to open good old Excel. Go to the ‘Insert tab’ and select the ‘Store’ icon to download the free Azure add-in for Excel.

Then all you need to do is click the '+ Add web service' button and paste in your Request Response URL and your secure API key, so that only your team can access the service.

After that it’s a simple process to input the new sales weeks to be predicted for the item and the known data for other variables (in this case promotions, holiday days in the week, historic average annual/seasonal sales pattern for the category etc.). You can make this easy by clicking on the ‘Use sample data’ to populate the column headers so you don’t have to remember the order of the columns used in the training set.

Congratulations! You now have a basic predictive webservice built for producing forecasts. By adding in additional features to your dataset and retraining and improving the model, you can rapidly build up a business specific forecasting function using Machine Learning that is secure, shareable and scalable.

Good luck!

If you’re keen to leverage Python and R in your forecasting, we also recommend attending IBF’s upcoming Predictive Analytics, Forecasting & Planning conference in New Orleans where attendees will receive hands-on Python training. For practical and step-by-step insight into applying Machine Learning with R for forecasting in your organization, check out IBF’s Demand Planning & Forecasting Bootcamp w/ Hands-On Data Science & Predictive Business Analytics Workshop in Chicago.
