A portfolio-grade analytics project that turns café transactions into (1) item-level demand forecasts and (2) an inventory reorder policy (Reorder Point + Safety Stock), then validates the policy with a Monte Carlo stockout-risk simulation.
This repo is built around a real decision:
“For each item, when should I reorder (ROP), how much should I target (Order-up-to), and what stockout risk am I accepting?”
This project uses the Kaggle dataset:
- Cafe Sales Dataset (Clean) by aramelheni
- Dataset URL: https://www.kaggle.com/datasets/aramelheni/cafe-sales-dataset-clean
Thanks to the author and Kaggle for making the data publicly available.
- What this project delivers
- Dashboard tour (with screenshots)
- How it works (end-to-end)
- Figures explained (analysis outputs)
- Quickstart
- CLI options
- Outputs
- Project structure
- Decision safety + limitations
- Next upgrades
Raw data is transaction-level (each row = one purchase event). Inventory decisions need a daily signal.
This repo creates:
- Daily demand (units) per item
- Daily revenue per item
- Zero-filled missing days so each item becomes a continuous time series (no “holes”)
Instead of jumping to complex models, we start with strong baselines that many real businesses use because they’re:
- simple,
- stable,
- fast,
- easy to communicate.
We backtest multiple baseline models and choose the best per item.
Forecasts are translated into:
- Safety Stock
- Reorder Point (ROP)
- Order-up-to level (S)
So the output is not “a forecast chart”, but an actionable reorder policy.
Inventory decisions are probabilistic. Even a “good” policy can stock out if demand spikes.
We simulate demand variability and inventory updates to estimate:
- stockout-day rate (%)
- average inventory on-hand
- unmet demand (lost sales proxy)
What you’re seeing
- Days covered / Items / Total revenue: sanity-check the dataset size and total business footprint.
- Revenue ranking by item: quickly identifies which items matter most financially.
- Backtest summary: compares forecasting baselines on your data, not in theory.
Why it matters: this tab answers, “Is the dataset valid, what’s worth focusing on, and which forecasting approach is behaving best overall?”
What you’re seeing
- Historical daily demand: raw behavior (including spikes and noisy days).
- History + forecast:
- “history” line = observed demand
- “forecast” line = next-horizon baseline forecast
How to interpret
- If demand is mostly stable, a moving average/EWMA baseline can be strong.
- If demand is weekly-patterned (weekday/weekend differences), seasonal naïve often wins.
- Large spikes should trigger a “data and operations” question: promotion? bulk purchase? data-entry error?
Why it matters: this tab answers, “What does normal demand look like for this item, and what is the next-month expectation?”
This is the decision tab. It is intentionally practical.
Columns you’ll see:
- μ (mu_daily_demand): mean daily demand (units/day)
- σ (sigma_daily_demand): demand variability
- Service level: target probability of no stockout during lead time
- z: the z-score equivalent of that service level
- Lead time (days): supplier delay window
- Safety stock: buffer inventory for variability
- ROP: reorder when inventory drops below this
- Order-up-to (S): target inventory level after ordering
Interpretation rule
- Higher μ → higher ROP (more demand to cover during lead time)
- Higher σ or higher service level → higher safety stock → higher ROP
This table shows simulated outcomes under variability:
- avg_stockout_day_rate: % of days with stockout
- avg_onhand_units: average inventory you carry
- avg_unmet_demand_units: expected demand you fail to serve (lost sales proxy)
What “good” looks like
- Low stockout-day rate with a reasonable on-hand level.
- If you want lower risk, raise service level or reduce lead time (if operationally possible).
This section translates formulas into story:
- “This item sells μ/day with σ variability.”
- “With lead time L and service level SL, you need safety stock.”
- “Therefore reorder at ROP and refill to S.”
This tab is the ethics/engineering discipline of analytics:
- Forecasts are models, not truth.
- Inventory is assumption-sensitive (lead time + variability + service level).
- Without cost data, we avoid claiming “optimal” ordering.
We standardize columns and types:
- parse dates
- coerce quantities and totals to numeric
- normalize item labels (case/whitespace)
- drop invalid rows safely
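Under assumed column names (`date`, `quantity`, `total_spent`, `item`; the real schema may differ), a minimal sketch of these standardization steps could look like:

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the cleaning steps above; column names are assumptions."""
    out = df.copy()
    # Parse dates; unparseable values become NaT and are dropped below.
    out["date"] = pd.to_datetime(out["date"], errors="coerce")
    # Coerce quantities and totals to numeric; bad values become NaN.
    out["quantity"] = pd.to_numeric(out["quantity"], errors="coerce")
    out["total_spent"] = pd.to_numeric(out["total_spent"], errors="coerce")
    # Normalize item labels (whitespace + case).
    out["item"] = out["item"].astype(str).str.strip().str.title()
    # Drop rows that cannot support a demand signal.
    return out.dropna(subset=["date", "item", "quantity"])
```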
Inventory is decided daily (or at least reviewed daily/weekly).
We compute per date × item:
- demand_qty = sum(quantity)
- revenue = sum(total_spent)
- txn_count = number of transactions
Then we fill missing days with 0 so every item has a continuous timeline:
- This is essential for forecasting and backtesting.
- Otherwise, a “missing day” could be mistaken for “missing data” rather than “zero sales”.
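A minimal pandas sketch of the aggregation and zero-fill (column names are assumed, not taken from the repo):

```python
import pandas as pd

# Tiny illustrative transaction table; note Jan 2 has no sales.
tx = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-01", "2023-01-03"]),
    "item": ["Latte", "Latte", "Latte"],
    "quantity": [2, 1, 3],
    "total_spent": [8.0, 4.0, 12.0],
})

# Per date × item: demand, revenue, transaction count.
daily = (tx.groupby(["item", "date"])
           .agg(demand_qty=("quantity", "sum"),
                revenue=("total_spent", "sum"),
                txn_count=("quantity", "size"))
           .reset_index())

# Zero-fill: reindex onto every (item, day) pair so a missing day
# becomes an explicit zero-sales day, not a hole in the series.
full_dates = pd.date_range(tx["date"].min(), tx["date"].max(), freq="D")
idx = pd.MultiIndex.from_product([daily["item"].unique(), full_dates],
                                 names=["item", "date"])
filled = (daily.set_index(["item", "date"])
               .reindex(idx, fill_value=0)
               .reset_index())
```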
We run a small, defendable model zoo per item:
- Seasonal naïve (weekly): “Next Monday ≈ last Monday.”
- Moving average (7-day): forecast = mean of the last 7 days.
- EWMA (exponential smoothing): weighted mean where recent days matter more (stable and responsive).
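A sketch of what these baselines might look like as functions over a 1-D daily demand array (assumed signatures; the repo's `forecast.py` may differ):

```python
import numpy as np

def seasonal_naive(history: np.ndarray, horizon: int, season: int = 7) -> np.ndarray:
    """Repeat the last full week: next Monday ≈ last Monday."""
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def moving_average(history: np.ndarray, horizon: int, window: int = 7) -> np.ndarray:
    """Flat forecast at the mean of the last `window` days."""
    return np.full(horizon, history[-window:].mean())

def ewma(history: np.ndarray, horizon: int, alpha: float = 0.3) -> np.ndarray:
    """Exponentially weighted level: recent days get more weight."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return np.full(horizon, level)
```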
Backtesting tests models on data they haven’t “seen”.
- hold out the last N days (default 28)
- fit baseline using prior days
- predict the held-out days
- compute metrics:
- MAE (mean absolute error)
- RMSE (root mean squared error)
- MAPE (mean absolute percentage error, computed only on days where actual > 0)
Model selection rule
- pick the model with lowest MAE per item (simple + robust)
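The hold-out-and-score loop can be sketched as follows (names like `backtest` and `select_best` are illustrative, not the repo's API):

```python
import numpy as np

def mae(actual, pred) -> float:
    """Mean absolute error between actual and predicted demand."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(pred))))

def backtest(series: np.ndarray, models: dict, holdout: int = 28) -> dict:
    """Hold out the last `holdout` days, forecast them with each model,
    and score by MAE. `models` maps name -> fn(history, horizon)."""
    train, test = series[:-holdout], series[-holdout:]
    return {name: mae(test, fn(train, holdout)) for name, fn in models.items()}

def select_best(scores: dict) -> str:
    # Simple + robust selection rule: lowest MAE wins.
    return min(scores, key=scores.get)
```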
For each item we estimate:
- μ = mean daily demand
- σ = standard deviation of daily demand
Given:
- lead time L (days)
- service level SL (e.g., 0.95)
- z = z-score for SL
We compute:
- Safety Stock ≈ z × σ × sqrt(L)
- ROP ≈ μ × L + Safety Stock
Meaning
- μ × L covers expected demand during lead time
- safety stock covers uncertainty during lead time
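These formulas need nothing beyond the standard library. One caveat: the order-up-to level S is not pinned down above, so the version below uses one common convention (cover lead time plus a review period) as an explicit assumption:

```python
from math import sqrt
from statistics import NormalDist

def reorder_policy(mu: float, sigma: float, lead_time: int,
                   service_level: float = 0.95, review_days: int = 30) -> dict:
    """Safety stock and ROP per the formulas above.
    order_up_to uses an ASSUMED convention (lead time + review period)."""
    z = NormalDist().inv_cdf(service_level)      # z-score for the service level
    safety_stock = z * sigma * sqrt(lead_time)   # buffer for demand variability
    rop = mu * lead_time + safety_stock          # expected lead-time demand + buffer
    order_up_to = mu * (lead_time + review_days) + safety_stock
    return {"z": z, "safety_stock": safety_stock,
            "rop": rop, "order_up_to": order_up_to}
```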
Even if ROP is correct “on average”, randomness causes stockouts.
We simulate many runs:
- daily demand sampled from a distribution around μ, σ
- inventory updates daily
- if inventory ≤ ROP → order up to S
- order arrives after lead time L
Outputs:
- stockout-day probability
- unmet demand
- average inventory held
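A compact Monte Carlo sketch of this loop, under two stated modeling assumptions (demand sampled from a normal distribution truncated at zero; at most one outstanding order at a time), neither of which is necessarily the repo's exact choice:

```python
import numpy as np

def simulate(mu, sigma, rop, order_up_to, lead_time, days=365, runs=300, seed=0):
    """Estimate stockout-day rate, average on-hand, and unmet demand."""
    rng = np.random.default_rng(seed)
    stockout_days = onhand_total = unmet_total = 0.0
    for _ in range(runs):
        inv, pipeline = order_up_to, []           # on-hand units, in-flight orders
        for _ in range(days):
            # Age in-flight orders by one day and receive any that arrive.
            pipeline = [(eta - 1, q) for eta, q in pipeline]
            inv += sum(q for eta, q in pipeline if eta <= 0)
            pipeline = [(eta, q) for eta, q in pipeline if eta > 0]
            # Sample today's demand (truncated-normal assumption).
            demand = max(0.0, rng.normal(mu, sigma))
            served = min(inv, demand)
            unmet_total += demand - served
            stockout_days += float(demand > inv)
            inv -= served
            onhand_total += inv
            # Reorder up to S when at/below ROP and nothing is in flight.
            if inv <= rop and not pipeline:
                pipeline.append((lead_time, order_up_to - inv))
    n = runs * days
    return {"stockout_day_rate": stockout_days / n,
            "avg_onhand_units": onhand_total / n,
            "avg_unmet_demand_units": unmet_total / n}
```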
These are the figures generated by the pipeline and used in analysis + README.
What it shows
- Each bar = average MAE across items for that model.
- Lower MAE = better accuracy.
How to read it
- If Moving Average wins, your demand is mostly stable.
- If Seasonal Naïve wins, your demand is strongly weekly-patterned.
- If EWMA wins, demand shifts slowly and recent behavior matters most.
Why it matters: this chart tells you whether a “simple” method is already excellent (often true) and prevents over-engineering.
What it shows
- Multiple top items plotted together:
- solid = historical demand
- dashed = forecast horizon
What to watch
- Spikes: single-day events can dominate perception. Investigate whether they’re real (bulk purchase, promotion) or data artifacts.
- Noise vs signal: if demand is mostly low and spiky, inventory should be conservative and service-level assumptions matter more.
Why it matters: this is the bridge from “forecasting metrics” to “human trust”, letting you see whether the forecast feels reasonable.
What it shows
- Total revenue contribution per item over the full dataset.
How to use it
- High-revenue items:
- forecast and inventory policy matter more (financial impact)
- service level may be worth increasing
- Low-revenue items:
- may tolerate higher stockout risk if holding cost is high
- or might be discontinued depending on business logic
Why it matters: this prevents “equal effort for unequal impact.”
What it shows
- x-axis: mean daily demand (μ)
- y-axis: reorder point (ROP)
- labels: item names
Interpretation
- ROP increases with μ (more demand to cover lead time).
- ROP increases with σ and service level (more safety stock).
- Two items with similar μ can have different ROP if one is more volatile.
Why it matters: this chart is a quick audit:
- If an item’s ROP looks “too high”, check σ or lead time assumptions.
- If ROP looks “too low”, expect stockouts in simulation.
python -m venv .venv
# Windows: .venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
pip install -r requirements.txt
python -m src.pipeline --input data/raw/cafe_sales.csv
streamlit run app/app.py
python -m src.pipeline \
  --input data/raw/cafe_sales.csv \
  --out outputs \
  --figures reports/figures \
  --horizon 30 \
  --backtest 28 \
  --lead-time 3 \
  --service-level 0.95 \
  --sim-runs 300
Guidance
- Increase --service-level if stockouts are expensive.
- Increase --lead-time to reflect slow suppliers (ROP will increase).
- Increase --sim-runs for more stable risk estimates.
After running the pipeline, check:
- outputs/daily_item_demand.csv: clean daily series (date × item) with demand and revenue.
- outputs/backtest_scores.csv: item-level metrics per model.
- outputs/item_model_selection.csv: best model per item (lowest MAE).
- outputs/forecast_next_30d.csv: next-horizon daily quantity forecasts.
- outputs/reorder_policy.csv: inventory policy per item (safety stock, ROP, S).
- outputs/simulation_summary.csv: stockout-day risk and unmet demand estimates.
- reports/figures/*.png: shareable charts used above.
cafe-demand-forecasting-reorder-simulator/
app/
app.py # Streamlit dashboard
data/
raw/
cafe_sales.csv
processed/
daily_item_demand.csv
outputs/
*.csv + run_metadata.json
reports/
figures/
backtest_summary.png
forecast_examples.png
item_revenue_ranking.png
rop_vs_demand.png
figures/ui/
dashboard_overview.png
dashboard_item_explorer.png
dashboard_inventory_policy.png
dashboard_notes.png
src/
clean.py
aggregate.py
forecast.py
inventory.py
reporting.py
pipeline.py
README.md
requirements.txt
LICENSE
- Baselines are intentional: they are strong, explainable, and fast.
- Inventory math is assumption-sensitive: lead time, service level, and variability drive ROP.
- Without cost data, we avoid claiming “optimal” inventory: true optimization needs holding cost, stockout cost, and order cost.
- Spikes matter: one promotional day can inflate σ and raise ROP. Consider spike handling if you know “event days”.
If you want to push this from “portfolio-grade” to “production-grade”:
- Cost-based optimization: add holding vs stockout vs order cost and recommend service levels.
- Item-specific lead times: different suppliers → different reorder behavior.
- Promotion/event tagging: separate “normal demand” from “event demand”.
- Richer time-series models (after baselines): Prophet / SARIMAX / LightGBM features, etc. (only if baselines are not enough).
- Data quality checks:
  - outlier audit
  - missing-day audit
  - duplicate transaction detection