"A recipe is just a directed acyclic graph…"

Written by Piers Cawley on

In the last post I handwaved the way I represented bakery formulae in the bakery database, so here’s a little more detail. It helps to think of a bakery formula as a node on a directed acyclic

If you ignore the fact that a starter is made of flour, water and starter. Which, of course, we’re going to.

graph with weighted edges, where the weights are literally weights. Here’s the graph a for a couple of products

In the last post I handwaved the way I represented bakery formulae in the bakery database, so here’s a little more detail. It helps to think of a bakery formula as a node on a directed acyclic

If you ignore the fact that a starter is made of flour, water and starter. Which, of course, we’re going to.

graph with weighted edges, where the weights are literally weights. Here’s the graph a for a couple of products

And here’s how we represent that in the database

This table is the result of a query on my real database, where the quantities are in kg, as opposed to the graph representation which was handrolled and adjusted to use bakers’ percentages which is how formulae are traditionally written.

:

nameingredientformat
Small Seedy MaltSeedy Malt Dough0.63 kg
Small White WildBasic White Sour0.63 kg
Basic White SourOrganic white flour2.00 kg
Basic White SourSea salt0.06 kg
Basic White SourWater1.10 kg
Basic White Sour80% starter1.80 kg
Seedy Malt Dough5 Seed Soaker4.00 kg
Seedy Malt DoughWater3.80 kg
Seedy Malt DoughSea salt0.22 kg
Seedy Malt Dough80% starter3.60 kg
Seedy Malt DoughOrganic light malthouse flour8.00 kg
5 Seed SoakerWater1.20 kg
5 Seed Soaker5 seed mix1.00 kg
MotherWater3.20 kg
MotherOrganic white flour4.00 kg

Suppose we have an order for 8 Small White loaves. We need to know how much starter to mix tonight. We know that we need 0.63 kg of dough for each loaf, so that’s a total of 5.04 kg of Basic White Sour. The formula for Basic White Sour makes a total of \(1.10 + 1.80 + 0.06 + 2.00 = 4.96 \mathrm{kg}\) of dough. So we need to multiply each quantity in that formula by the weight of dough we need divided by the total weight of the recipe \((5.04/4.96 = 1.016)\). This is straightforward enough for flour, water and salt, which are basic ingredients, but we’ll need to do a similar calculation to work out how much flour and water we’ll need to make \(1.016 × 1.8 = 1.829 \mathrm{kg}\) of starter. You can see how this might become a little tedious.

If I were going to be doing these calculations by hand, it would definitely pay me to normalize my intermediate formulae so they all made a total of 1 kg of stuff. But screw that, we have a computer, so we can make it do the work.

I’m going to simplify things a little (the real database understands about dates, and we need to know a little more about recipes, products and ingredients than will fit in the recipe_item table that describes the graph) but this should give you an idea of the recursive queries that drive production planning.

Let’s introduct a production_order table, where we stash our orders

The real table has extra information about customers and order dates:

productquantity
Small White Wild5
Small Seedy Malt5

And that’s all we need to fire off a recursive query.

I’m writing this using the literate programming capabilities of org-mode, so the code you see is being run against my production database, and the results are using my working formulae. Which is why we’re not querying the real production_order table.

WITH RECURSIVE po(product, quantity) AS (
    SELECT 'Small White Wild', 5
  UNION
    SELECT 'Large White Wild', 5
), rw(recipe, weight) AS (
    SELECT recipe, sum(amount)
      FROM bakehouse.recipe_item
  GROUP BY recipe
), job(product, ingredient, quantity) AS (
    SELECT po.product,
           ri.ingredient,
           po.quantity * ri.amount
      FROM po
      JOIN bakehouse.recipe_item ri ON po.product = ri.recipe
      JOIN rw ON ri.recipe = rw.recipe
  UNION
    SELECT job.ingredient, ri.ingredient, job.quantity * ri.amount / rw.weight
      FROM job
      join bakehouse.recipe_item ri on job.ingredient = ri.recipe
      join rw on job.ingredient = rw.recipe
)
SELECT product formula, ingredient, ROUND(sum(quantity),2) quantity from job group by job.product, job.ingredient order by formula;

Which gives the following result:

formulaingredientquantity
Basic White SourSea salt0.09
Basic White SourWater1.72
Basic White SourMother2.81
Basic White SourOrganic white flour3.13
Large White WildBasic White Sour4.65
MotherOrganic white flour1.56
MotherWater1.25
Small White WildBasic White Sour3.10

A quick sanity check seems to show this is correct (we’re making 7.75kg of Basic White Sour, which tallies with the weights needed to make the loaves). So what’s going on in the query? In SQL, WITH is a way of giving names to your intermediate results, akin to let in a Lisp. We fake up a table to hold our production orders (po) and the rw clause is totals the weights of all our recipes (in the real database, it’s a view). The magic really starts to happen when you use the WITH RECURSIVE form. With RECURSIVE in play, the last query is treated differently. Instead of being a simple two part UNION what happens is that we first run:

SELECT po.product, ri.ingredient, po.quantity * ri.amount
  FROM po
  JOIN bakehouse.recipe_item ri on po.product = ri.recipe
  JOIN rw on ri.recipe = rw.recipe

and call the results job and then run the second query, adding any extra rows generated to the results, and repeating that query until the result set stops growing. If we didn’t have WITH RECURSIVE available, and we knew the maximum depth of recursion we would need, we could fake it by making a bunch of intermediate clauses in our WITH. In fact, until I worked out how WITH RECURSIVE works, that’s exactly what I did.

Have you spotted the mistake? I didn’t, until a few bakes when horribly wrong.

Here’s what happens when we have an order for 3 small loaves and two large ones

formulaingredientquantity
Basic White SourSea salt0.02
Basic White SourWater0.41
Basic White SourMother0.68
Basic White SourOrganic white flour0.75
Large White WildBasic White Sour1.86
MotherOrganic white flour0.38
MotherWater0.30
Small White WildBasic White Sour1.86

We’re only making 1.86 kg of dough? What’s going on?

It turns out that the way a UNION works is akin to doing SELECT DISTINCT on the combined table, so it selects only unique rows. When two orders end up requiring exactly the same amount of the ‘same’ dough, they get smashed together and we lose half the weight. This is not ideal.

It’s especially not ideal when you don’t spot there’s a problem and end up making far fewer loaves than you expect. Or on one really annoying occasion, making a dough that was far too dry because we lost some water along the way. You can correct this during the mix, but it was a nasty shock.

I fixed it by adding a ‘path’ to the query, keeping track of how we arrived at a particular formula. Something like:

WITH RECURSIVE po(product, quantity) AS (
    SELECT 'Small White Wild', 3
  UNION
    SELECT 'Large White Wild', 2
), rw(recipe, weight) AS (
    SELECT recipe, sum(amount)
      FROM bakehouse.recipe_item
  GROUP BY recipe
), job(path, product, ingredient, quantity) AS (
    SELECT po.product,
           po.product,
           ri.ingredient,
           po.quantity * ri.amount
      FROM po
      JOIN bakehouse.recipe_item ri ON po.product = ri.recipe
      JOIN rw ON ri.recipe = rw.recipe
  UNION
    SELECT job.path || '.' || job.ingredient,
           job.ingredient,
           ri.ingredient,
           job.quantity * ri.amount / rw.weight
      FROM job
      join bakehouse.recipe_item ri on job.ingredient = ri.recipe
      join rw on job.ingredient = rw.recipe
)
SELECT product formula, ingredient, round(sum(quantity),2) weight from job group by formula, ingredient order by formula;

This query gives us:

formulaingredientweight
Basic White SourSea salt0.05
Basic White SourWater0.83
Basic White SourMother1.35
Basic White SourOrganic white flour1.50
Large White WildBasic White Sour1.86
MotherOrganic white flour0.75
MotherWater0.60
Small White WildBasic White Sour1.86

This time we’re making 3.74 kg of dough, which is right.

In order to see what’s going on, we can change the final SELECT to SELECT formula, path, ingredient, round(quantity,2) weight FROM job, and now we get:

formulapathingredientweight
Large White WildLarge White WildBasic White Sour1.86
Basic White SourLarge White Wild.Basic White SourMother0.68
Basic White SourLarge White Wild.Basic White SourOrganic white flour0.75
Basic White SourLarge White Wild.Basic White SourWater0.41
Basic White SourLarge White Wild.Basic White SourSea salt0.02
MotherLarge White Wild.Basic White Sour.MotherWater0.30
MotherLarge White Wild.Basic White Sour.MotherOrganic white flour0.38
Small White WildSmall White WildBasic White Sour1.86
Basic White SourSmall White Wild.Basic White SourOrganic white flour0.75
Basic White SourSmall White Wild.Basic White SourSea salt0.02
Basic White SourSmall White Wild.Basic White SourWater0.41
Basic White SourSmall White Wild.Basic White SourMother0.68
MotherSmall White Wild.Basic White Sour.MotherOrganic white flour0.38
MotherSmall White Wild.Basic White Sour.MotherWater0.30

Which shows that we’re considering two lots of Basic White Sour with exactly the same weights, but we (and more importantly, the database engine) know that they’re distinct amounts because we get to them through different routes. Hurrah! The problem is solved and we can accurately work out what we should be mixing.

What’s still missing

As a baker, I know if I’ve got an order for bread on Friday, then I need to mix the starters on Wednesday night, then spend Tuesday mixing, fermenting and shaping the loaves, which will spend the night in the retarder ready to be baked at 4 on Friday morning. But the schema I’ve outlined here doesn’t. In my full bakehouse schema, I have a few extra tables which hold timing data and such. In particular, I have a product table, which knows about everything I sell. This table knows holds info about how many I can make per hour of work and the bake time and temperature. Then there’s a recipe table which holds information about how long a formula needs to rest.

This could be the bulk fermentation time if it’s a formula for a dough or a starter, a proof time if it’s a loaf, or a soaking time for a soaker (a soaker is usually a mixture of seeds or fruit and a liquid, usually water, but occasionally fruit juice or booze depending on the final product).

The real queries take this into account to allow us to work back from the due_date of a real order to the day we need to do the work. If you want to dig into how I handle dates you can check out the repository at https://github.com/pdcawley/bakehouse/.

The perils of writing stuff up

Never write your work up for your blog. Especially if you’re mostly happy with it. As I was writing this, I realised there’s an annoying bit of code duplication that I think I can eliminate. In the current code, I repeat what’s essentially the same query structure in a couple of different views, but the formula graph is essentially static unless I add or adjust a recipe. Now I’m wondering if I could make a materialised view that has enough information to shortcut the calculations for both making the production list (what needs to be mixed, when) and for working out my costings (to put a price on a loaf, you need to know how much the raw ingredients cost, and that involves walking the tree again. Maybe a table like:

productsub_formulaingredientfactorlead_time
Large White WildBasic White SourWhite Flour0.4031 day
Large White WildBasic White SourSalt0.0121 day
Large White WildBasic White SourWater0.2221 day
Large White WildBasic White Sour80% Starter0.4621 day
Large White Wild80% StarterWhite Flour0.2882 days
Large White Wild80% StarterWater0.1732 days

If we have that table, then two days before our bread is due, if we have an order for 10 white loaves, we’ll need to mix \(9.3 × .288 \approxeq 2.68\) kg of flour and \(9.3 × 0.173 \approxeq 1.61\) kg of water. Which we can do with a simple non-recursive SELECT. Something like:

NB: I’ve not tested this because I don’t have the precalculated table, but it seems like it should work. In fact, thinking about it, we could probably build the precalc table so that we can simply do precalc.factor * po.quantity, since any change that affects recipe weight will also affect our precalculated table.

WITH weighted(formula, ingredient, weight, due) AS (
    SELECT precalc.sub_formula,
           precalc.ingredient,
           precalc.factor * po.quantity * rw.weight,
           po.due_date - precalc.lead_time
      FROM precalc
      JOIN production_order po ON precalc.product = po.product
      JOIN recipe_weight rw ON precalc.product = rw.recipe
)
  SELECT formula, ingredient, sum(weight)
    FROM weighted
   WHERE due = 'today'
GROUP BY formula, ingredient

We can use the same table to calculate the raw material costs for a given recipe, using a simple non-recursive query too.

I think, however, I’m going to leave it alone until I have to write another recursive view that walks the same graph, at which point I’ll bite the bullet and do the pre-calculated version.

  • 0 likes
  • 0 reposts
  • 0 replies
  • 0 mentions