Polytope - A multidimensional data engine



In the prior article, I described my experience taking a small, long-running compiler project and using AI coding tools to extend it with first a TUI and later a web-based mini-IDE. That work taught me a great deal about:

  • Refining a specification through back-and-forth conversation with an LLM
  • Defining outcomes and tests up front
  • Keeping the spec and implementation plan outside of the LLM’s context (as project files)
  • Paying close attention to the recurring traps that emerge once implementation begins

One of the earliest pieces of advice I encountered about using LLMs effectively was: don’t ask them to jump too far in a single step. Breaking a large goal into smaller, focused, testable sub-goals tends to produce far better results.

That said, every few months the latest models arrive with noticeably stronger capability — and increasingly they can perform some of this “divide and conquer” automatically. Once I began using proper coding environments (Claude Code, Gemini Code Assist, and Codex), it became obvious that:

  • Maintaining TODO lists
  • Defining phases
  • Checkpointing
  • Writing at least some tests

…are now built into the philosophy of all major tools. Attentive monitoring is still essential, but the workflows themselves are structured to encourage good engineering practice.

LLMs nevertheless remain prone to the occasional odd behaviour — especially as their working context becomes full. One of my “favourite” Claude Code quirks is when it suddenly announces: “This is getting complicated, let’s do something simpler.”

It then quietly redefines the entire problem to suit the simpler goal!

Still, with a bit of guidance — and with the willingness to stop a chain of actions whenever something odd appears — you can usually keep the “junior–intern robot programmer” more or less on track.

A New Experiment: Building an Analytic Data Engine

With a little experience under my belt, I wanted to attempt something more ambitious: creating an analytic data engine that could eventually be embedded into psimulang.

This is a substantial undertaking — large enough to:

  • Provide an excellent testbed for comparing different coding models
  • Refine my own workflow for AI-assisted development
  • Explore how well LLMs handle sophisticated architectural design

Comparisons between models age quickly, of course, but they remain interesting. Capabilities differ in surprising ways, and sometimes for reasons you can infer from design choices, context window behaviour, or planning features. Even within a single vendor’s lineup, offerings have diverged: context-window utilization, planning depth, “thinking time,” and scheduling heuristics have all evolved markedly over the past year.

Goals of the Data Engine Development

The high-level goals were:

1. Build a Haskell Engine Library

  • A DSL for dimensions and multidimensional cubes (“ORTHOs”), with calculations and consolidation hierarchies.
  • A loader DSL for projecting tabular data into a chosen multidimensional shape.
  • A reporting DSL for defining report renderings over populated ORTHOs.
  • Storage strategies suitable for both dense and sparse data.
  • A runtime evaluator capable of executing calculations and consolidations.
  • A loader that can ingest data into an ORTHO.
  • A reporting engine that can execute report definitions against ORTHO data.

2. Integrate the Engine Into psimulang

  • Extend psimulang syntax with blocks corresponding to each DSL object.
  • Have the psimulang analyzer emit and serialize DSL structures.
  • Pass serialized DSL to new runtime intrinsics.
  • Maintain a polytope environment within the psimulang runtime.
  • Add commands for polytope lifecycle management and report execution.

These are non-trivial tasks. I followed the same pattern as before: each step began with a requirements and implementation-plan session, refined through conversation. Once ready, the LLM produced:

  • A master plan committed to the project
  • A bootstrap prompt describing invariants, architectural expectations, library preferences, and anti-patterns to avoid

Only after reviewing both did I begin an implementation session that referenced the bootstrap prompt.

Designing the DSLs

For the DSLs, I adapted the patterns I knew from working on an early MOLAP engine in the 1990s. The key objects are:

  • DIMENSION — a coordinate set, e.g., “country” or “measure”.
  • ORTHO — a multidimensional shape defined by a list of DIMENSIONs; each intersection is a cell with associated storage.
  • RULE — a calculation relating one cell to others (a multidimensional formula).
  • MODEL — a set of RULEs applied to one or more ORTHOs.
  • REPORT — a set of visual objects (tables, charts) each bound to data, plus a sectioning construct for multidimensional iteration.
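
To make the DIMENSION and ORTHO objects above concrete, here is a minimal Haskell sketch of how a cell might be addressed. Every name in it (Dimension, Ortho, resolve, denseOffset, SparseCells) is hypothetical rather than Polytope's actual API, and the row-major layout and Map-based sparse store are assumptions chosen for illustration.

```haskell
import Data.List (elemIndex)
import qualified Data.Map.Strict as Map

-- All names below are hypothetical; they are not Polytope's real API.

-- | A DIMENSION: a named, ordered set of coordinate labels.
data Dimension = Dimension
  { dimName   :: String
  , dimFields :: [String]
  } deriving (Show, Eq)

-- | An ORTHO: a shape defined by an ordered list of dimensions.
data Ortho = Ortho
  { orthoName :: String
  , orthoDims :: [Dimension]
  } deriving (Show, Eq)

-- | Resolve one field label per dimension to positional coordinates.
-- Returns Nothing if the arity is wrong or a label is unknown.
resolve :: Ortho -> [String] -> Maybe [Int]
resolve o labels
  | length labels /= length (orthoDims o) = Nothing
  | otherwise =
      sequence (zipWith (\d l -> elemIndex l (dimFields d)) (orthoDims o) labels)

-- | Row-major offset of a coordinate, as a dense store might compute it.
denseOffset :: Ortho -> [Int] -> Int
denseOffset o = foldl step 0 . zip sizes
  where
    sizes = map (length . dimFields) (orthoDims o)
    step acc (size, i) = acc * size + i

-- | A sparse store keeps only the populated intersections.
type SparseCells = Map.Map [Int] Double

-- | Two of the dimensions from the dashboard example, as an ORTHO.
demoOrtho :: Ortho
demoOrtho = Ortho "BusinessFinancials"
  [ Dimension "Quarter" ["Q1", "Q2", "Q3", "Q4"]
  , Dimension "Measure" ["Cost", "GrossMargin", "NetMargin", "OperatingMargin", "Revenue"]
  ]

main :: IO ()
main = do
  let coord = resolve demoOrtho ["Q3", "Revenue"]
  print coord                                 -- Just [2,4]
  print (fmap (denseOffset demoOrtho) coord)  -- Just 14
  -- 1250.0 is an arbitrary illustrative value, not real data.
  let cells = maybe Map.empty (\c -> Map.singleton c 1250.0) coord :: SparseCells
  print (Map.toList cells)
```

The same coordinate list serves both strategies: a dense store turns it into an array offset, while a sparse store uses it directly as a map key.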

All objects live within a ConcurrentEnvironment, the core resource container. Objects legally created by the constructor DSL are added to the environment and then manipulated via lifecycle commands. These commands are queued using Haskell’s Software Transactional Memory (STM) and applied atomically when executed. Object-level effects will use a generational scheme with atomic visibility, and dependency deletion is guarded.
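
The queue-then-apply idea can be sketched with the stm library's TVar and TQueue. This is only an illustration under stated assumptions: the Env, Command, applyCommand, and applyQueued names are hypothetical, the environment is reduced to a map from object names to their dependencies, and the generational scheme is left out entirely.

```haskell
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

-- Hypothetical names throughout; this is not Polytope's real API.

-- | Each object name maps to the names of the objects it depends on.
type Env = Map.Map String [String]

data Command
  = Create String [String]   -- object name and its dependencies
  | Delete String
  deriving (Show, Eq)

-- | Apply one command purely, refusing unsafe changes.
applyCommand :: Env -> Command -> Either String Env
applyCommand env (Create name deps)
  | Map.member name env            = Left (name ++ " already exists")
  | any (`Map.notMember` env) deps = Left (name ++ " has missing dependencies")
  | otherwise                      = Right (Map.insert name deps env)
applyCommand env (Delete name)
  | Map.notMember name env          = Left (name ++ " does not exist")
  | any (elem name) (Map.elems env) = Left (name ++ " is still in use")
  | otherwise                       = Right (Map.delete name env)

-- | Drain the queue and apply every command inside one transaction.
-- Other threads observe either the old environment or the fully updated
-- one. Rejected commands are skipped and their errors are returned.
applyQueued :: TVar Env -> TQueue Command -> STM [String]
applyQueued envVar queue = do
  cmds <- flushTQueue queue
  env  <- readTVar envVar
  let step (e, errs) c = case applyCommand e c of
        Left err -> (e, err : errs)
        Right e' -> (e', errs)
      (env', errs) = foldl step (env, []) cmds
  writeTVar envVar env'
  pure (reverse errs)

-- | Queue four commands, apply them, and report the result.
runDemo :: IO ([String], [String])
runDemo = do
  envVar <- newTVarIO Map.empty
  queue  <- newTQueueIO
  atomically $ mapM_ (writeTQueue queue)
    [ Create "Quarter" []
    , Create "Measure" []
    , Create "BusinessFinancials" ["Quarter", "Measure"]
    , Delete "Quarter"   -- guarded: the ORTHO still depends on it
    ]
  errs <- atomically (applyQueued envVar queue)
  env  <- readTVarIO envVar
  pure (Map.keys env, errs)

main :: IO ()
main = runDemo >>= print
```

Keeping applyCommand pure means the guard logic can be tested without any concurrency, while applyQueued supplies the atomic visibility.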

This level of complexity is ideal for testing the limits of AI-assisted coding — and for deepening my understanding of effective prompting, process control, and long-form LLM collaboration at scale.

As is often the case, the initial DSL generation from Claude was fast — and impressively confident.


Observations and problems:

  • The Environment started simple and later gained concurrent features, but the old non-concurrent version was left in place for “backward compatibility”. Excising it took a fair amount of work.
  • The LLM kept forgetting that we had chosen Store for serialization and went back to Show/Read, even though the surrounding code already used Store.
  • It also forgot that Store instances can be derived via Generic, and built dozens of instances longhand.
  • Tests failed often as we progressed, but they worked nicely for regression checking and fixing, so long as I remembered to remind the LLM to run them.
  • The LLM would skip whole swathes of a development phase and still declare it “done”. Always review at suitable intervals, and ask for a code quality review covering missing code, TODOs, vestigial code, anti-patterns, performance issues, and so on.
  • Claude gets stuck and starts saying things like “X is hanging, let’s simplify”. Claude is a powerful coder, but ChatGPT is great at deep analysis, so get Claude to create a briefing and prompt for ChatGPT.
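
For readers unfamiliar with the Store point in these observations, the sketch below shows the Generic route using the store package's Data.Store module. The Field and Dimension types are hypothetical stand-ins for the real DSL structures, and the code assumes the store package is available in the build.

```haskell
{-# LANGUAGE DeriveGeneric #-}
import Data.Store (Store, decode, encode)
import GHC.Generics (Generic)

-- Hypothetical types standing in for the real DSL structures.
data Field = Field
  { fieldName  :: String
  , fieldIndex :: Maybe Int
  } deriving (Show, Eq, Generic)

data Dimension = Dimension
  { dimName   :: String
  , dimFields :: [Field]
  } deriving (Show, Eq, Generic)

-- Empty instance bodies: Store's default methods are supplied through
-- Generic, so no hand-written size/poke/peek code is needed.
instance Store Field
instance Store Dimension

-- | Encode to a strict ByteString and decode it again.
roundTrip :: Dimension -> Either String Dimension
roundTrip d = either (Left . show) Right (decode (encode d))

main :: IO ()
main = do
  let quarter = Dimension "Quarter" [Field "Q1" (Just 1), Field "Q2" (Just 2)]
  print (roundTrip quarter == Right quarter)
```

Each new serializable type then costs one deriving clause and one empty instance line, which is what makes dozens of longhand instances unnecessary.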


-- Business Dashboard Demo in Psimulang
-- Defines all dimensions, ORTHOs, data loads, and report layout

-- ---------------------------------------------------------------------
-- Dimension Definitions
-- ---------------------------------------------------------------------

DIMENSION Quarter
    FIELD Q1 INDEX 1
    FIELD Q2 INDEX 2
    FIELD Q3 INDEX 3
    FIELD Q4 INDEX 4
    FIELD "All Quarters" = SUM Q1:Q4
END DIMENSION

DIMENSION Month
    FIELD Jan INDEX 1
    FIELD Feb INDEX 2
    FIELD Mar INDEX 3
    FIELD Apr INDEX 4
    FIELD May INDEX 5
    FIELD Jun INDEX 6
    FIELD Jul INDEX 7
    FIELD Aug INDEX 8
    FIELD Sep INDEX 9
    FIELD Oct INDEX 10
    FIELD Nov INDEX 11
    FIELD Dec INDEX 12
    FIELD "All Months" = SUM Jan:Dec
END DIMENSION

DIMENSION Region
    FIELD "North America"
    FIELD Europe
    FIELD "Asia Pacific"
    FIELD "Rest of World"
    FIELD "All Regions" = SUM "North America";Europe;"Asia Pacific";"Rest of World"
END DIMENSION

DIMENSION Product
    FIELD Software
    FIELD Services
    FIELD Hardware
    FIELD "All Products" = SUM Software;Services;Hardware
END DIMENSION

DIMENSION ProductCategory
    FIELD Enterprise
    FIELD Professional
    FIELD Standard
END DIMENSION

DIMENSION Segment
    FIELD Enterprise
    FIELD "Mid-Market"
    FIELD "Small Business"
END DIMENSION

DIMENSION ExpenseCategory
    FIELD "Sales & Marketing"
    FIELD "R&D"
    FIELD Operations
    FIELD Administration
END DIMENSION

DIMENSION Measure
    FIELD Cost
    FIELD GrossMargin
    FIELD NetMargin
    FIELD OperatingMargin
    FIELD Revenue
END DIMENSION

DIMENSION Metric
    FIELD "Order Fulfillment Rate"
    FIELD "Inventory Turnover"
    FIELD "Customer Satisfaction"
    FIELD "Employee Productivity"
END DIMENSION

DIMENSION KPIAttribute
    FIELD Achievement
    FIELD Target
    FIELD Status
END DIMENSION

-- Measure dimension will be populated via LOAD operations as needed

-- ---------------------------------------------------------------------
-- ORTHO Definitions
-- ---------------------------------------------------------------------

ORTHO BusinessFinancials
    DIMENSIONS Quarter, Measure
END ORTHO

ORTHO BusinessRegionalSales
    DIMENSIONS Region, Product, Quarter
END ORTHO

ORTHO BusinessMonthlySales
    DIMENSIONS Month, ProductCategory
END ORTHO

ORTHO BusinessCustomers
    DIMENSIONS Segment, Month
END ORTHO

ORTHO CustomerCounts
    DIMENSIONS Segment, Month
END ORTHO

ORTHO BusinessKPIs
    DIMENSIONS Metric, KPIAttribute
END ORTHO

ORTHO BusinessExpenses
    DIMENSIONS ExpenseCategory
END ORTHO

-- ---------------------------------------------------------------------
-- Data Loads from Polytope CSV sources
-- ---------------------------------------------------------------------

LET load_financials = LOAD "Quarterly financial data"
    READ FILE HOST "polytope/data/business_quarterly_revenue.csv"
    WRITE ORTHO BusinessFinancials
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 TEXT
    COLUMN 3 NUMBER

    DIMENSION Quarter FIELD FROM COLUMN 1
    DIMENSION Measure FIELD FROM COLUMN 2

    VALUE FROM COLUMN 3
    SKIP 1
END LOAD

LET load_regional = LOAD "Regional sales data"
    READ FILE HOST "polytope/data/business_regional_sales.csv"
    WRITE ORTHO BusinessRegionalSales
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 TEXT
    COLUMN 3 TEXT
    COLUMN 4 NUMBER

    DIMENSION Region FIELD FROM COLUMN 1
    DIMENSION Product FIELD FROM COLUMN 2
    DIMENSION Quarter FIELD FROM COLUMN 3

    VALUE FROM COLUMN 4
    SKIP 1
END LOAD

LET load_monthly = LOAD "Monthly sales data"
    READ FILE HOST "polytope/data/business_monthly_sales.csv"
    WRITE ORTHO BusinessMonthlySales
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 TEXT
    COLUMN 3 NUMBER

    DIMENSION Month FIELD FROM COLUMN 1
    DIMENSION ProductCategory FIELD FROM COLUMN 2

    VALUE FROM COLUMN 3
    SKIP 1
END LOAD

LET load_customers = LOAD "Customer growth data"
    READ FILE HOST "polytope/data/business_customer_data.csv"
    WRITE ORTHO BusinessCustomers
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 TEXT
    COLUMN 3 NUMBER
    COLUMN 4 NUMBER
    COLUMN 5 NUMBER

    DIMENSION Segment FIELD FROM COLUMN 1
    DIMENSION Month FIELD FROM COLUMN 2

    VALUE FROM COLUMN 3
    SKIP 1
END LOAD

LET load_customer_counts = LOAD "Customer counts"
    READ FILE HOST "polytope/data/business_customer_data.csv"
    WRITE ORTHO CustomerCounts
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 TEXT
    COLUMN 3 NUMBER
    COLUMN 4 NUMBER
    COLUMN 5 NUMBER

    DIMENSION Segment FIELD FROM COLUMN 1
    DIMENSION Month FIELD FROM COLUMN 2

    VALUE FROM COLUMN 4
    SKIP 1
END LOAD

LET load_kpis_achievement = LOAD "KPI metrics - Achievement"
    READ FILE HOST "polytope/data/business_kpi_metrics.csv"
    WRITE ORTHO BusinessKPIs
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 NUMBER
    COLUMN 3 NUMBER
    COLUMN 4 TEXT

    DIMENSION Metric FIELD FROM COLUMN 1
    DIMENSION KPIAttribute FIELD "Achievement"

    VALUE FROM COLUMN 2
    SKIP 1
END LOAD

LET load_kpis_target = LOAD "KPI metrics - Target"
    READ FILE HOST "polytope/data/business_kpi_metrics.csv"
    WRITE ORTHO BusinessKPIs
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 NUMBER
    COLUMN 3 NUMBER
    COLUMN 4 TEXT

    DIMENSION Metric FIELD FROM COLUMN 1
    DIMENSION KPIAttribute FIELD "Target"

    VALUE FROM COLUMN 3
    SKIP 1
END LOAD

LET load_kpis_status = LOAD "KPI metrics - Status"
    READ FILE HOST "polytope/data/business_kpi_metrics.csv"
    WRITE ORTHO BusinessKPIs
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 NUMBER
    COLUMN 3 NUMBER
    COLUMN 4 NUMBER

    DIMENSION Metric FIELD FROM COLUMN 1
    DIMENSION KPIAttribute FIELD "Status"

    VALUE FROM COLUMN 4
    SKIP 1
END LOAD

LET load_expenses = LOAD "Expense breakdown"
    READ FILE HOST "polytope/data/business_expenses.csv"
    WRITE ORTHO BusinessExpenses
    FORMAT COLUMN DIRECTED USING "," "\""

    COLUMN 1 TEXT
    COLUMN 2 NUMBER

    DIMENSION ExpenseCategory FIELD FROM COLUMN 1

    VALUE FROM COLUMN 2
    SKIP 1
END LOAD

LET consolidatedRegional = CONSOLIDATE BusinessRegionalSales
LET consolidatedFinancials = CONSOLIDATE BusinessFinancials
LET consolidatedMonthly = CONSOLIDATE BusinessMonthlySales
LET consolidatedCustomers = CONSOLIDATE BusinessCustomers
LET consolidatedCounts = CONSOLIDATE CustomerCounts
LET consolidatedExpenses = CONSOLIDATE BusinessExpenses
LET consolidatedKPIs = CONSOLIDATE BusinessKPIs

-- ---------------------------------------------------------------------
-- Business Dashboard Report Definition
-- ---------------------------------------------------------------------

REPORT BusinessDashboard
    TITLE "Q4 2023 Business Performance Dashboard"

    SECTION "Executive Summary"
        PRINT LEFT
            "Key performance indicators and business metrics for Q4 2023."
        END PRINT

        TABLE "Key Metrics"
            SOURCE ORTHO BusinessKPIs
            COLUMNS KPIAttribute
                HEADINGS <BOLD, COLOR blue>
                FIELDS Achievement <COLOR green, FIXED 1>
                FIELDS Target <FIXED 1>
                FIELDS Status <FIXED 0, RANGE 0 <COLOR red, BOLD, REPLACE "Below Target">, RANGE 1 <COLOR orange, BOLD, REPLACE "On Target">, RANGE 2 <COLOR green, BOLD, REPLACE "Good">, RANGE 3 <COLOR lightblue, BOLD, REPLACE "Excellent">>
            END COLUMNS
            ROWS Metric
                HEADINGS <BOLD>
                FIELDS Metric
            END ROWS
        END TABLE

        BESIDE
        GRAPH "Revenue by Product Line" PIE
            SOURCE ORTHO BusinessRegionalSales
            FILTER Quarter = "Q4", Region = "All Regions"
            METADATA measure = "Sales"
            FIELDS LIST Software, Services, Hardware
            PLOT Product
            MEASURES Sales
            AXIS NUMERIC
            LEGEND RIGHT
        END GRAPH

        BESIDE
        GRAPH "Quarterly Revenue Trend" BAR
            SOURCE ORTHO BusinessFinancials
            METADATA measure = "Revenue"
            PLOT Quarter
            MEASURES Revenue
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH
    END SECTION

    SECTION "Sales Performance Analysis"
        PRINT LEFT
            "Detailed breakdown of sales performance across regions and product categories."
        END PRINT

        BESIDE
        SECTION "Regional Performance"
            TABLE "North America Performance"
                SOURCE ORTHO BusinessRegionalSales
                FILTER Quarter = "Q4", Region = "North America"
                COLUMNS Region
                    HEADINGS <COLOR navy>
                    FIELDS Sales <CURRENCY "usd", FIXED 2, RIGHT, COLOR green, BOLD>
                END COLUMNS
                ROWS Product
                    HEADINGS <BOLD>
                    FIELDS Product
                END ROWS
            END TABLE

            TABLE "Europe Performance"
                SOURCE ORTHO BusinessRegionalSales
                FILTER Quarter = "Q4", Region = "Europe"
                COLUMNS Region
                    HEADINGS <COLOR navy>
                    FIELDS Sales <CURRENCY "usd", FIXED 2, RIGHT>
                END COLUMNS
                ROWS Product
                    HEADINGS <BOLD>
                    FIELDS Product
                END ROWS
            END TABLE

            TABLE "Asia Pacific Performance"
                SOURCE ORTHO BusinessRegionalSales
                FILTER Quarter = "Q4", Region = "Asia Pacific"
                COLUMNS Region
                    HEADINGS <COLOR navy>
                    FIELDS Sales <CURRENCY "usd", FIXED 2, RIGHT>
                END COLUMNS
                ROWS Product
                    HEADINGS <BOLD>
                    FIELDS Product
                END ROWS
            END TABLE

            TABLE "Rest of World Performance"
                SOURCE ORTHO BusinessRegionalSales
                FILTER Quarter = "Q4", Region = "Rest of World"
                COLUMNS Region
                    HEADINGS <COLOR navy>
                    FIELDS Sales <CURRENCY "usd", FIXED 2, RIGHT>
                END COLUMNS
                ROWS Product
                    HEADINGS <BOLD>
                    FIELDS Product
                END ROWS
            END TABLE

            TABLE "All Regions Performance"
                SOURCE ORTHO BusinessRegionalSales
                FILTER Quarter = "Q4", Region = "All Regions"
                COLUMNS Region
                    HEADINGS <COLOR navy>
                    FIELDS Sales <CURRENCY "usd", FIXED 2, RIGHT, BOLD, COLOR blue>
                END COLUMNS
                ROWS Product
                    HEADINGS <BOLD>
                    FIELDS Product
                END ROWS
            END TABLE
        END SECTION

        BESIDE
        GRAPH "Regional Sales Distribution" BAR
            SOURCE ORTHO BusinessRegionalSales
            FILTER Quarter = "Q4"
            METADATA measure = "Sales", Product = "All Products"
            PLOT Region
            MEASURES Sales
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH

        PRINT LEFT
            "Product performance across all regions:"
        END PRINT

        GRAPH "Product Sales by Region" BAR
            SOURCE ORTHO BusinessRegionalSales
            FILTER Quarter = "Q4"
            METADATA measure = "Sales"
            PLOT Region
            REFERENCE Product
            MEASURES Sales
            AXIS NUMERIC
            LEGEND RIGHT
        END GRAPH

        GRAPH "Monthly Sales Trend" LINE
            SOURCE ORTHO BusinessMonthlySales
            METADATA measure = "Sales"
            FIELDS RANGE Jan TO Dec
            PLOT Month
            REFERENCE ProductCategory
            MEASURES Sales
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH
    END SECTION

    SECTION "Customer Analytics"
        PRINT LEFT
            "Customer segmentation and behavior analysis."
        END PRINT

        BESIDE
        TABLE "Customer Segments"
            SOURCE ORTHO BusinessCustomers
            FILTER Month = "Dec"
            COLUMNS Month
                HEADINGS <BOLD>
                FIELDS NewCustomers
            END COLUMNS
            ROWS Segment
                HEADINGS <BOLD>
                FIELDS Segment
            END ROWS
        END TABLE

        BESIDE
        GRAPH "Customer Distribution" PIE
            SOURCE ORTHO CustomerCounts
            FILTER Month = "Dec"
            METADATA measure = "CustomerCount"
            PLOT Segment
            MEASURES CustomerCount
            AXIS NUMERIC
            LEGEND RIGHT
        END GRAPH

        GRAPH "Customer Growth Trend" AREA
            SOURCE ORTHO BusinessCustomers
            METADATA measure = "NewCustomers"
            FIELDS RANGE Jan TO Dec
            PLOT Month
            REFERENCE Segment
            MEASURES NewCustomers
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH
    END SECTION

    SECTION "Financial Performance"
        PRINT LEFT
            "Detailed financial metrics and profitability analysis."
        END PRINT

        TABLE "Q4 Financial Summary"
            SOURCE ORTHO BusinessFinancials
            FILTER Quarter = "Q4"
            COLUMNS Quarter
                HEADINGS <COLOR navy>
                FIELDS Value <CURRENCY "usd", FIXED 2, RIGHT, BOLD, RANGE :1000000 <COLOR red>, RANGE 1000000+:5000000 <COLOR orange>, RANGE 5000000: <COLOR green>>
            END COLUMNS
            ROWS Measure
                HEADINGS <BOLD>
                FIELDS Measure
            END ROWS
        END TABLE

        BESIDE
        GRAPH "Quarterly Revenue Trend" LINE
            SOURCE ORTHO BusinessFinancials
            METADATA measure = "Revenue"
            PLOT Quarter
            MEASURES Revenue
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH

        BESIDE
        GRAPH "Q4 Expense Breakdown" BAR
            SOURCE ORTHO BusinessExpenses
            METADATA measure = "Expenses"
            PLOT ExpenseCategory
            MEASURES Expenses
            AXIS NUMERIC
            LEGEND TOP
        END GRAPH
    END SECTION

    SECTION "Operational Metrics"
        PRINT LEFT
            "Key operational performance indicators."
        END PRINT

        BESIDE
        TABLE "Operational KPIs"
            SOURCE ORTHO BusinessKPIs
            COLUMNS KPIAttribute
                HEADINGS <BOLD>
                FIELDS Achievement;Target;Status
            END COLUMNS
            ROWS Metric
                HEADINGS <BOLD>
                FIELDS Metric
            END ROWS
        END TABLE

        BESIDE
        GRAPH "KPI Achievement" BAR
            SOURCE ORTHO BusinessKPIs
            METADATA measure = "Achievement"
            PLOT Metric
            MEASURES Achievement
            AXIS NUMERIC
            LEGEND RIGHT
        END GRAPH
    END SECTION
END REPORT

LET dashboard_result = CALL BusinessDashboard
WRITE "Business dashboard generated (CALL result = " dashboard_result ")"

The psimulang code (including declarative data and report blocks for a business dashboard)
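
As a rough mental model of what the LOAD and CONSOLIDATE steps in the listing do, here is a small Haskell sketch. It is an assumption about the semantics, not the engine's implementation: the names (Coord, Cells, loadRows, consolidate) are hypothetical, rows are taken as already split into columns, duplicate coordinates are summed, and the sample rows are invented purely for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types; the real loader and storage will differ.
type Coord = [String]               -- one field label per dimension
type Cells = Map.Map Coord Double   -- sparse cell storage

-- | Project rows into cells. After skipping header rows (SKIP), the
-- first n columns name the coordinate and the next column carries the
-- value. Rows that are too short or non-numeric are dropped.
loadRows :: Int -> Int -> [[String]] -> Cells
loadRows skip n rows = Map.fromListWith (+)
  [ (take n row, v)
  | row <- drop skip rows
  , length row > n
  , [(v, "")] <- [reads (row !! n)]
  ]

-- | Replace the element at position i.
replaceAt :: Int -> a -> [a] -> [a]
replaceAt i x xs = take i xs ++ [x] ++ drop (i + 1) xs

-- | Add a consolidated field at dimension position i, in the spirit of
-- FIELD "All Regions" = SUM ...: sum the listed members for every
-- combination of the remaining coordinates.
consolidate :: Int -> String -> [String] -> Cells -> Cells
consolidate i total members cells = Map.union cells totals
  where
    totals = Map.fromListWith (+)
      [ (replaceAt i total coord, v)
      | (coord, v) <- Map.toList cells
      , i < length coord
      , (coord !! i) `elem` members
      ]

-- | Invented sample rows (not taken from the CSV files in the listing).
sampleRows :: [[String]]
sampleRows =
  [ ["Region", "Product", "Quarter", "Sales"]   -- header, skipped
  , ["Europe", "Software", "Q4", "100"]
  , ["North America", "Software", "Q4", "250"]
  , ["Europe", "Services", "Q4", "40"]
  , ["Europe", "Services", "Q4", "not-a-number"]
  ]

demoCells :: Cells
demoCells =
  consolidate 0 "All Regions"
    ["North America", "Europe", "Asia Pacific", "Rest of World"]
    (loadRows 1 3 sampleRows)

main :: IO ()
main = mapM_ print (Map.toList demoCells)
```

With these rows, the "All Regions" cell for Software in Q4 holds 350.0, the sum of the two leaf regions, alongside the three loaded leaf cells.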


Example REPORT generated from polytope (DSL or psimulang binding syntax)


For the psimulang syntax work, I tried Gemini 2.5 Pro and Codex. Codex was promising, but its error rate was quite high. Claude fared better. ChatGPT was excellent at debugging, though, even when Claude seemed stuck.

Feedback on differences between LLMs:

  • Gemini: big context window, great for wide code analysis, documentation, and similar tasks.
  • Codex/ChatGPT (4o/5): great at complex analysis and fixing complex bugs.
  • Claude (4.1 Opus, 4.5 Sonnet): great at general coding.