Add table_histogram endpoint#677

Open
will-moore wants to merge 3 commits into
ome:masterfrom
will-moore:table_histogram
@will-moore

@will-moore will-moore commented Jun 5, 2026

Member

We already have histogram functionality in omero-parade and now I also need it for iviewer (ome/omero-iviewer#532), so it makes sense for this to go into omero-web.

This endpoint behaves similarly to the existing OMERO.table slice endpoint, e.g. /webgateway/table/FILE_ID/slice/?columns=0&rows=0-100. It wraps table_slice() to load the data, then generates a histogram using numpy and returns the result.

By default, we use ALL the rows to generate the histogram.
Since we don't want to have to load the table twice (once to get the row count, then again to pass rows=0-(row_count-1) to table_slice()), I have updated table_slice() to allow rows=* (no change to the maximum amount of data permitted).

So you can now do /webgateway/table/FILE_ID/slice/?columns=0&rows=*

The histogram endpoint supports the bins request parameter (int or string), which behaves as described at https://numpy.org/devdocs/reference/generated/numpy.histogram.html
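As a rough illustration of the two forms of bins, here is a minimal sketch using numpy directly. The parse_bins helper is hypothetical (it is not taken from this PR) and only shows one way a query-string value could be turned into something numpy.histogram accepts.

```python
import numpy as np


def parse_bins(raw, default=10):
    """Hypothetical helper: map a query-string value to a numpy bins argument.

    Digits become an int (number of equal-width bins); anything else is
    passed through as a string naming a numpy estimator such as "auto".
    """
    if raw is None:
        return default  # numpy.histogram itself defaults to 10 bins
    return int(raw) if raw.isdigit() else raw


values = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 4.5, 5.0, 9.0])

# Integer bins: that many equal-width bins across the data range.
counts, edges = np.histogram(values, bins=parse_bins("4"))

# String bins: numpy picks the bin count using the named estimator.
auto_counts, auto_edges = np.histogram(values, bins=parse_bins("auto"))
```

In both cases numpy returns one more bin edge than there are counts, which matches the shape of histogram and bin_edges in the sample response.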

Sample response to /webgateway/table/15908/histogram/?columns=2,3 on merge-ci:

{
  "histograms": [
    {
      "column": "x_centroid",
      "histogram": [1449, 2750, 2982, 3161, 3393, 3455, 3012, 2643, 1161, 400],
      "bin_edges": [
        3.757423210144043, 766.061197490692, 1528.36497177124,
        2290.6687460517883, 3052.9725203323364, 3815.2762946128846,
        4577.580068893432, 5339.8838431739805, 6102.187617454529,
        6864.491391735077, 7626.795166015625
      ]
    },
    {
      "column": "y_centroid",
      "histogram": [52, 142, 32, 1388, 3627, 3905, 4269, 4326, 4111, 2554],
      "bin_edges": [
        39.39493064880371, 614.6632842636108, 1189.9316378784179,
        1765.199991493225, 2340.4683451080323, 2915.7366987228393,
        3491.0050523376467, 4066.2734059524537, 4641.541759567261,
        5216.810113182068, 5792.078466796875
      ]
    }
  ],
  "meta": {
    "columns": ["x_centroid", "y_centroid"],
    "rowCount": 24406,
    "columnCount": 13,
    "maxCells": 1000000
  }
}
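For readers who want to see how a payload of this shape can be produced, the following is a minimal sketch, not the code in this PR. The function name build_histogram_response and the plain dict of column values are assumptions for illustration; the real endpoint loads its data through the table slice code and also returns the meta block.

```python
import json

import numpy as np


def build_histogram_response(columns, bins=10):
    """Hypothetical sketch: build the "histograms" list from column values.

    `columns` maps a column name to a sequence of numeric values.
    """
    histograms = []
    for name, values in columns.items():
        counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
        histograms.append(
            {
                "column": name,
                # tolist() converts numpy scalars to plain ints and floats,
                # which the json module can serialise.
                "histogram": counts.tolist(),
                "bin_edges": edges.tolist(),
            }
        )
    return {"histograms": histograms}


example = build_histogram_response(
    {
        "x_centroid": [0.0, 1.0, 2.0, 3.0, 4.0],
        "y_centroid": [10.0, 10.0, 20.0, 30.0, 30.0],
    },
    bins=2,
)
payload = json.dumps(example)
```

A quick sanity check on any response: with the default full range, every value lands in some bin, so the counts sum to the number of rows used. In the sample response above, both histograms sum to 24406, the reported rowCount.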

@knabar

knabar commented Jun 19, 2026

Member

The time to calculate a histogram on demand will directly depend on the number of rows in the table and likely won't be sustainable for tables with millions of rows, which we are seeing regularly now.

Our strategy is to calculate column statistics for most numeric columns at the time of table creation and store them in the table metadata (the roi column and a few others are excluded, as statistics are not meaningful there). Metadata fields created include:

  • <column name>.min
  • <column name>.max
  • <column name>.mean
  • <column name>.median
  • <column name>.std
  • <column name>.skew
  • <column name>.kurtosis
  • <column name>.histogram.count
  • <column name>.histogram.division

All custom metadata fields are already returned via the webgateway/table/<id>/metadata/ endpoint.
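A minimal sketch of how statistics with these field names could be computed with numpy. This is not the code used at table creation. The function name column_statistics is hypothetical, and the sketch assumes that histogram.count holds the bin counts and histogram.division holds the bin edges, which is not spelled out above. Storing the result in the table metadata is left out.

```python
import numpy as np


def column_statistics(name, values, bins=10):
    """Hypothetical sketch: '<column name>.<stat>' entries for one column."""
    data = np.asarray(values, dtype=float)
    mean = data.mean()
    std = data.std()  # population standard deviation (ddof=0)
    centred = data - mean

    # Standardised third and fourth moments; undefined for constant columns.
    if std > 0:
        skew = float((centred ** 3).mean() / std ** 3)
        kurtosis = float((centred ** 4).mean() / std ** 4 - 3.0)  # excess
    else:
        skew = kurtosis = float("nan")

    counts, edges = np.histogram(data, bins=bins)
    return {
        f"{name}.min": float(data.min()),
        f"{name}.max": float(data.max()),
        f"{name}.mean": float(mean),
        f"{name}.median": float(np.median(data)),
        f"{name}.std": float(std),
        f"{name}.skew": skew,
        f"{name}.kurtosis": kurtosis,
        # Assumption: count = bin counts, division = bin edges.
        f"{name}.histogram.count": counts.tolist(),
        f"{name}.histogram.division": edges.tolist(),
    }


stats = column_statistics("area", [1, 2, 3, 4, 5], bins=2)
```

Because everything is computed once in a single pass over the column, the cost is paid at table creation rather than on every request.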

@will-moore

Member Author

@knabar Thanks, it would be great to see some sample code for how those stats are generated and saved to the table. I'm also curious as to how they are used to generate a histogram curve in the client.

While I appreciate that the histogram endpoint in this PR may not scale to all tables (if the row count is very large), I still think it is useful in situations where the table has a smaller row_count and no histogram/stats have previously been calculated.
Without this endpoint, the only way to create a histogram for a column is to load ALL the values into the browser and build the histogram in JavaScript, which will scale less well than histogram generation in Python on the web server.

Testing with a local omero-web, connecting to an IDR server, using a table with 18k rows, loading a whole column with /webgateway/table/14209154/slice/?columns=2&rows=* takes 1.45 secs; the histogram for the same column takes about the same time.
The same holds for a bigger table with 158k rows: webgateway/table/44583133/histogram/?columns=1 takes about 2.3 - 2.8 secs, and the slice to load a whole column takes a similar time.
Because I'm running omero-web locally, both cases require the whole column to be retrieved from OMERO.server to my local web server, so we don't see a performance benefit. But when the web server is close to the OMERO.server, the histogram will benefit from moving less JSON data down the wire. The whole column slice JSON is about 1.25 MB in this case.

@knabar

knabar commented Jun 24, 2026

Member

Since the histogram code relies on the data to fit into a single table slice call, I agree that the danger is probably minimal, since a client application could also just call the slice of the same size itself.

If I am reading the code correctly, though, it looks like the histogram code does not take an incomplete slice into consideration. If, say, a table has 3 million rows, MAX_TABLE_SLICE_SIZE will cause the table slice to return only the first million (by default) without raising an error, and the histogram will cover only those rows without indicating that fact.

@will-moore

Member Author

The histogram endpoint response includes the same meta returned by the underlying slice function, so we can use that to check the total table size compared with the max limit:

"meta": {
  "columns": [
    "area (µm)"
  ],
  "rowCount": 1596,
  "columnCount": 15,
  "maxCells": 1000000
}

I think that the limit_generator used by the _table_slice() function will raise ValueError("Too many items") if you try to create a histogram on a whole table that is bigger than MAX_TABLE_SLICE_SIZE.
So you'll then need to use e.g. &rows=0-1000000 to retrieve a partial histogram.
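A minimal client-side sketch of that check, under two assumptions that are not confirmed in this thread: that maxCells bounds rows multiplied by the number of requested columns, and that row ranges are inclusive (as the rows=0-(row_count-1) form in the description suggests). The helper names are hypothetical.

```python
def fits_in_one_slice(meta, n_columns):
    """True if a whole-table request stays within the cell limit.

    Assumption: the limit applies to rows * requested columns.
    """
    return meta["rowCount"] * n_columns <= meta["maxCells"]


def row_ranges(meta, n_columns):
    """Inclusive 'start-end' row ranges, each within the cell limit."""
    step = meta["maxCells"] // n_columns
    ranges = []
    for start in range(0, meta["rowCount"], step):
        end = min(start + step, meta["rowCount"]) - 1
        ranges.append(f"{start}-{end}")
    return ranges


small = {"rowCount": 1596, "columnCount": 15, "maxCells": 1000000}
large = {"rowCount": 3000000, "columnCount": 15, "maxCells": 1000000}

small_ok = fits_in_one_slice(small, n_columns=1)
large_ok = fits_in_one_slice(large, n_columns=1)
large_ranges = row_ranges(large, n_columns=1)
```

Each range can then be passed as the rows parameter to get a partial histogram for that part of the table. The meta values would have to come from an earlier response, since a request that exceeds the limit raises an error instead of returning them.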
