Add table_histogram endpoint #677
|
The time to calculate a histogram on demand will directly depend on the number of rows in the table and likely won't be sustainable for tables with millions of rows, which we are seeing regularly now. Our strategy is to calculate column statistics for most numeric columns at the time of table creation and store them in the table metadata (The
All custom metadata fields are already returned via the |
|
@knabar Thanks, it would be great to see some sample code for how those stats are generated and saved to the table. I'm also curious as to how they are used to generate a histogram curve in the client? While I appreciate that the histogram endpoint in this PR may not scale to all tables (if the row count is very large), I still think it is useful in situations where the table has a smaller row_count and no histogram/stats have previously been calculated. Testing with a local omero-web, connecting to an idr server, using a table with 18k rows, loading a whole column with |
|
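For reference, here is a minimal sketch of what precomputing per-column stats at table creation might look like. This is not the actual code discussed above, only an illustration using numpy: the function name `column_stats`, the choice of stats, and the JSON layout are all assumptions, and how the resulting string would be attached to the table metadata is left out.

```python
import json

import numpy as np


def column_stats(values, bins=10):
    """Summarise one numeric column: basic stats plus a fixed-bin histogram.

    Hypothetical helper, not part of omero-web or OMERO.tables.
    """
    data = np.asarray(values, dtype=float)
    data = data[np.isfinite(data)]  # ignore NaN and +/-inf values
    if data.size == 0:
        return {"count": 0}
    counts, edges = np.histogram(data, bins=bins)
    return {
        "count": int(data.size),
        "min": float(data.min()),
        "max": float(data.max()),
        "mean": float(data.mean()),
        "histogram": counts.tolist(),
        "bin_edges": edges.tolist(),
    }


# Compute once when the table is written, then serialise for storage.
stats = column_stats([1.0, 2.0, 2.5, 3.0, 10.0], bins=3)
metadata_value = json.dumps(stats)
```

A client could then draw the histogram directly from `histogram` and `bin_edges` without loading any rows, at the cost of the bin count being fixed when the stats were generated.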
Since the histogram code relies on the data fitting into a single table slice call, I agree that the danger is probably minimal, since a client application could also just request a slice of the same size itself. If I am reading the code correctly, though, it looks like the histogram code does not take an incomplete slice into consideration. If say a table has 3 million rows, |
|
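To illustrate the incomplete-slice concern, here is a rough sketch of how a histogram could be accumulated over several slices and how a partial result could be detected. It assumes numpy and that the value range is known up front (for example from a first pass over the data); `chunked_histogram` and its parameters are hypothetical names, not something in this PR.

```python
import numpy as np


def chunked_histogram(chunks, bins, value_range):
    """Accumulate one histogram over several slices of a column.

    The bin edges are fixed before any data is read, so the per-chunk
    counts can simply be summed. Hypothetical helper for illustration.
    """
    edges = np.linspace(value_range[0], value_range[1], bins + 1)
    total = np.zeros(bins, dtype=np.int64)
    rows_seen = 0
    for chunk in chunks:
        counts, _ = np.histogram(chunk, bins=edges)
        total += counts
        rows_seen += len(chunk)
    return total, edges, rows_seen


# Stand-in for a table column loaded in three separate slices.
column = np.arange(10)
chunks = [column[0:4], column[4:8], column[8:10]]
total, edges, rows_seen = chunked_histogram(chunks, bins=3, value_range=(0, 9))

# Same result as a histogram over the whole column in one call.
whole, _ = np.histogram(column, bins=3, range=(0, 9))
```

Comparing `rows_seen` with the table's row count would show whether the histogram covers the whole column or only the part that fitted into the slices that were loaded.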
The I think that the |
We have histogram functionality in `omero-parade` and now I also need it for `iviewer` (ome/omero-iviewer#532), so it makes sense for this to go into `omero-web`.

This endpoint behaves similarly to the existing OMERO.table `slice` endpoint, e.g. `/webgateway/table/FILE_ID/slice/?columns=0&rows=0-100`, and wraps `table_slice()` for loading the data, then generates a histogram using numpy and returns the result.

By default, we use ALL the rows to generate the histogram. Since we don't want to have to load the table twice (to get the row count before passing `rows = 0-row_count-1` to `table_slice()`), I have updated `table_slice()` to allow `rows=*` (no change to the maximum amount of data permitted).

So you can now do `/webgateway/table/FILE_ID/slice/?columns=0&rows=*`

Histogram supports the `bins` request parameter (int or string), which behaves as described at https://numpy.org/devdocs/reference/generated/numpy.histogram.html

Sample response to `/webgateway/table/15908/histogram/?columns=2,3` on merge-ci