Changes from 3 commits
4 changes: 2 additions & 2 deletions docs/README.md
@@ -1,6 +1,6 @@
# OpenML AutoML Benchmark

-The OpenML AutoML Benchmark provides a framework for evaluating and comparing open-source AutoML systems. The system is *extensible* because you can [add your own](https://github.com/openml/automlbenchmark/blob/master/docs/extending.md) AutoML frameworks and datasets. For a thorough explanation of the benchmark, and evaluation of results, you can read our [paper](https://openml.github.io/automlbenchmark/paper.html) which was accepted at the [2019 ICML AutoML Workshop](https://sites.google.com/view/automl2019icml/).
+The OpenML AutoML Benchmark provides a framework for evaluating and comparing open-source AutoML systems. The system is *extensible* because you can [add your own](https://github.com/openml/automlbenchmark/blob/master/docs/extending.md) AutoML frameworks and datasets. For a thorough explanation of the benchmark, and evaluation of results, refer to our preprint [AMLB: an AutoML Benchmark](https://arxiv.org/pdf/2207.12560.pdf).

_**NOTE:**_ _This benchmarking framework currently features binary and multiclass classification; extending to regression is a work in progress. Please file an issue with any concerns/questions._

@@ -28,7 +28,7 @@ This toolkit aims to address these problems by setting up standardized environme
Documentation: <https://openml.github.io/automlbenchmark/>

### Features:
-* Curated suites of [benchmarking datasets](https://openml.github.io/automlbenchmark/benchmark_datasets.html) from [OpenML](https://www.openml.org/s/218/data).
+* Curated suites of [benchmarking datasets](https://github.com/openml/automlbenchmark/blob/master/docs/benchmark_datasets.md) from [OpenML](https://www.openml.org/s/218/data).
* Includes code to benchmark a number of [popular AutoML systems](https://openml.github.io/automlbenchmark/automl_overview.html) on regression and classification tasks.
* [New AutoML systems can be added](./HOWTO.md#add-an-automl-framework)
* Experiments can be run in Docker or Singularity containers
298 changes: 298 additions & 0 deletions reports/CleanResults.ipynb
@@ -0,0 +1,298 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d3d4b7a3",
"metadata": {},
"source": [
"Given the raw result files, generate a final, cleaned version of the results:\n",
" - Keep only the latest results, which excludes:\n",
"   - jobs which failed because of the benchmark framework and were redone, and\n",
"   - frameworks which were later excluded because issues were identified with the integration itself.\n",
" - Transfer `RandomForest` results from the 1 hour to the 4 hour budget if the 1 hour jobs ran to completion.\n",
" - Impute `TunedRandomForest` results with `RandomForest` results of the same budget.\n",
" "
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "7954b9cd",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"def filter_results(results):\n",
" results = results.sort_values(by=\"utc\", na_position=\"first\")\n",
" # There was a mistake in the old KDDCup09-Upselling task, so it was replaced with a new task.\n",
" results = results[results[\"id\"] != \"openml.org/t/360947\"]\n",
" # Use only the latest results (earlier failures don't count, only justified reruns are done)\n",
" results = results.drop_duplicates([\"framework\", \"task\", \"fold\"], keep=\"last\")\n",
" results = results[~results[\"framework\"].isin([\"autoxgboost\", \"GAMA\", \"MLPlanWEKA\"])]\n",
" return results\n"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "b9dfccc2",
"metadata": {},
"outputs": [],
"source": [
"# Pick the task type of the results to clean (the last assignment wins).\n",
"ttype = \"regression\"\n",
"ttype = \"classification\""
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "550aaf17",
"metadata": {},
"outputs": [],
"source": [
"one = pd.read_csv(r\"http://openml-test.win.tue.nl/amlb/{}_1h8c.csv\".format(ttype))\n",
"one = filter_results(one)\n",
"four = pd.read_csv(r\"http://openml-test.win.tue.nl/amlb/{}_4h8c.csv\".format(ttype))\n",
"four = filter_results(four)"
]
},
{
"cell_type": "markdown",
"id": "a7fc8cc4",
"metadata": {},
"source": [
"\n",
"First, a sanity check that the 1H RF has results for every job (even if it is not a full forest):"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "a229de00",
"metadata": {},
"outputs": [],
"source": [
"rf = one[one.framework == \"RandomForest\"]\n",
"assert len(rf) == (330 if ttype == \"regression\" else 710)\n",
"assert rf[\"info\"].isna().all()"
]
},
{
"cell_type": "markdown",
"id": "05f43033",
"metadata": {},
"source": [
"Impute one hour Tuned Random Forest with Random Forest:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "f1a98a6e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original dataset with 8519 entries of which 272 had missing results.\n",
"The new dataset with 8519 entries of which 252 are missing results because 20 results were imputed.\n"
]
}
],
"source": [
"def impute_trf_with_rf(results):\n",
" rf = results[results.framework == \"RandomForest\"]\n",
" trf = results[results.framework == \"TunedRandomForest\"]\n",
" missing_results = trf[~trf[\"info\"].isna()][[\"task\", \"fold\"]].itertuples(index=False, name=None)\n",
"\n",
" imputation_values = rf.set_index([\"task\", \"fold\"]).loc[missing_results].reset_index().copy()\n",
" imputation_values[\"framework\"] = \"TunedRandomForest\"\n",
"\n",
" trf_success = trf[trf[\"info\"].isna()]\n",
" trf_imputed = pd.concat([trf_success, imputation_values])\n",
"\n",
" no_trf = results[results.framework != \"TunedRandomForest\"]\n",
" imputed = pd.concat([no_trf, trf_imputed])\n",
" print(f\"Original dataset with {len(results)} entries of which {sum(~results['info'].isna())} had missing results.\")\n",
" print(f\"The new dataset with {len(imputed)} entries of which {sum(~imputed['info'].isna())} are missing results because {len(imputation_values)} results were imputed.\")\n",
" return imputed\n",
"\n",
"one_imputed = impute_trf_with_rf(one) "
]
},
{
"cell_type": "markdown",
"id": "79cacc03",
"metadata": {},
"source": [
"Impute four hour Random Forest with complete one hour Random Forest:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "7bde3af8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original dataset had 8570 entries of which 50 were 4H RF.\n",
"The new dataset has 9230 entries of which 710 are 4H RF.\n"
]
}
],
"source": [
"rf_one = one[one.framework == \"RandomForest\"]\n",
"keep = rf_one[(rf_one[\"models_count\"] == 2000.0) & (~rf_one[\"result\"].isna())].copy()\n",
"keep[\"constraint\"] = \"4h8c_gp3\"\n",
"four_added = pd.concat([four, keep])\n",
"print(f\"Original dataset had {len(four):6d} entries of which {len(four[four.framework == 'RandomForest']):6d} were 4H RF.\")\n",
"print(f\"The new dataset has {len(four_added):6d} entries of which {len(four_added[four_added.framework == 'RandomForest']):6d} are 4H RF.\")"
]
},
{
"cell_type": "markdown",
"id": "bf74eb01",
"metadata": {},
"source": [
"The resulting file is useful for avoiding 4H RF experiments which would grow the same (sized) forests as the 1H budget ones. It was also used to `--resume` from, which automatically finds the remaining 4H RF experiments."
]
},
{
"cell_type": "markdown",
"id": "9ceb0e57",
"metadata": {},
"source": [
"After completing all RF results, we can use them to impute the `TunedRandomForest` results where it otherwise failed."
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "b5fcdcf6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Original dataset with 9230 entries of which 451 had missing results.\n",
"The new dataset with 9230 entries of which 427 are missing results because 24 results were imputed.\n"
]
}
],
"source": [
"four_imputed = impute_trf_with_rf(four_added) "
]
},
{
"cell_type": "markdown",
"id": "15b8d68b",
"metadata": {},
"source": [
"We only need to run the `constantpredictor` baseline once, since the result is deterministic regardless of time budget. We ran the set of experiments with a four hour time budget and transfer the results to the one hour budget:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "d410ddc3",
"metadata": {},
"outputs": [],
"source": [
"constants = four_imputed[four_imputed.framework == \"constantpredictor\"].copy()\n",
"constants[\"constraint\"] = \"1h8c_gp3\"\n",
"one_imputed = pd.concat([one_imputed, constants])"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "ee5a7991",
"metadata": {},
"outputs": [],
"source": [
"# The constantpredictor experiments were run without the gp3 SSD, but we can rename the result.\n",
"four_imputed.loc[four_imputed.framework == \"constantpredictor\", \"constraint\"] = \"4h8c_gp3\""
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "80869c02",
"metadata": {},
"outputs": [],
"source": [
"final = pd.concat([one_imputed, four_imputed])"
]
},
{
"cell_type": "markdown",
"id": "76860af9",
"metadata": {},
"source": [
"A few sanity checks:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "22acb1bf",
"metadata": {},
"outputs": [],
"source": [
"if ttype == \"classification\":\n",
" jobs = 71 * 10 * 13 * 2 - 1 # tasks * folds * frameworks * time budgets - known failures\n",
"else:\n",
" jobs = 33 * 10 * 12 * 2 # autosklearn2 does not support regression\n",
"\n",
"assert len(final) == jobs\n",
"assert len(final) == len(final.drop_duplicates([\"framework\", \"task\", \"fold\", \"constraint\"]))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "2cf9a9ce",
"metadata": {},
"outputs": [],
"source": [
"final.to_csv(f\"{ttype}_all_cleaned.csv\", index=False)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1b503d5f",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
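
The imputation step in `reports/CleanResults.ipynb` can be illustrated on toy data. The following is a minimal sketch, not the notebook's code: the function name `impute_failed_with_baseline`, the toy rows, and the `adult` task name are made up for illustration. The column names (`framework`, `task`, `fold`, `result`, `info`) and the convention that a non-null `info` marks a failed job are taken from the notebook.

```python
import pandas as pd


def impute_failed_with_baseline(results, failed="TunedRandomForest", baseline="RandomForest"):
    """Replace failed rows of `failed` with the `baseline` rows of the same (task, fold)."""
    base = results[results["framework"] == baseline]
    target = results[results["framework"] == failed]

    # A non-null `info` value marks a failed job (convention taken from the notebook).
    failed_keys = list(
        target.loc[target["info"].notna(), ["task", "fold"]].itertuples(index=False, name=None)
    )

    # Look up the baseline rows for the failed (task, fold) pairs and relabel them.
    # This raises a KeyError if the baseline has no row for one of the pairs.
    replacements = base.set_index(["task", "fold"]).loc[failed_keys].reset_index()
    replacements["framework"] = failed

    succeeded = target[target["info"].isna()]
    others = results[results["framework"] != failed]
    return pd.concat([others, succeeded, replacements], ignore_index=True)


# Hypothetical toy data: TunedRandomForest failed on fold 1 of a made-up task.
toy = pd.DataFrame(
    {
        "framework": ["RandomForest", "RandomForest", "TunedRandomForest", "TunedRandomForest"],
        "task": ["adult", "adult", "adult", "adult"],
        "fold": [0, 1, 0, 1],
        "result": [0.90, 0.91, 0.92, None],
        "info": [None, None, None, "TimeoutError"],
    }
)
imputed = impute_failed_with_baseline(toy)
print(imputed[["framework", "task", "fold", "result"]])
```

The row count is unchanged by design: each failed row is swapped for exactly one baseline row, which is also why the notebook reports the same number of entries before and after imputation.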