# Datasets

One of the major benefits of this package is that you can quickly and easily generate datasets of reference problems and test your algorithms against existing datasets.

This makes it simple to reproduce results, especially when benchmarking a novel algorithm against commonly used reference datasets. A collection of reference datasets can be found at https://github.com/klb2/qmkpy-datasets.
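As a minimal sketch of how such a reference problem could be generated, one can randomly draw a symmetric profit matrix, item weights, and knapsack capacities. Plain NumPy is used here for illustration; the `generate_problem` helper and its parameter ranges are hypothetical, and `qmkpy.io` provides the dedicated save/load routines for the actual file formats.

```python
import numpy as np

def generate_problem(num_items, num_knapsacks, rng=None):
    """Sketch of a random QMKP instance generator (parameters are illustrative)."""
    rng = np.random.default_rng(rng)
    # Symmetric profit matrix: entry (i, j) is the joint profit of items i and j
    profits = rng.integers(0, 10, size=(num_items, num_items))
    profits = (profits + profits.T) // 2
    weights = rng.integers(1, 5, size=num_items)
    capacities = rng.integers(5, 15, size=num_knapsacks)
    return profits, weights, capacities

profits, weights, capacities = generate_problem(num_items=8, num_knapsacks=3, rng=42)
assert np.array_equal(profits, profits.T)  # profit matrix must be symmetric
```

Such instances could then be written to `dataset/` with one of the save functions in `qmkpy.io`, so that everyone benchmarks against identical problem files.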

In the following, an example of how a repository for a research paper could be structured is presented.

## Research Paper Repository

The file structure can be as simple as shown in the following.

```
project
├── dataset/
│   ├── problem1.txt
│   ├── problem2.txt
│   └── ...
└── my_algorithm.py
```

The directory `dataset/` contains all problem instances of the reference dataset, which are saved by one of the functions in `qmkpy.io`.

The file `my_algorithm.py` contains the implementation of your algorithm.
It could look something like the following. Details on how to implement new
algorithms can also be found on the Implementing a Novel Algorithm page.

```python
import os

import numpy as np

import qmkpy


def my_algorithm(profits, weights, capacities):
    # DOING SOME STUFF
    return assignments


def main():
    results = []
    for root, dirnames, filenames in os.walk("dataset"):
        for problem in filenames:
            # Join the directory and file name to get the full path
            filename = os.path.join(root, problem)
            qmkp = qmkpy.QMKProblem.load(filename, strategy="txt")
            qmkp.algorithm = my_algorithm
            solution, profit = qmkp.solve()
            results.append(profit)
    print(f"Average profit: {np.mean(results):.2f}")


if __name__ == "__main__":
    main()
```

This simple script solves all problems of the dataset using your algorithm and prints the average total profit at the end.
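To make the placeholder `my_algorithm` above concrete, a simple greedy heuristic with the same `(profits, weights, capacities)` signature could look like the following sketch. It assumes the convention of a binary item-by-knapsack assignment matrix as the return value; the ranking by diagonal profit density is just one possible heuristic, not the package's reference implementation.

```python
import numpy as np

def greedy_algorithm(profits, weights, capacities):
    """Greedy sketch: assign items, in decreasing order of individual profit
    per unit weight (diagonal of the profit matrix), to the first knapsack
    with enough remaining capacity. Returns a binary matrix of shape
    (num_items, num_knapsacks)."""
    weights = np.asarray(weights)
    remaining = np.array(capacities, dtype=float)
    num_items = len(weights)
    num_knapsacks = len(remaining)
    assignments = np.zeros((num_items, num_knapsacks), dtype=int)
    density = np.diag(profits) / weights
    for item in np.argsort(density)[::-1]:
        for k in range(num_knapsacks):
            if weights[item] <= remaining[k]:
                assignments[item, k] = 1
                remaining[k] -= weights[item]
                break  # each item goes into at most one knapsack
    return assignments
```

Since the function only depends on the three standard arguments, it can be dropped into the script above as `qmkp.algorithm = greedy_algorithm` without further changes.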