ISPAI dataset

The ISPAI dataset contains 600 randomly selected sentences from 10 information security guidelines from the British National Health Service. Each sentence is classified as a speech act according to Searle's speech act theory, using five categories: assertive, commissive, declarative, directive, and expressive.

Researchers can use the dataset to test, compare, and reproduce classifications of sentences in information security guidelines as speech acts. For example, it can be used to compare how well large language models classify content in information security policies.
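As an illustration of such a comparison, the following minimal sketch scores one model's predicted labels against the dataset's labels. The function name `score_predictions` and the toy label lists are hypothetical. The sketch assumes the labels have already been loaded into two equal-length Python lists, since the file layout of the dataset is not described here.

```python
from collections import Counter


def score_predictions(gold, predicted):
    """Return overall accuracy and per-category recall.

    gold and predicted are equal-length lists of category labels
    (hypothetical in-memory format, not the dataset's file format).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    total = Counter(gold)   # number of gold statements per category
    correct = Counter()     # number of correctly predicted statements per category
    for g, p in zip(gold, predicted):
        if g == p:
            correct[g] += 1
    accuracy = sum(correct.values()) / len(gold)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall


# Toy labels for illustration only; these are not taken from the dataset.
gold = ["directive", "directive", "assertive", "commissive"]
model_a = ["directive", "assertive", "assertive", "commissive"]
accuracy, recall = score_predictions(gold, model_a)
```

Running the same function on the predictions of several models gives figures that can be compared directly, both overall and per speech act category.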

The classification was done independently by three researchers. The measured Fleiss' Kappa for the classification is 0.74. The classifications differed for 147 of the 600 information security policy (ISP) statements. To reach a shared classification for these statements, the primary method was majority classification, i.e., if two of the researchers agreed, that classification was used. In the cases where majority classification was not possible (7 statements), the researchers discussed the ISP statement to reach consensus.
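The agreement measure and the majority rule described above can be sketched as follows. This is a minimal illustration using the standard Fleiss' Kappa formula, not the researchers' own code. The function names and the toy ratings are hypothetical, and the sketch assumes every statement was labelled by the same number of raters.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' Kappa for a list of items, each a list of one label per rater.

    Assumes every item has the same number of raters and that the raters
    did not all use a single category (which would make the denominator zero).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(item) for item in ratings]
    # Observed agreement per item: (sum of squared category counts - n) / (n * (n - 1)).
    p_items = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(p_items) / n_items
    # Expected agreement: sum of squared overall category proportions.
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_expected = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    return (p_observed - p_expected) / (1 - p_expected)


def majority_label(item):
    """Return the label chosen by more than half of the raters, else None."""
    label, votes = Counter(item).most_common(1)[0]
    return label if votes > len(item) / 2 else None


# Toy ratings for illustration only; these are not taken from the dataset.
toy_ratings = [
    ["directive", "directive", "directive"],
    ["assertive", "assertive", "directive"],
    ["assertive", "commissive", "directive"],
    ["commissive", "commissive", "commissive"],
]
kappa = fleiss_kappa(toy_ratings)
shared = [majority_label(item) for item in toy_ratings]
```

In this sketch, a `None` entry in `shared` marks a statement on which all three raters disagreed, which corresponds to the cases that were resolved through discussion.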

When using the dataset, please cite the paper by Aro-Sati, L., Karlsson, F. & Gao, S., published at the 2nd International Conference on Digital Sovereignty (ICDS) 2025.
