ISPAI-CSIMQ Dataset

The ISPAI-CSIMQ dataset contains 600 randomly selected sentences from 10 information security guidelines issued by the British National Health Service. Each sentence is classified as a speech act according to Searle's speech act theory, using one of five categories: assertive, commissive, declarative, directive, or expressive.

Researchers can use the dataset to test, compare, and reproduce classifications of sentences in information security guidelines as speech acts. For example, it can be used to compare how well large language models classify content in information security policies.
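As an illustration of such a comparison, the following minimal Python sketch scores a model's predicted labels against the dataset's labels using accuracy and Cohen's kappa. It assumes the labels have already been loaded into two lists of equal length; the function names and the four toy labels are hypothetical and are not taken from the dataset, whose file format is not described here.

```python
from collections import Counter

# Searle's five categories, as named in the dataset description.
CATEGORIES = ("assertive", "commissive", "declarative", "directive", "expressive")


def accuracy(gold, predicted):
    """Share of sentences where the predicted label equals the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("label lists must not be empty")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(gold, predicted):
    """Agreement between two label lists, corrected for chance agreement."""
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    labels = set(gold_counts) | set(pred_counts)
    # Chance agreement: probability that both sides pick the same label
    # if each drew labels independently from its own label distribution.
    expected = sum(gold_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both lists use a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels for illustration only (not dataset content).
gold = ["directive", "directive", "assertive", "commissive"]
predicted = ["directive", "assertive", "assertive", "commissive"]

print(f"accuracy = {accuracy(gold, predicted):.2f}")
print(f"kappa    = {cohen_kappa(gold, predicted):.2f}")
```

Accuracy alone can look favorable when one category dominates, so reporting a chance-corrected measure alongside it is a common choice when comparing several models on the same sentences.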

Three researchers each independently classified 200 of the randomly selected information security policy (ISP) statements. The classification followed Searle’s taxonomy of speech acts, i.e., it distinguished between assertives, commissives, declaratives, directives, and expressives.

When using the dataset, please cite the following paper: Karlsson F, Gao S, Krogstie J, & Aro-Sati L (2026) Advancing a Speech Act-Based Model to Improve Future Quality of Information Security Policies Using Large Language Models. Complex Systems Informatics and Modeling Quarterly.
