Glossary

Definitions of terms, phrases and abbreviations used in our questions.

| TERM | DEFINITION |
| --- | --- |
| Fairness | Decisions are unbiased with respect to identity features such as gender, race, nationality, sexual orientation, religion, political opinion, skin colour, education, address and age. |
| Accountability | Decisions should be traceable and reproducible, with clear liability. All decisions and their consequences must be owned and explained by the decision maker or the administrator of the decision maker. |
| Confidentiality | Personal information which can be used to identify individuals or communities must be kept secure and protected from intentional or accidental misuse. |
| Ethics | Decisions should follow sound ethical considerations and not infringe on the fundamental human rights and wellbeing of people. The system making the decisions should also not harm the environment in any way, whether it is in use or under development. |
| Transparency | Understanding why a decision is made is key to establishing trust in the decision process. Accompanying decisions with explanations and analysis reports helps people understand the outcome. |
| Safety | AI systems should be robust, secure and safe throughout their entire lifecycle so that they function appropriately in conditions of normal use, foreseeable use or misuse, or other adverse conditions and do not pose an unreasonable safety risk. |
| Data | Refers to interpretable information in the context of the world. Data can be in the form of text, numbers, images, audio, video, co-ordinates, addresses, emails, phone numbers, names, machine logs, journals, sensor readings, etc. In our context, data encapsulates all text, media and documents related to the project and is not limited to the datasets used for modelling and analyses. |
| Model | Refers to a function or process which uses data to draw inferences about the world. Models can be statistical, symbolic, mathematical, deterministic, stochastic, neural networks, flow charts, black-box, white-box, decision trees, etc. In our context, a model represents the world in which the decisions are effective and the consequences of the decisions have repercussions. |
| Deploy(ment) | Refers to a system which uses the model to make inferences on unseen events of the world. A deployment can be a script, a function in a program, an Excel sheet, a web application, a decision-making system, a toolkit, a library package, a form, a program, an application, a mobile app, a feature in a mobile app, etc. In our context, a deployment is an implementation of the model which is used to make inferences on unseen data. An implementation can refer to both online and offline use of the model. |
| Feature Imbalance | Occurs when a subset of the features in the dataset contains most of the useful information needed to represent a datapoint. Also applies to the range of the feature values. |
| Class Imbalance | Occurs when a subset of the classes accounts for most of the datapoints in the dataset. Ideally, the training set should sufficiently represent the test set. |
| Hyper-parameter | A parameter that is set before the learning process begins. It affects the performance of the model. |
| Optimal Hyper-parameter | A hyper-parameter value which achieves the model’s best performance is said to be optimal. |
| Active Learning | A special case of machine learning in which a learning algorithm is able to interactively query the user to obtain the desired outputs at new data points. |
| Decision Workflow | A subset of the process which only contains decision nodes. Applies only to non-trivial systems, where an instance of the process does not explore the whole system. |
| Dataset | A set of datapoints which serve as inputs and labels for a prediction model. The dataset includes all datapoints, with and without corresponding labels. Only the datapoints with labels can be used in the training, validation and test sets. |
| Training Set | A subset of the dataset used to teach the model about the data. Usually 80% of the labelled dataset. |
| Validation Set | A subset of the dataset used to tune the model with respect to the data. Usually 10% of the labelled dataset. |
| Test Set | A subset of the dataset used to evaluate the model on the data. Usually 10% of the labelled dataset. |
| Inference Set | All datapoints which are not in the training, validation or test sets. These datapoints may or may not have corresponding labels. |
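
The relationship between the Dataset, Training Set, Validation Set, Test Set and Inference Set entries can be illustrated with a minimal Python sketch. It is not a prescribed procedure: the function name `split_dataset`, the use of `None` to mark a missing label, and the fixed random seed are assumptions made for illustration, and the 80/10/10 proportions are simply the usual values quoted in the definitions. In this sketch every labelled datapoint is used for one of the three splits, so the inference set holds only unlabelled datapoints.

```python
import random


def split_dataset(datapoints, labels, fractions=(0.8, 0.1, 0.1), seed=0):
    """Partition a dataset into training, validation, test and inference sets.

    labels[i] is None when datapoint i has no label (an assumed convention).
    Only labelled datapoints are eligible for training, validation and test.
    """
    labelled = [(x, y) for x, y in zip(datapoints, labels) if y is not None]
    unlabelled = [x for x, y in zip(datapoints, labels) if y is None]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(labelled)

    n = len(labelled)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)

    return {
        "training": labelled[:n_train],
        "validation": labelled[n_train:n_train + n_val],
        # The test set takes whatever remains of the labelled data.
        "test": labelled[n_train + n_val:],
        # Everything outside the three sets above.
        "inference": unlabelled,
    }


# Example: 100 labelled datapoints and 20 unlabelled ones.
datapoints = list(range(120))
labels = [i % 2 for i in range(100)] + [None] * 20
parts = split_dataset(datapoints, labels)
print({name: len(items) for name, items in parts.items()})
# → {'training': 80, 'validation': 10, 'test': 10, 'inference': 20}
```

Shuffling before slicing is one simple way to keep the three labelled subsets similar in composition. Where class imbalance is a concern, a stratified split that preserves class proportions in each subset would be a reasonable alternative.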