Combining labelled and unlabelled data in the design of pattern classification systems

Bogdan Gabrys, Lina Petrakieva

Research output: Contribution to journal > Article

Abstract

There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches and to analyse the behaviour of the learning system when using different ratios of labelled to unlabelled data. In this paper, various methods for learning from labelled and unlabelled data are first discussed and categorised into one of three major groups: pre-labelling, post-labelling and semi-supervised approaches. A generalised formal description and an extensive experimental analysis of these approaches are then provided. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available, the results based on random selection of labelled samples show high variability, and the performance of the final classifier depends more on the reliability of the labelled samples than on the use of additional unlabelled data. In response to this finding, three types of static (one-step) selection methods guided by clustering information, together with various options for allocating the number of samples within clusters and their distributions, have been proposed and analysed. A significant improvement compared to the random selection of the labelled samples has been observed when using these selective sampling techniques.
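The abstract does not spell out the selection procedures, so the following is only a minimal sketch of what a static (one-step) selection guided by clustering information could look like. It assumes plain k-means as the clustering step and an equal, round-robin allocation of the labelling budget across clusters, picking the samples closest to each cluster centre first. The function names `kmeans` and `select_for_labelling` are hypothetical and are not taken from the paper.

```python
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of tuples; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct points as initial centres
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous centre
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment so that labels match the final centres.
    return centroids, [nearest(p, centroids) for p in points]


def select_for_labelling(points, k, budget, seed=0):
    """Pick up to `budget` sample indices to label, spread evenly over clusters.

    Assumption (not from the paper): within each cluster, samples nearest
    to the cluster centre are preferred, and clusters are visited in
    round-robin order until the budget is used up.
    """
    centroids, assign = kmeans(points, k, seed=seed)
    ranked = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: math.dist(points[i], centroids[c]))
        ranked.append(members)

    chosen, rank = [], 0
    while len(chosen) < budget and any(rank < len(r) for r in ranked):
        for r in ranked:
            if rank < len(r) and len(chosen) < budget:
                chosen.append(r[rank])
        rank += 1
    return chosen
```

Calling `select_for_labelling(points, k=2, budget=4)` on data with two well-separated groups would request labels for two samples from each group, whereas a purely random draw of four samples could miss a group entirely. That is the kind of variability under small labelling budgets that the abstract attributes to random selection.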
Original language: English
Pages (from-to): 251-273
Number of pages: 23
Journal: International Journal of Approximate Reasoning
Volume: 35
Issue number: 3
DOI: 10.1016/j.ijar.2003.08.005
Publication status: Published - Mar 2004

Fingerprint

Pattern Classification
Labeling
Pattern recognition
Learning systems
Classifiers
Supervised learning
Sampling
Experimental Analysis
Design
Clustering
Dependent
Experimental Results

Keywords

  • combined learning methods
  • supervised learning
  • unsupervised learning
  • semi-supervised clustering
  • pattern classification
  • random selection
  • preliminary selection

Cite this

@article{f76867cc885b454f9152c7a5df9c0a1f,
title = "Combining labelled and unlabelled data in the design of pattern classification systems",
abstract = "There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches and to analyse the behaviour of the learning system when using different ratios of labelled to unlabelled data. In this paper, various methods for learning from labelled and unlabelled data are first discussed and categorised into one of three major groups: pre-labelling, post-labelling and semi-supervised approaches. A generalised formal description and an extensive experimental analysis of these approaches are then provided. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available, the results based on random selection of labelled samples show high variability, and the performance of the final classifier depends more on the reliability of the labelled samples than on the use of additional unlabelled data. In response to this finding, three types of static (one-step) selection methods guided by clustering information, together with various options for allocating the number of samples within clusters and their distributions, have been proposed and analysed. A significant improvement compared to the random selection of the labelled samples has been observed when using these selective sampling techniques.",
keywords = "combined learning methods, supervised learning, unsupervised learning, semi-supervised clustering, pattern classification, random selection, preliminary selection",
author = "Bogdan Gabrys and Lina Petrakieva",
note = "This paper is an invited extended version of a paper presented at the EUNITE 2002",
year = "2004",
month = "3",
doi = "10.1016/j.ijar.2003.08.005",
language = "English",
volume = "35",
pages = "251--273",
journal = "International Journal of Approximate Reasoning",
issn = "0888-613X",
publisher = "Elsevier B.V.",
number = "3",

}

Combining labelled and unlabelled data in the design of pattern classification systems. / Gabrys, Bogdan; Petrakieva, Lina.

In: International Journal of Approximate Reasoning, Vol. 35, No. 3, 03.2004, p. 251-273.

TY - JOUR

T1 - Combining labelled and unlabelled data in the design of pattern classification systems

AU - Gabrys, Bogdan

AU - Petrakieva, Lina

N1 - This paper is an invited extended version of a paper presented at the EUNITE 2002

PY - 2004/3

Y1 - 2004/3

AB - There has been much interest in applying techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches and to analyse the behaviour of the learning system when using different ratios of labelled to unlabelled data. In this paper, various methods for learning from labelled and unlabelled data are first discussed and categorised into one of three major groups: pre-labelling, post-labelling and semi-supervised approaches. A generalised formal description and an extensive experimental analysis of these approaches are then provided. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising the classification performance. If only a very limited amount of labelled data is available, the results based on random selection of labelled samples show high variability, and the performance of the final classifier depends more on the reliability of the labelled samples than on the use of additional unlabelled data. In response to this finding, three types of static (one-step) selection methods guided by clustering information, together with various options for allocating the number of samples within clusters and their distributions, have been proposed and analysed. A significant improvement compared to the random selection of the labelled samples has been observed when using these selective sampling techniques.

KW - combined learning methods

KW - supervised learning

KW - unsupervised learning

KW - semi-supervised clustering

KW - pattern classification

KW - random selection

KW - preliminary selection

U2 - 10.1016/j.ijar.2003.08.005

DO - 10.1016/j.ijar.2003.08.005

M3 - Article

VL - 35

SP - 251

EP - 273

JO - International Journal of Approximate Reasoning

JF - International Journal of Approximate Reasoning

SN - 0888-613X

IS - 3

ER -