This HTML5 document contains 26 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any HTML5 Microdata processor.
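As a sketch of how such embedded statements can be read programmatically, the following uses only Python's standard-library `HTMLParser` to collect `itemprop`/value pairs from a minimal Microdata fragment. The fragment and the extraction logic are illustrative (they are not copied from this page's actual markup); a full RDF consumer would implement the W3C Microdata-to-RDF mapping instead.

```python
from html.parser import HTMLParser

class MicrodataProps(HTMLParser):
    """Collect (itemprop, value) pairs from HTML5 Microdata markup."""
    def __init__(self):
        super().__init__()
        self.props = []
        self._pending = None  # itemprop waiting for element text content

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "itemprop" in a:
            if "content" in a:        # value carried in a content attribute
                self.props.append((a["itemprop"], a["content"]))
            elif "href" in a:         # value carried as a link target
                self.props.append((a["itemprop"], a["href"]))
            else:                     # value is the element's text
                self._pending = a["itemprop"]

    def handle_data(self, data):
        if self._pending and data.strip():
            self.props.append((self._pending, data.strip()))
            self._pending = None

# Illustrative fragment in the style this page embeds:
html = (
    '<div itemscope itemtype="http://purl.org/ontology/bibo/Thesis">'
    '<span itemprop="http://purl.org/dc/terms/date" content="2021-09"></span>'
    '<a itemprop="http://www.w3.org/2000/01/rdf-schema#seeAlso" '
    'href="https://kar.kent.ac.uk/90835/">record</a>'
    '</div>'
)
p = MicrodataProps()
p.feed(html)
print(p.props)
```

Attribute-carried values (`content`, `href`) and text-carried values cover the common cases; nested `itemscope` handling is deliberately omitted from this sketch.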

Namespace Prefixes

Prefix   IRI
n21      https://kar.kent.ac.uk/id/eprint/90835#
dcterms  http://purl.org/dc/terms/
n14      doi:10.22024/UniKent/
n2       https://kar.kent.ac.uk/id/eprint/
wdrs     http://www.w3.org/2007/05/powder-s#
n15      http://purl.org/ontology/bibo/status/
n17      https://kar.kent.ac.uk/id/subject/
rdfs     http://www.w3.org/2000/01/rdf-schema#
n16      https://demo.openlinksw.com/about/id/entity/https/raw.githubusercontent.com/annajordanous/CO644Files/main/
n3       http://eprints.org/ontology/
n18      http://www.loc.gov/loc.terms/relators/
bibo     http://purl.org/ontology/bibo/
n12      https://kar.kent.ac.uk/id/org/
rdf      http://www.w3.org/1999/02/22-rdf-syntax-ns#
n20      http://purl.org/ontology/bibo/degrees/
owl      http://www.w3.org/2002/07/owl#
n4       https://kar.kent.ac.uk/id/document/
n9       https://kar.kent.ac.uk/id/
xsd      http://www.w3.org/2001/XMLSchema#
n11      https://demo.openlinksw.com/about/id/entity/https/www.cs.kent.ac.uk/people/staff/akj22/materials/CO644/
n19      https://kar.kent.ac.uk/id/person/
n8       https://kar.kent.ac.uk/90835/
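The rows above are CURIE bindings: a compact IRI such as n2:90835 expands by concatenating the bound namespace IRI with the local part. A minimal sketch, hard-coding only a subset of the table's bindings:

```python
# Subset of the namespace bindings from the table above.
PREFIXES = {
    "n2": "https://kar.kent.ac.uk/id/eprint/",
    "n14": "doi:10.22024/UniKent/",
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}

def expand(curie: str) -> str:
    """Expand a prefix:localname CURIE against the bindings above."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unbound prefix: {prefix}")
    return PREFIXES[prefix] + local

print(expand("n2:90835"))         # https://kar.kent.ac.uk/id/eprint/90835
print(expand("n14:01.02.90835"))  # doi:10.22024/UniKent/01.02.90835
```

This is how the statement listing below should be read: every n2:, n4:, dcterms: (and so on) token abbreviates a full IRI from this table.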

Statements

Subject Item
n2:90835
rdf:type
n3:EPrint bibo:Article bibo:Thesis n3:ThesisEPrint
rdfs:seeAlso
n8:
owl:sameAs
n14:01.02.90835
n18:THS
n19:ext-r.palani@kent.ac.uk
n3:hasDocument
n4:3251383 n4:3251384 n4:3251381 n4:3251196 n4:3251207 n4:3251382
dcterms:issuer
n12:ext-e69ffaf65adbe669a239fc71d288812e n12:ext-6cd37a476c4a651d5173fe60c50f2f23
dcterms:title
Robust Deep Learning Frameworks for Acoustic Scene and Respiratory Sound Classification
wdrs:describedby
n11:export_kar_RDFN3.n3 n16:export_kar_RDFN3.n3
dcterms:date
2021-09
dcterms:creator
n19:ext-ldp7@kent.ac.uk
bibo:status
n15:published
bibo:abstract
Although research on Acoustic Scene Classification (ASC) is very close to, or even overshadowed by, more popular research areas such as Automatic Speech Recognition (ASR), Speaker Recognition (SR) and Image Processing (IP), this field potentially opens up several distinct and meaningful application areas based on environment context detection. The challenges of ASC mainly come from different noise sources and the variety of sounds in real-world environments, which occur as single sounds, continuous sounds or overlapping sounds. In comparison to speech, sound scenes are more challenging mainly because they are unstructured in form and closely resemble noise in certain contexts. Although a wide range of publications have focused on ASC recently, they present task-specific approaches that either explore certain aspects of an ASC system or are evaluated on limited acoustic scene datasets. Therefore, the aim of this thesis is to contribute to the development of a robust framework to be applied for ASC, evaluated on various recently published datasets, and to achieve competitive performance compared to state-of-the-art systems. To do this, a baseline model is first introduced. Next, extensive experiments on the baseline are conducted to identify key factors affecting final classification accuracy. From this comprehensive analysis, a robust deep learning framework, namely the Encoder-Decoder structure, is proposed to address three main factors that directly affect an ASC system. These factors comprise low-level input features, high-level feature extraction methodologies, and architectures for final classification. Within the proposed framework, three spectrogram transformations, namely Constant Q Transform (CQT), gammatone filter (Gamma), and log-mel, are used to convert recorded audio signals into spectrogram representations that resemble two-dimensional images. These three spectrograms are referred to as the low-level input features.
To extract high-level features from spectrograms, a novel Encoder architecture, based on Convolutional Neural Networks, is proposed. For the Decoder, also referred to as the final classifier, various models such as the Random Forest Classifier, Deep Neural Network and Mixture of Experts are evaluated and structured to obtain the best performance. To further improve an ASC system's performance, a two-level hierarchical classification scheme, replacing the Decoder classification just mentioned, is proposed. This scheme transforms an ASC task over all categories into multiple ASC sub-tasks, each spanning fewer categories, in a divide-and-conquer strategy. At the highest level of the proposed scheme, meta-categories of acoustic scene sounds showing similar characteristics are classified. Next, categories within each meta-category are classified at the second level. Furthermore, an analysis of loss functions applied to different classifiers is conducted. This analysis indicates that a combination of entropy loss and triplet loss is useful for enhancing performance, especially on tasks that comprise fewer categories. Further exploring ASC in terms of potential applications to the health services, this thesis also explores the 2017 International Conference on Biomedical and Health Informatics (ICBHI) benchmark dataset of lung sounds. A deep-learning framework, based on our novel ASC approaches, is proposed to classify anomaly cycles and predict respiratory diseases. The results obtained from these experiments show exceptional performance, highlighting the potential of applying advanced ASC frameworks to the early detection of auditory signs of respiratory disease, which could be highly useful in future for directing treatment and preventing its spread.
dcterms:isPartOf
n9:repository
dcterms:subject
n17:QA76
bibo:authorList
n21:authors
bibo:degree
n20:phd
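Taken together, the statements above form a small RDF graph. An RDF library such as rdflib would be the natural consumer; the sketch below instead stays with the standard library, modelling a handful of this record's triples as CURIE-string tuples and filtering them the way a simple triple-pattern query would. Only a few of the 26 statements are reproduced; the rest follow the same shape.

```python
# A few of this record's statements as (subject, predicate, object) CURIEs.
TRIPLES = [
    ("n2:90835", "rdf:type", "bibo:Thesis"),
    ("n2:90835", "rdf:type", "n3:EPrint"),
    ("n2:90835", "dcterms:date", "2021-09"),
    ("n2:90835", "dcterms:title",
     "Robust Deep Learning Frameworks for Acoustic Scene "
     "and Respiratory Sound Classification"),
    ("n2:90835", "n3:hasDocument", "n4:3251383"),
    ("n2:90835", "n3:hasDocument", "n4:3251384"),
]

def objects(subject, predicate):
    """All objects of triples matching the pattern (subject, predicate, ?)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

print(objects("n2:90835", "rdf:type"))
print(objects("n2:90835", "dcterms:date"))       # ['2021-09']
print(len(objects("n2:90835", "n3:hasDocument")))  # 2
```

The multi-valued predicates in the listing (rdf:type, n3:hasDocument, dcterms:issuer, wdrs:describedby) simply contribute one tuple per object under this model.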