This HTML5 document contains 31 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix    IRI
n21       doi:10.1007/
dcterms   http://purl.org/dc/terms/
n2        https://kar.kent.ac.uk/id/eprint/
n16       https://kar.kent.ac.uk/id/eprint/66126#
n19       https://kar.kent.ac.uk/66126/
wdrs      http://www.w3.org/2007/05/powder-s#
dc        http://purl.org/dc/elements/1.1/
n7        http://purl.org/ontology/bibo/status/
rdfs      http://www.w3.org/2000/01/rdf-schema#
n10       https://kar.kent.ac.uk/id/subject/
n14       https://demo.openlinksw.com/about/id/entity/https/raw.githubusercontent.com/annajordanous/CO644Files/main/
n3        http://eprints.org/ontology/
bibo      http://purl.org/ontology/bibo/
n11       https://kar.kent.ac.uk/id/publication/
n17       https://kar.kent.ac.uk/id/org/
rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
owl       http://www.w3.org/2002/07/owl#
n4        https://kar.kent.ac.uk/id/document/
n15       https://kar.kent.ac.uk/id/
xsd       http://www.w3.org/2001/XMLSchema#
n13       https://demo.openlinksw.com/about/id/entity/https/www.cs.kent.ac.uk/people/staff/akj22/materials/CO644/
n9        https://kar.kent.ac.uk/id/person/
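Each prefix abbreviates the IRI it maps to, so a prefixed name such as dcterms:title stands for http://purl.org/dc/terms/title. As a minimal sketch (the expand helper and the trimmed PREFIXES table below are illustrative, not part of this page), the mapping can be applied like this:

```python
# A few of the prefix mappings from the table above.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def expand(curie: str) -> str:
    """Expand a prefixed name (CURIE) such as 'dcterms:title' to its full IRI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local

print(expand("dcterms:title"))  # http://purl.org/dc/terms/title
print(expand("bibo:volume"))   # http://purl.org/ontology/bibo/volume
```

The same expansion rule applies to every prefixed name in the statements below, e.g. n2:66126 is https://kar.kent.ac.uk/id/eprint/66126.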

Statements

Subject Item
n2:66126
rdf:type
bibo:AcademicArticle bibo:Article n3:EPrint n3:ArticleEPrint
rdfs:seeAlso
n19:
owl:sameAs
n21:s00034-018-0798-4
n3:hasAccepted
n4:742792
n3:hasDocument
n4:1609412 n4:742792 n4:742841 n4:1609409 n4:1609410 n4:1609411
dc:hasVersion
n4:742792
dcterms:title
A Conditional Generative Model for Speech Enhancement
wdrs:describedby
n13:export_kar_RDFN3.n3 n14:export_kar_RDFN3.n3
dcterms:date
2018-03-13
dcterms:creator
n9:ext-5357dabc2d03ab77a731d661ac13157b n9:ext-f3123c2607d1074027fa2f517589667b n9:ext-b96a10152520f3023473c03607c96195 n9:ext-i.v.mcloughlin@kent.ac.uk
bibo:status
n7:peerReviewed n7:published
dcterms:publisher
n17:ext-1c5ddec173ca8cdfba8b274309638579
bibo:abstract
Deep-learning-based speech enhancement approaches such as Deep Neural Networks (DNN) and Long Short-Term Memory (LSTM) have already demonstrated superior results to classical methods. However, these methods do not take full advantage of temporal context information: while DNN and LSTM consider temporal context in the noisy source speech, they do not do so for the estimated clean speech. Both DNN and LSTM also have a tendency to over-smooth spectra, which causes the enhanced speech to sound muffled. This paper proposes a novel architecture to address both issues, which we term a conditional generative model (CGM). By adopting an adversarial training scheme applied to a generator of deep dilated convolutional layers, CGM is designed to model the joint and symmetric conditions of both noisy and estimated clean spectra. We evaluate CGM against both DNN and LSTM in terms of Perceptual Evaluation of Speech Quality (PESQ) and Short-Time Objective Intelligibility (STOI) on TIMIT sentences corrupted by ITU-T P.501 and NOISEX-92 noise in a range of matched and mismatched noise conditions. Results show that both the CGM architecture and the adversarial training mechanism lead to better PESQ and STOI in all tested noise conditions. In addition to yielding significant improvements in PESQ and STOI, CGM and adversarial training both mitigate against over-smoothing.
dcterms:isPartOf
n11:ext-0278081X n15:repository
dcterms:subject
n10:T
bibo:authorList
n16:authors
bibo:volume
37