This HTML5 document contains 29 embedded RDF statements represented using HTML+Microdata notation.

The embedded RDF content will be recognized by any processor of HTML5 Microdata.

Namespace Prefixes

Prefix    IRI
dcterms   http://purl.org/dc/terms/
n14       doi:10.1093/bib/
n2        https://kar.kent.ac.uk/id/eprint/
n9        https://kar.kent.ac.uk/72582/
wdrs      http://www.w3.org/2007/05/powder-s#
n20       http://purl.org/ontology/bibo/status/
dc        http://purl.org/dc/elements/1.1/
n18       https://kar.kent.ac.uk/id/subject/
rdfs      http://www.w3.org/2000/01/rdf-schema#
n19       https://demo.openlinksw.com/about/id/entity/https/raw.githubusercontent.com/annajordanous/CO644Files/main/
n7        http://eprints.org/ontology/
bibo      http://purl.org/ontology/bibo/
n21       https://kar.kent.ac.uk/id/publication/
n16       https://kar.kent.ac.uk/id/eprint/72582#
n6        https://kar.kent.ac.uk/id/org/
rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
owl       http://www.w3.org/2002/07/owl#
n5        https://kar.kent.ac.uk/id/document/
n12       https://kar.kent.ac.uk/id/
xsdh      http://www.w3.org/2001/XMLSchema#
n11       https://demo.openlinksw.com/about/id/entity/https/www.cs.kent.ac.uk/people/staff/akj22/materials/CO644/
n17       https://kar.kent.ac.uk/id/person/
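
The prefixes above abbreviate the IRIs used in the statements that follow: a name such as n2:72582 stands for the namespace IRI of n2 with 72582 appended. A minimal Python sketch of that expansion, assuming only a subset of the prefix table; the names PREFIXES and expand are hypothetical and are not part of the page:

```python
# Subset of the prefix table above (prefix -> namespace IRI).
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "n2": "https://kar.kent.ac.uk/id/eprint/",
    "n9": "https://kar.kent.ac.uk/72582/",
    "n14": "doi:10.1093/bib/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'n2:72582' into a full IRI.

    Splits on the first colon only, so local parts may contain colons.
    """
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown or missing prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("n2:72582"))    # https://kar.kent.ac.uk/id/eprint/72582
print(expand("n14:bby126"))  # doi:10.1093/bib/bby126
print(expand("n9:"))         # https://kar.kent.ac.uk/72582/
```

A name with an empty local part, such as n9:, expands to the namespace IRI itself.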

Statements

Subject Item
n2:72582
rdf:type
n7:EPrint n7:ArticleEPrint bibo:Article bibo:AcademicArticle
rdfs:seeAlso
n9:
owl:sameAs
n14:bby126
n7:hasAccepted
n5:3169374
n7:hasDocument
n5:3169379 n5:3169380 n5:3169381 n5:3169382 n5:3169374 n5:3169383
dc:hasVersion
n5:3169374
dcterms:title
Investigating the Role of Simpson’s Paradox in the Analysis of Top-Ranked Features in High-Dimensional Bioinformatics Datasets
wdrs:describedby
n11:export_kar_RDFN3.n3 n19:export_kar_RDFN3.n3
dcterms:date
2019-01-09
dcterms:creator
n17:ext-a.a.freitas@kent.ac.uk
bibo:status
n20:peerReviewed n20:published
dcterms:publisher
n6:ext-ffae441f908983694f410e3721f2491d
bibo:abstract
An important problem in bioinformatics consists of identifying the most important features (or predictors), among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning-based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The large number of studies in this area have, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
dcterms:isPartOf
n12:repository n21:ext-a7e614c9e721cb162ff2ba310f59827c
dcterms:subject
n18:Q335
bibo:authorList
n16:authors
bibo:issue
2
bibo:volume
21
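
Each predicate line in the listing above is followed by one or more objects, and every subject, predicate, object combination is one RDF statement. A minimal Python sketch that flattens the listing into triples and counts them; the names SUBJECT, STATEMENTS and triples are hypothetical, and the title and abstract literals are shortened to placeholders:

```python
SUBJECT = "n2:72582"

# predicate -> objects, copied from the listing above.
# The title and abstract literals are replaced by placeholders.
STATEMENTS = {
    "rdf:type": ["n7:EPrint", "n7:ArticleEPrint",
                 "bibo:Article", "bibo:AcademicArticle"],
    "rdfs:seeAlso": ["n9:"],
    "owl:sameAs": ["n14:bby126"],
    "n7:hasAccepted": ["n5:3169374"],
    "n7:hasDocument": ["n5:3169379", "n5:3169380", "n5:3169381",
                       "n5:3169382", "n5:3169374", "n5:3169383"],
    "dc:hasVersion": ["n5:3169374"],
    "dcterms:title": ["<title literal>"],
    "wdrs:describedby": ["n11:export_kar_RDFN3.n3",
                         "n19:export_kar_RDFN3.n3"],
    "dcterms:date": ["2019-01-09"],
    "dcterms:creator": ["n17:ext-a.a.freitas@kent.ac.uk"],
    "bibo:status": ["n20:peerReviewed", "n20:published"],
    "dcterms:publisher": ["n6:ext-ffae441f908983694f410e3721f2491d"],
    "bibo:abstract": ["<abstract literal>"],
    "dcterms:isPartOf": ["n12:repository",
                         "n21:ext-a7e614c9e721cb162ff2ba310f59827c"],
    "dcterms:subject": ["n18:Q335"],
    "bibo:authorList": ["n16:authors"],
    "bibo:issue": ["2"],
    "bibo:volume": ["21"],
}


def triples(subject, statements):
    """Return one (subject, predicate, object) tuple per listed object."""
    return [(subject, pred, obj)
            for pred, objs in statements.items()
            for obj in objs]


print(len(triples(SUBJECT, STATEMENTS)))  # 29
```

The count agrees with the 29 statements announced at the top of the page.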