About: Investigating the Role of Simpson’s Paradox in the Analysis of Top-Ranked Features in High-Dimensional Bioinformatics Datasets

An Entity of Type : bibo:AcademicArticle, within Data Space : linkeddata.uriburner.com:28898 associated with source document(s)

Attributes and Values
Title
  • Investigating the Role of Simpson’s Paradox in the Analysis of Top-Ranked Features in High-Dimensional Bioinformatics Datasets
Date
  • 2019-01-09
abstract
  • An important problem in bioinformatics consists of identifying the most important features (or predictors) among a large number of features in a given classification dataset. This problem is often addressed by using a machine learning-based feature ranking method to identify a small set of top-ranked predictors (i.e. the most relevant features for classification). The many studies in this area share, however, an important limitation: they ignore the possibility that the top-ranked predictors occur in an instance of Simpson’s paradox, where the positive or negative association between a predictor and a class variable reverses sign upon conditioning on each of the values of a third (confounder) variable. In this work, we review and investigate the role of Simpson’s paradox in the analysis of top-ranked predictors in high-dimensional bioinformatics datasets, in order to avoid the potential danger of misinterpreting an association between a predictor and the class variable. We perform computational experiments using four well-known feature ranking methods from the machine learning field and five high-dimensional datasets of ageing-related genes, where the predictors are Gene Ontology terms. The results show that occurrences of Simpson’s paradox involving top-ranked predictors are much more common for one of the feature ranking methods.
issue
  • 2
volume
  • 21
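The abstract describes Simpson’s paradox as an association between a predictor and the class variable that reverses sign once you condition on each value of a confounder. The following is a minimal sketch of such a check, not the authors' procedure. It assumes a binary predictor `x`, a binary class `y`, and a categorical confounder `z`, and it measures association as the difference P(y=1 | x=1) - P(y=1 | x=0), which is a choice made here for illustration. The function names and the toy counts are hypothetical; the counts are patterned on the classic kidney-stone example and are not data from the article.

```python
from collections import Counter


def association_sign(rows):
    """Sign (+1, -1, 0) of P(y=1 | x=1) - P(y=1 | x=0) over (x, y) pairs.

    Returns 0 when either group is empty or the two proportions are equal.
    """
    totals = Counter()
    positives = Counter()
    for x, y in rows:
        totals[x] += 1
        positives[x] += y
    if totals[0] == 0 or totals[1] == 0:
        return 0
    diff = positives[1] / totals[1] - positives[0] / totals[0]
    return (diff > 0) - (diff < 0)


def is_simpsons_paradox(records):
    """records: iterable of (x, y, z) with binary x and y, categorical z.

    True when the overall association is non-zero and every stratum of z
    shows the opposite (non-zero) sign.
    """
    records = list(records)
    overall = association_sign([(x, y) for x, y, _ in records])
    strata = {}
    for x, y, z in records:
        strata.setdefault(z, []).append((x, y))
    stratum_signs = {association_sign(rows) for rows in strata.values()}
    return overall != 0 and stratum_signs == {-overall}


def expand(counts):
    """Turn {(x, z): (n_class1, n_total)} into a list of (x, y, z) records."""
    records = []
    for (x, z), (k, n) in counts.items():
        records += [(x, 1, z)] * k + [(x, 0, z)] * (n - k)
    return records


# Toy counts (illustrative only): x=1 looks better inside each stratum of z,
# yet worse once the strata are pooled.
toy = expand({
    (1, "a"): (81, 87),
    (0, "a"): (234, 270),
    (1, "b"): (192, 263),
    (0, "b"): (55, 80),
})
print(is_simpsons_paradox(toy))
```

In a feature-ranking setting, a check like this could be run for each top-ranked predictor against each candidate confounder, so that a reversed association is flagged before the predictor is interpreted.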