qdscreen¶
Remove redundancy in your categorical variables and increase your models' performance.
qdscreen provides a Python implementation of the Quasi-determinism screening algorithm (also known as qds-BNSL) from T. Rahier's PhD thesis, 2018.
Most data scientists are familiar with the concept of correlation between continuous variables. This concept extends to categorical variables, where it is known as functional dependency in the field of relational database mining. In the context of Machine Learning and Statistics we also call it determinism, to indicate that when a random variable X is known, the value of another variable Y is determined with absolute certainty. Quasi-determinism extends this concept to tolerate noise or extremely rare cases in the data.
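To make the definition concrete, here is a minimal standard-library sketch. It is not the qdscreen API: the helper names `determines` and `conditional_entropy` and the toy columns are hypothetical. The strict check verifies that each value of X always co-occurs with a single value of Y. The empirical conditional entropy H(Y|X) is used here as one possible measure of how far a relationship is from determinism, on the assumption that "quasi-deterministic" means this quantity stays below a small threshold.

```python
from collections import Counter
from math import log2


def determines(xs, ys):
    """Return True if every value in xs always co-occurs with one single value in ys."""
    seen = {}
    for x, y in zip(xs, ys):
        # setdefault stores the first y seen for x, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


def conditional_entropy(xs, ys):
    """Empirical conditional entropy H(Y|X) in bits; 0 means X determines Y."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    marginal_x = Counter(xs)
    # H(Y|X) = sum over (x, y) of p(x, y) * log2(p(x) / p(x, y))
    return sum(c / n * log2(marginal_x[x] / c) for (x, _), c in joint.items())


# Hypothetical toy data: the city determines the country, not the other way round.
city = ["Paris", "Lyon", "Paris", "Berlin", "Lyon", "Berlin"]
country = ["FR", "FR", "FR", "DE", "FR", "DE"]

strict = determines(city, country)
reverse = determines(country, city)
h_clean = conditional_entropy(city, country)

# One mislabelled row breaks strict determinism but keeps H(Y|X) fairly low.
noisy_city = city + ["Paris"]
noisy_country = country + ["DE"]
noisy_strict = determines(noisy_city, noisy_country)
h_noisy = conditional_entropy(noisy_city, noisy_country)
```

On such a small sample a single noisy row moves the entropy noticeably; on a realistic dataset a rare exception contributes much less, which is the case a tolerance threshold is meant to absorb.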
qdscreen is able to detect and remove (quasi-)deterministic relationships in a dataset:
- either as a preprocessing step in any general-purpose data science pipeline
- or as an accelerator of a Bayesian Network Structure Learning method such as pyGOBN
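The preprocessing use case can be illustrated with a naive pairwise screen. This is a sketch of the general idea only, not qdscreen's implementation or API: the function name `drop_determined_columns`, the dict-of-lists table layout, and the toy data are assumptions, and only strict determinism is handled (no noise tolerance).

```python
def determines(xs, ys):
    """Return True if every value in xs always co-occurs with one single value in ys."""
    seen = {}
    for x, y in zip(xs, ys):
        if seen.setdefault(x, y) != y:
            return False
    return True


def drop_determined_columns(table):
    """Return the column names that are not determined by another kept column.

    `table` maps a column name to its list of values (hypothetical layout).
    """
    names = list(table)
    removed = set()
    for child in names:
        for parent in names:
            # Skip removed parents so two equivalent columns are never both dropped
            if parent == child or parent in removed:
                continue
            if determines(table[parent], table[child]):
                removed.add(child)
                break
    return [name for name in names if name not in removed]


# Hypothetical toy dataset: "country" is redundant once "city" is known.
table = {
    "city": ["Paris", "Lyon", "Paris", "Berlin", "Lyon", "Berlin"],
    "country": ["FR", "FR", "FR", "DE", "FR", "DE"],
    "weather": ["sun", "rain", "rain", "sun", "sun", "rain"],
}

kept = drop_determined_columns(table)
```

This brute-force version compares every ordered pair of columns, so its cost grows quadratically with the number of variables; it is only meant to show what "removing a deterministic relationship" means on a small table.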
Installing¶
> pip install qdscreen
Usage¶
1. Remove correlated variables¶
See this example.
2. Learn a Bayesian Network structure¶
TODO see #6.
Main features / benefits¶
- A feature selection algorithm able to eliminate quasi-deterministic relationships
  - a base version compliant with numpy and pandas datasets
  - a scikit-learn compliant version (numpy only)
- An accelerator for Bayesian Network Structure Learning tasks
See Also¶
- Bayesian Network libraries in python:
  - pyGOBN (MIT license)
  - pgmpy (MIT license)
  - pomegranate (MIT license)
  - bayespy (MIT license)
- Functional dependencies libraries in python:
  - fd_miner, an algorithm that was used in this paper. The repository contains a list of reference datasets too.
  - FDTool, a python 2 algorithm to mine for functional dependencies, equivalences and candidate keys. From this paper.
  - functional-dependencies
  - functional-dependency-finder, connects to a MySQL db and finds functional dependencies.
- Other libs for probabilistic inference:
- Stackoverflow discussions:
Others¶
Do you like this library? You might also like smarie's other python libraries.
Want to contribute?¶
Details on the github page: https://github.com/python-qds/qdscreen