API reference¶
In general, using help(symbol) is the recommended way to get the latest documentation. In addition, this page provides an overview of the various elements in this package.
Main symbols¶
qd_screen¶
def qd_screen(X: Union[pd.DataFrame, np.ndarray],
              absolute_eps: float = None,
              relative_eps: float = None,
              keep_stats: bool = False
              ) -> QDForest
Finds the (quasi-)deterministic relationships (functional dependencies) between the variables in X, and returns a QDForest object representing the forest of (quasi-)deterministic trees. This object can then be used to fit a feature selection model or to learn a Bayesian Network structure.
By default only deterministic relationships are detected. Quasi-determinism can be enabled by setting either an threshold on conditional entropies (absolute_eps) or on relative conditional entropies (relative_eps). Only one of them should be set.
By default (keep_stats=False) the entropies tables are not preserved once the forest has been created. If you wish to keep them available, set keep_stats=True. The entropies tables are then available in the <QDForest>.stats attribute, and threshold analysis methods such as <QDForest>.get_entropies_table(...) and <QDForest>.plot_increasing_entropies() become available.
Parameters:
- 
X: the dataset as a pandas DataFrame or a numpy array. Columns represent the features to compare. - 
absolute_eps: Absolute entropy threshold. Any featureYthat can be predicted from another featureXin a quasi-deterministic way, that is, where conditional entropyH(Y|X) <= absolute_eps, will be removed. The default value is0and corresponds to removing deterministic relationships only. - 
relative_eps: Relative entropy threshold. Any featureYthat can be predicted from another featureXin a quasi-deterministic way, that is, where relative conditional entropyH(Y|X)/H(Y) <= relative_eps(a value between0and1), will be removed. Only one ofabsolute_epsorrelative_epsshould be provided. - 
keep_stats: a boolean indicating if the various entropies tables computed in the process should be kept in memory in the resulting forest object (<QDForest>.stats), for further analysis. By default this isFalse. 
QDForest¶
TODO
QDSelectorModel¶
TODO
Entropies¶
TODO