
Quality measurement library for LPMs/GCMs.


SaferData/lpm-fidelity

lpm-fidelity

This repo was forked from here.

Disclaimer

This is pre-alpha software. We are currently testing it in real-world scenarios. In its present state, we discourage users from trying it.

Overview of the fidelity component

A library for assessing fidelity between two Polars data frames. Fidelity refers to "[measures] that directly compare a synthetic dataset with the real one. From a high-level perspective, fidelity is how well the synthetic data 'statistically' matches the real data" (Jordan et al., 2022).
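To make "statistically matches" concrete for categorical data, here is a minimal plain-Python sketch of the total variation distance (TVD): half the sum of absolute differences between two empirical distributions. It assumes that this is what the `"tvd"` metric in the usage example refers to. The functions `tvd` and `bivariate_tvd` are hypothetical illustrations, not part of lpm-fidelity; the bivariate variant simply applies TVD to the joint distribution of a pair of columns.

```python
from collections import Counter


def tvd(sample_a, sample_b):
    """Total variation distance between two empirical categorical distributions.

    Hypothetical helper for illustration; works on any sequences of hashable values.
    """
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    p, q = Counter(sample_a), Counter(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    # Compare relative frequencies over the union of observed categories;
    # a Counter returns 0 for categories missing from one sample.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p[v] / n_a - q[v] / n_b) for v in support)


def bivariate_tvd(a_col1, a_col2, b_col1, b_col2):
    """TVD between the joint distributions of a column pair (hypothetical helper)."""
    return tvd(list(zip(a_col1, a_col2)), list(zip(b_col1, b_col2)))


# Usage
print(tvd(["x", "x", "x", "y"], ["x", "y", "y", "y"]))  # 0.5
print(tvd(["x", "y"], ["x", "y"]))  # 0.0
print(bivariate_tvd(["x", "y"], ["u", "v"], ["x", "y"], ["v", "u"]))  # 1.0
```

To apply this to a Polars column, pass e.g. `df_foo["col"].to_list()`. The last line shows why bivariate distances matter: both columns have identical marginals in the two samples (univariate TVD of 0), yet the joint distributions do not overlap at all.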

Installation

This library is packaged with Poetry. Add this line to the [tool.poetry.dependencies] section of your pyproject.toml file:

lpm-fidelity = {git = "https://github.com/neeshjaa/lpm_fidelity.git", branch = "main"}

Usage

⚠️ This currently only works with categorical data frames.
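Because only categorical data is supported, one possible workaround for a numerical column is to discretize it into interval labels before computing fidelity. The sketch below illustrates this in plain Python under that assumption; `discretize` is a hypothetical helper, not part of lpm-fidelity, and choosing the bin edges is up to you.

```python
import bisect
import math


def discretize(values, edges):
    """Map numeric values to half-open interval labels such as "[1.0, 2.0)".

    Hypothetical helper: `edges` must be sorted ascending, and `values`
    must not contain None or NaN.
    """
    bounds = [-math.inf, *edges, math.inf]
    labels = []
    for value in values:
        # bisect_right returns i such that edges[i-1] <= value < edges[i],
        # so the value falls in [bounds[i], bounds[i+1]).
        i = bisect.bisect_right(edges, value)
        labels.append(f"[{bounds[i]}, {bounds[i + 1]})")
    return labels


# Usage
print(discretize([0.5, 1.5, 2.5, 1.0], [1.0, 2.0]))
# ['[-inf, 1.0)', '[1.0, 2.0)', '[2.0, inf)', '[1.0, 2.0)']
```

Use the same edges for both data frames so that their categories line up, e.g. `df_foo.with_columns(pl.Series("age", discretize(df_foo["age"].to_list(), [18, 65])))`, where `age` is a hypothetical column name.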

Using fidelity as a Python library

# Get dependencies.
import polars as pl

from lpm_fidelity.distances import bivariate_distances_in_data
from lpm_fidelity.distances import univariate_distances_in_data
from lpm_fidelity.two_sample_testing import univariate_two_sample_testing_in_data

# Read in two csv files.
df_foo = pl.read_csv("foo.csv")
df_bar = pl.read_csv("bar.csv")

# Compute univariate distance.
df_univariate_distance = univariate_distances_in_data(df_foo, df_bar, distance_metric="tvd")

# Compute bivariate distance.
df_bivariate_distance = bivariate_distances_in_data(df_foo, df_bar, distance_metric="tvd")

# Compute univariate two-sample hypothesis tests (currently only Chi^2).
df_univariate_two_sample_test = univariate_two_sample_testing_in_data(df_foo, df_bar)
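As a rough illustration of what the chi-squared two-sample test in the last step computes, here is a plain-Python sketch of Pearson's chi-squared test of homogeneity on a single categorical column. It compares the observed 2 × k table of category counts with the counts expected if both samples came from the same distribution. The helper `chi2_two_sample` is hypothetical and may differ from the library's implementation (for example, in how rare categories are handled). It returns only the statistic and the degrees of freedom; a p-value could then be obtained with `scipy.stats.chi2.sf(statistic, dof)`.

```python
from collections import Counter


def chi2_two_sample(sample_a, sample_b):
    """Pearson chi-squared test of homogeneity for two categorical samples.

    Hypothetical helper for illustration; returns (statistic, degrees_of_freedom).
    """
    if len(sample_a) == 0 or len(sample_b) == 0:
        raise ValueError("both samples must be non-empty")
    rows = [Counter(sample_a), Counter(sample_b)]
    row_totals = [len(sample_a), len(sample_b)]
    n = sum(row_totals)
    categories = set(rows[0]) | set(rows[1])
    statistic = 0.0
    for category in categories:
        column_total = rows[0][category] + rows[1][category]
        for row, row_total in zip(rows, row_totals):
            # Expected count under homogeneity: row total * column total / N.
            expected = row_total * column_total / n
            statistic += (row[category] - expected) ** 2 / expected
    dof = (len(categories) - 1) * (len(rows) - 1)
    return statistic, dof


# Usage
print(chi2_two_sample(["x", "x", "y", "y"], ["x", "y"]))  # (0.0, 1)
print(chi2_two_sample(["x", "x"], ["y", "y"]))  # (4.0, 1)
```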

Test

Tests can be run with Poetry:

poetry run pytest tests/ -vvv
