Pydoc: module structure_learning.distributions.distribution

structure_learning.distributions.distribution


This module defines classes for representing and manipulating distributions, including MCMC-based distributions and OPAD re-weighting mechanisms.

Classes:
    Distribution: Base class for distributions, providing methods for particle management, normalization, and visualization.
    MCMCDistribution: Extends Distribution to support MCMC-specific operations, including rejected particle handling.
    OPAD: Implements the OPAD re-weighting mechanism for MCMC distributions.

The module also includes utility methods for computing distributions from data and scores, as well as normalization factors.

Modules

numpy
pandas
matplotlib.pyplot
seaborn

Classes



builtins.object

Distribution

MCMCDistribution

OPAD

TrueDistribution

class Distribution(builtins.object)

    Distribution(particles: Iterable = [], logp: Iterable = [], theta: Dict = None)

Base class for distributions.

Methods defined here:

__add__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Add two distributions together by combining their particles and frequencies.

Parameters:
    other (Distribution): The distribution to add.

Returns:
    Distribution: The resulting distribution after addition.
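
As an illustration of "combining their particles and frequencies", the following is a minimal sketch on plain dictionaries, not the library's implementation. The helper name merge_tables and the record layout (a dict with a 'freq' key per particle) are assumptions made for this example.

```python
from typing import Dict, Hashable


def merge_tables(a: Dict[Hashable, dict], b: Dict[Hashable, dict]) -> Dict[Hashable, dict]:
    """Hypothetical merge: union of particles, with frequencies summed."""
    # Copy every record so neither input is modified.
    merged = {particle: dict(rec) for particle, rec in a.items()}
    for particle, rec in b.items():
        if particle in merged:
            merged[particle]["freq"] += rec["freq"]
        else:
            merged[particle] = dict(rec)
    return merged


left = {"A->B": {"freq": 3}, "B->A": {"freq": 1}}
right = {"A->B": {"freq": 2}, "A B": {"freq": 4}}
combined = merge_tables(left, right)
print(combined["A->B"]["freq"])  # 5
print(len(combined))             # 3
```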

__contains__(self, particle)
Checks if particle is in the distribution.

__copy__(self)
Create a copy of the current distribution.

Returns:
    Distribution: A copy of the distribution.

__init__(self, particles: Iterable = [], logp: Iterable = [], theta: Dict = None)
Initialise the distribution. Particles are stored internally as a dictionary that maps each particle to its data.

Parameters:
    particles (Iterable):       Particles to add to the distribution
    logp (Iterable):            Scores (log probabilities) of the particles
    theta (Dict):               Additional particle data
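
To make the dictionary-backed storage concrete, here is a minimal sketch of one way such a particle table could be built. The function name build_particle_table, the record keys, and the idea that theta is keyed by particle are all assumptions for illustration; the actual internal layout is not documented here.

```python
from typing import Dict, Hashable, Iterable, Optional


def build_particle_table(particles: Iterable[Hashable],
                         logp: Iterable[float],
                         theta: Optional[Dict] = None) -> Dict[Hashable, dict]:
    """Hypothetical helper: map each particle to a record of its data."""
    theta = theta or {}
    table: Dict[Hashable, dict] = {}
    for particle, score in zip(particles, logp):
        if particle in table:
            # A repeated particle only increases its frequency count.
            table[particle]["freq"] += 1
        else:
            table[particle] = {"logp": score, "freq": 1,
                               "theta": theta.get(particle)}
    return table


table = build_particle_table(["A->B", "B->A", "A->B"], [-1.2, -2.5, -1.2])
print(table["A->B"]["freq"])  # 2
print(len(table))             # 2
```

Keying the table by particle is what makes membership tests and per-particle lookups cheap, which is presumably why particles must be hashable.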

__len__(self)
Get the number of particles in the distribution.

Returns:
    int: The number of particles.

__sub__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Subtract one distribution from another by removing particles present in the other distribution.

Parameters:
    other (Distribution): The distribution to subtract.

Returns:
    Distribution: The resulting distribution after subtraction.
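
The subtraction described above amounts to a set difference on particles. A minimal sketch on plain dictionaries follows; subtract_tables is a hypothetical name, and how the library treats the frequencies of removed particles is not specified here.

```python
from typing import Dict, Hashable


def subtract_tables(a: Dict[Hashable, dict], b: Dict[Hashable, dict]) -> Dict[Hashable, dict]:
    """Hypothetical difference: keep only particles of `a` that are absent from `b`."""
    return {particle: dict(rec) for particle, rec in a.items() if particle not in b}


sampled = {"A->B": {"freq": 3}, "B->A": {"freq": 1}, "A B": {"freq": 2}}
seen_before = {"A->B": {"freq": 7}}
remaining = subtract_tables(sampled, seen_before)
print(sorted(remaining))  # ['A B', 'B->A']
```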

clear(self)
Clear all particles from the distribution.

copy(self)

hist(self, prop='freq', normalise=False)
Return the histogram of particles.

Parameters:
    prop (str):             The property to use for the counts (default is 'freq').
    normalise (bool):       If True, return normalised counts.
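
As a sketch of what "normalised counts" means, the example below divides each particle's count by the total so that the values sum to one. The function normalised_hist and the dictionary return type are assumptions; the return type of hist is not documented here.

```python
from typing import Dict, Hashable


def normalised_hist(table: Dict[Hashable, dict], prop: str = "freq") -> Dict[Hashable, float]:
    """Hypothetical helper: counts divided by their total."""
    total = sum(rec[prop] for rec in table.values())
    return {particle: rec[prop] / total for particle, rec in table.items()}


counts = {"A->B": {"freq": 3}, "B->A": {"freq": 1}}
hist = normalised_hist(counts)
print(hist)  # {'A->B': 0.75, 'B->A': 0.25}
```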

normalise(self, prop='freq', log=False)
Normalise the current set of particles in the distribution.

plot(self, prop='freq', sort=True, normalise=False, limit=-1, ax=None, showxticklabels=False)
Plot a histogram of the particles in the distribution.

Parameters:
    prop (str): The property to plot (default is 'freq').
    sort (bool): Whether to sort the particles by the property values.
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).

Returns:
    tuple: Bars, particles, and counts displayed in the plot.

prop(self, name)
Return data about particles.

Parameters:
    name:           Key name for the particle data

Returns:
    list:           Particle data

save(self, filename: str, compression='gzip')
Save the distribution to a file.

Parameters:
    filename (str): Path to the output file.
    compression (str): Compression to use (default is 'gzip').

top(self, prop='freq', n=1)
Retrieve the top N particles based on a specified property.

Parameters:
    prop (str): The property to sort by (default is 'freq').
    n (int): The number of top particles to retrieve.

Returns:
    np.ndarray: Array of the top N particles.
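
Top-N selection by a property can be sketched as a sort followed by a slice. This is an illustration on a plain dictionary, with the hypothetical helper top_particles returning a list, whereas the documented method returns an np.ndarray.

```python
from typing import Dict, Hashable, List


def top_particles(table: Dict[Hashable, dict], prop: str = "freq", n: int = 1) -> List[Hashable]:
    """Hypothetical helper: the n particles with the largest value of `prop`."""
    ranked = sorted(table, key=lambda particle: table[particle][prop], reverse=True)
    return ranked[:n]


table = {
    "A->B": {"freq": 5, "logp": -1.2},
    "B->A": {"freq": 2, "logp": -0.7},
    "A B": {"freq": 9, "logp": -3.1},
}
print(top_particles(table, n=2))          # ['A B', 'A->B']
print(top_particles(table, prop="logp"))  # ['B->A']
```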

update(self, particle, data, **kwargs)
Add a particle to the distribution. If the particle already exists, update its data.

Parameters:
    particle (Hashable):    Particle to add
    data (dict):            Particle data

Class methods defined here:

compute_distribution(data: pandas.core.frame.DataFrame, score: structure_learning.scores.score.Score, graph_type='dag', blocklist: numpy.ndarray = None) -> Type[ForwardRef('D')]
Compute a distribution from data and a scoring function.

Parameters:
    data (pd.DataFrame): The dataset to compute the distribution from.
    score (Score): The scoring function to evaluate particles.
    graph_type (str): The type of graph to use ('dag' or 'cpdag').

Returns:
    Distribution: The computed distribution.

load(filename: str, compression='gzip')
Load a distribution from a file.

Parameters:
    filename (str): Path to the input file.
    compression (str): Compression used by the file (default is 'gzip').

Returns:
    Distribution: The loaded distribution.

plot_multiple(dists: List[Type[ForwardRef('D')]], prop='freq', normalise=False, limit=-1, ax=None, labels=None)
Plot multiple distributions on the same axes.

Parameters:
    dists (List[Distribution]): List of distributions to plot.
    prop (str): The property to plot (default is 'freq').
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).
    labels: Labels for the plotted distributions (default is None).

Returns:
    list: List of bar containers for each distribution.

Readonly properties defined here:

logp

Retrieve the log probabilities of all particles in the distribution.

Returns:
    np.ndarray: Array of log probabilities.

Data descriptors defined here:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

class MCMCDistribution(Distribution)

    MCMCDistribution(particles: Iterable = [], logp: Iterable = [], theta: Dict = None, keep_rejected: bool = True)

Method resolution order:

MCMCDistribution

Distribution

builtins.object

Methods defined here:

__copy__(self)
Create a shallow copy of the current distribution.

Returns:
    Distribution: A shallow copy of the distribution.

__init__(self, particles: Iterable = [], logp: Iterable = [], theta: Dict = None, keep_rejected: bool = True)
Initialise the MCMC distribution.

Parameters:
    particles (Iterable):       Particles to add to the distribution
    logp (Iterable):            Scores (log probabilities) of the particles
    theta (Dict):               Additional particle data
    keep_rejected (bool):       Whether to store rejected particles

to_iterates(self)
Convert an MCMCDistribution to iteration data.

Returns:
    dict: A dictionary where keys are iteration numbers and values are data about the particles.

to_opad(self, plus=False)
Convert the current MCMC distribution to an OPAD distribution.

Parameters:
    plus (bool): If True, include rejected particles in the OPAD distribution.

Returns:
    OPAD: The OPAD distribution.

update(self, particle, data, iteration, **kwargs)
Add a particle to the distribution. If the particle already exists, update its data.

Parameters:
    particle (Hashable):    Particle to add
    iteration (int):        Iteration number at which the particle was generated
    data (dict):            Particle data

Class methods defined here:

from_iterates(iterates: dict)
Create an MCMCDistribution from iteration data.

Parameters:
    iterates (dict): A dictionary where keys are iteration numbers and values are data about the particles.

Returns:
    MCMCDistribution: The resulting MCMC distribution.
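
The relationship between to_iterates and from_iterates can be pictured as switching between a particle-keyed view and an iteration-keyed view of the same chain. The sketch below shows that round trip on plain dictionaries. The record layout (an 'iterations' list per particle, and 'particle' and 'logp' keys per iteration) is an assumption for illustration and may differ from the library's format.

```python
from typing import Dict, Hashable


def to_iterates(table: Dict[Hashable, dict]) -> Dict[int, dict]:
    """Hypothetical conversion: particle-keyed records to iteration-keyed records."""
    iterates = {}
    for particle, rec in table.items():
        for iteration in rec["iterations"]:
            iterates[iteration] = {"particle": particle, "logp": rec["logp"]}
    return dict(sorted(iterates.items()))


def from_iterates(iterates: Dict[int, dict]) -> Dict[Hashable, dict]:
    """Hypothetical inverse: group iteration records back by particle."""
    table: Dict[Hashable, dict] = {}
    for iteration, rec in sorted(iterates.items()):
        entry = table.setdefault(rec["particle"],
                                 {"logp": rec["logp"], "iterations": []})
        entry["iterations"].append(iteration)
    return table


chain = {
    "A->B": {"logp": -1.2, "iterations": [0, 1, 3]},
    "B->A": {"logp": -2.5, "iterations": [2]},
}
iterates = to_iterates(chain)
print(iterates[2]["particle"])          # B->A
print(from_iterates(iterates) == chain)  # True
```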

Methods inherited from Distribution:

__add__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Add two distributions together by combining their particles and frequencies.

Parameters:
    other (Distribution): The distribution to add.

Returns:
    Distribution: The resulting distribution after addition.

__contains__(self, particle)
Checks if particle is in the distribution.

__len__(self)
Get the number of particles in the distribution.

Returns:
    int: The number of particles.

__sub__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Subtract one distribution from another by removing particles present in the other distribution.

Parameters:
    other (Distribution): The distribution to subtract.

Returns:
    Distribution: The resulting distribution after subtraction.

clear(self)
Clear all particles from the distribution.

copy(self)

hist(self, prop='freq', normalise=False)
Return the histogram of particles.

Parameters:
    prop (str):             The property to use for the counts (default is 'freq').
    normalise (bool):       If True, return normalised counts.

normalise(self, prop='freq', log=False)
Normalise the current set of particles in the distribution.

plot(self, prop='freq', sort=True, normalise=False, limit=-1, ax=None, showxticklabels=False)
Plot a histogram of the particles in the distribution.

Parameters:
    prop (str): The property to plot (default is 'freq').
    sort (bool): Whether to sort the particles by the property values.
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).

Returns:
    tuple: Bars, particles, and counts displayed in the plot.

prop(self, name)
Return data about particles.

Parameters:
    name:           Key name for the particle data

Returns:
    list:           Particle data

save(self, filename: str, compression='gzip')
Save the distribution to a file.

Parameters:
    filename (str): Path to the output file.
    compression (str): Compression to use (default is 'gzip').

top(self, prop='freq', n=1)
Retrieve the top N particles based on a specified property.

Parameters:
    prop (str): The property to sort by (default is 'freq').
    n (int): The number of top particles to retrieve.

Returns:
    np.ndarray: Array of the top N particles.

Class methods inherited from Distribution:

compute_distribution(data: pandas.core.frame.DataFrame, score: structure_learning.scores.score.Score, graph_type='dag', blocklist: numpy.ndarray = None) -> Type[ForwardRef('D')]
Compute a distribution from data and a scoring function.

Parameters:
    data (pd.DataFrame): The dataset to compute the distribution from.
    score (Score): The scoring function to evaluate particles.
    graph_type (str): The type of graph to use ('dag' or 'cpdag').

Returns:
    Distribution: The computed distribution.

load(filename: str, compression='gzip')
Load a distribution from a file.

Parameters:
    filename (str): Path to the input file.
    compression (str): Compression used by the file (default is 'gzip').

Returns:
    Distribution: The loaded distribution.

plot_multiple(dists: List[Type[ForwardRef('D')]], prop='freq', normalise=False, limit=-1, ax=None, labels=None)
Plot multiple distributions on the same axes.

Parameters:
    dists (List[Distribution]): List of distributions to plot.
    prop (str): The property to plot (default is 'freq').
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).
    labels: Labels for the plotted distributions (default is None).

Returns:
    list: List of bar containers for each distribution.

Readonly properties inherited from Distribution:

logp

Retrieve the log probabilities of all particles in the distribution.

Returns:
    np.ndarray: Array of log probabilities.

Data descriptors inherited from Distribution:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

class OPAD(MCMCDistribution)

    OPAD(particles: Iterable = [], logp: Iterable = [], theta: Dict = [], plus=False)

This class implements the OPAD re-weighting mechanism described in

Method resolution order:

OPAD

MCMCDistribution

Distribution

builtins.object

Methods defined here:

__copy__(self)
Create a copy of the current distribution.

Returns:
    Distribution: A copy of the distribution.

__init__(self, particles: Iterable = [], logp: Iterable = [], theta: Dict = [], plus=False)
Initialise the OPAD distribution.

Parameters:
    particles (Iterable):       Particles to add to the distribution
    logp (Iterable):            Scores (log probabilities) of the particles
    theta (Dict):               Additional particle data
    plus (bool):                Whether to include rejected particles

normalise(self)
Normalise the OPAD distribution by adding rejected particles and computing probabilities.

Returns:
    OPAD: The normalised OPAD distribution.

plot(self, prop='p', sort=True, normalise=False, limit=-1)
Plot a histogram of the particles in the OPAD distribution.

Parameters:
    prop (str): The property to plot (default is 'p').
    sort (bool): Whether to sort the particles by the property values.
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).

Returns:
    tuple: Bars, particles, and counts displayed in the plot.

to_iterates(self)
Convert an MCMCDistribution to iteration data.

Returns:
    dict: A dictionary where keys are iteration numbers and values are data about the particles.

update(self, particle, data, iteration, normalise=True)
Add new particles to the OPAD distribution and optionally renormalise.

Parameters:
    particle (Hashable): The particle to add.
    iteration (int): The iteration number at which the particle was generated.
    data (dict): Data associated with the particle.
    normalise (bool): If True, renormalise the distribution after adding the particle.

Class methods defined here:

compute_normalisation(logp: Union[List, numpy.ndarray], return_constants=True)
Compute the normalisation factor given the log scores.

Parameters:
    logp (list | np.ndarray):   The log scores
    return_constants (bool):    If True, also returns log(Z) and max score.

Returns:
    np.ndarray:     Normalised scores
    float:          Normalisation factor, log(Z) (only if return_constants is True)
    np.ndarray:     Maximum score (only if return_constants is True)
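
Normalising log scores is usually done with the log-sum-exp trick, which subtracts the maximum score before exponentiating to avoid underflow. The following is a minimal standard-library sketch of that technique, not the library's code. The name normalise_log_scores is hypothetical, and returning probabilities (rather than log probabilities) as the "normalised scores" is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def normalise_log_scores(logp: Sequence[float]) -> Tuple[List[float], float, float]:
    """Return (probabilities, log Z, max score) using the log-sum-exp trick."""
    max_score = max(logp)
    # Shift by the maximum so the largest term is exp(0) = 1 and nothing underflows to 0.
    shifted = [math.exp(score - max_score) for score in logp]
    log_z = max_score + math.log(sum(shifted))
    probs = [math.exp(score - log_z) for score in logp]
    return probs, log_z, max_score


# math.exp(-1000.0) underflows to 0.0, so naive normalisation would divide by zero.
probs, log_z, max_score = normalise_log_scores([-1000.0, -1001.0, -1002.0])
print(max_score)  # -1000.0
```

Returning log(Z) and the maximum score alongside the probabilities lets a caller renormalise incrementally when new scores arrive, without re-exponentiating everything from scratch.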

from_mcmc(dist: structure_learning.distributions.distribution.Distribution, plus=False)
Create an OPAD distribution from an MCMC distribution.

Parameters:
    dist (Distribution): The MCMC distribution to convert.
    plus (bool): Whether to include rejected particles in the OPAD distribution.

Returns:
    OPAD: The resulting OPAD distribution.

Methods inherited from MCMCDistribution:

to_opad(self, plus=False)
Convert the current MCMC distribution to an OPAD distribution.

Parameters:
    plus (bool): If True, include rejected particles in the OPAD distribution.

Returns:
    OPAD: The OPAD distribution.

Class methods inherited from MCMCDistribution:

from_iterates(iterates: dict)
Create an MCMCDistribution from iteration data.

Parameters:
    iterates (dict): A dictionary where keys are iteration numbers and values are data about the particles.

Returns:
    MCMCDistribution: The resulting MCMC distribution.

Methods inherited from Distribution:

__add__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Add two distributions together by combining their particles and frequencies.

Parameters:
    other (Distribution): The distribution to add.

Returns:
    Distribution: The resulting distribution after addition.

__contains__(self, particle)
Checks if particle is in the distribution.

__len__(self)
Get the number of particles in the distribution.

Returns:
    int: The number of particles.

__sub__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Subtract one distribution from another by removing particles present in the other distribution.

Parameters:
    other (Distribution): The distribution to subtract.

Returns:
    Distribution: The resulting distribution after subtraction.

clear(self)
Clear all particles from the distribution.

copy(self)

hist(self, prop='freq', normalise=False)
Return the histogram of particles.

Parameters:
    prop (str):             The property to use for the counts (default is 'freq').
    normalise (bool):       If True, return normalised counts.

prop(self, name)
Return data about particles.

Parameters:
    name:           Key name for the particle data

Returns:
    list:           Particle data

save(self, filename: str, compression='gzip')
Save the distribution to a file.

Parameters:
    filename (str): Path to the output file.
    compression (str): Compression to use (default is 'gzip').

top(self, prop='freq', n=1)
Retrieve the top N particles based on a specified property.

Parameters:
    prop (str): The property to sort by (default is 'freq').
    n (int): The number of top particles to retrieve.

Returns:
    np.ndarray: Array of the top N particles.

Class methods inherited from Distribution:

compute_distribution(data: pandas.core.frame.DataFrame, score: structure_learning.scores.score.Score, graph_type='dag', blocklist: numpy.ndarray = None) -> Type[ForwardRef('D')]
Compute a distribution from data and a scoring function.

Parameters:
    data (pd.DataFrame): The dataset to compute the distribution from.
    score (Score): The scoring function to evaluate particles.
    graph_type (str): The type of graph to use ('dag' or 'cpdag').

Returns:
    Distribution: The computed distribution.

load(filename: str, compression='gzip')
Load a distribution from a file.

Parameters:
    filename (str): Path to the input file.
    compression (str): Compression used by the file (default is 'gzip').

Returns:
    Distribution: The loaded distribution.

plot_multiple(dists: List[Type[ForwardRef('D')]], prop='freq', normalise=False, limit=-1, ax=None, labels=None)
Plot multiple distributions on the same axes.

Parameters:
    dists (List[Distribution]): List of distributions to plot.
    prop (str): The property to plot (default is 'freq').
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).
    labels: Labels for the plotted distributions (default is None).

Returns:
    list: List of bar containers for each distribution.

Readonly properties inherited from Distribution:

logp

Retrieve the log probabilities of all particles in the distribution.

Returns:
    np.ndarray: Array of log probabilities.

Data descriptors inherited from Distribution:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

class TrueDistribution(Distribution)

    TrueDistribution(particles=[], logp=[], theta=None)

Method resolution order:

TrueDistribution

Distribution

builtins.object

Methods defined here:

__init__(self, particles=[], logp=[], theta=None)
Initialise the distribution. Particles are stored internally as a dictionary that maps each particle to its data.

Parameters:
    particles (Iterable):       Particles to add to the distribution
    logp (Iterable):            Scores (log probabilities) of the particles
    theta (Dict):               Additional particle data

normalise(self)
Normalise the current set of particles in the distribution.

Methods inherited from Distribution:

__add__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Add two distributions together by combining their particles and frequencies.

Parameters:
    other (Distribution): The distribution to add.

Returns:
    Distribution: The resulting distribution after addition.

__contains__(self, particle)
Checks if particle is in the distribution.

__copy__(self)
Create a copy of the current distribution.

Returns:
    Distribution: A copy of the distribution.

__len__(self)
Get the number of particles in the distribution.

Returns:
    int: The number of particles.

__sub__(self, other: Type[ForwardRef('D')]) -> Type[ForwardRef('D')]
Subtract one distribution from another by removing particles present in the other distribution.

Parameters:
    other (Distribution): The distribution to subtract.

Returns:
    Distribution: The resulting distribution after subtraction.

clear(self)
Clear all particles from the distribution.

copy(self)

hist(self, prop='freq', normalise=False)
Return the histogram of particles.

Parameters:
    prop (str):             The property to use for the counts (default is 'freq').
    normalise (bool):       If True, return normalised counts.

plot(self, prop='freq', sort=True, normalise=False, limit=-1, ax=None, showxticklabels=False)
Plot a histogram of the particles in the distribution.

Parameters:
    prop (str): The property to plot (default is 'freq').
    sort (bool): Whether to sort the particles by the property values.
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).

Returns:
    tuple: Bars, particles, and counts displayed in the plot.

prop(self, name)
Return data about particles.

Parameters:
    name:           Key name for the particle data

Returns:
    list:           Particle data

save(self, filename: str, compression='gzip')
Save the distribution to a file.

Parameters:
    filename (str): Path to the output file.
    compression (str): Compression to use (default is 'gzip').

top(self, prop='freq', n=1)
Retrieve the top N particles based on a specified property.

Parameters:
    prop (str): The property to sort by (default is 'freq').
    n (int): The number of top particles to retrieve.

Returns:
    np.ndarray: Array of the top N particles.

update(self, particle, data, **kwargs)
Add a particle to the distribution. If the particle already exists, update its data.

Parameters:
    particle (Hashable):    Particle to add
    data (dict):            Particle data

Class methods inherited from Distribution:

compute_distribution(data: pandas.core.frame.DataFrame, score: structure_learning.scores.score.Score, graph_type='dag', blocklist: numpy.ndarray = None) -> Type[ForwardRef('D')]
Compute a distribution from data and a scoring function.

Parameters:
    data (pd.DataFrame): The dataset to compute the distribution from.
    score (Score): The scoring function to evaluate particles.
    graph_type (str): The type of graph to use ('dag' or 'cpdag').

Returns:
    Distribution: The computed distribution.

load(filename: str, compression='gzip')
Load a distribution from a file.

Parameters:
    filename (str): Path to the input file.
    compression (str): Compression used by the file (default is 'gzip').

Returns:
    Distribution: The loaded distribution.

plot_multiple(dists: List[Type[ForwardRef('D')]], prop='freq', normalise=False, limit=-1, ax=None, labels=None)
Plot multiple distributions on the same axes.

Parameters:
    dists (List[Distribution]): List of distributions to plot.
    prop (str): The property to plot (default is 'freq').
    normalise (bool): Whether to normalise the property values.
    limit (int): The maximum number of particles to display (default is -1, which shows all).
    ax (matplotlib.axes.Axes): The axes to plot on (default is None).
    labels: Labels for the plotted distributions (default is None).

Returns:
    list: List of bar containers for each distribution.

Readonly properties inherited from Distribution:

logp

Retrieve the log probabilities of all particles in the distribution.

Returns:
    np.ndarray: Array of log probabilities.

Data descriptors inherited from Distribution:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

Data

D = ~Distribution
Dict = typing.Dict
Iterable = typing.Iterable
List = typing.List
Type = typing.Type
Union = typing.Union
