Pydoc: module structure_learning.data.data

structure_learning.data.data

/Users/165421/Documents/code/structure_learning/src/structure_learning/data/data.py

This module provides classes and utilities for handling datasets and data structures.

Classes:
    LazyDataset: A dictionary-like class that lazily loads datasets from files.
    Data: A class for managing and analyzing datasets with support for variable types and transformations.

Dependencies:
    - numpy
    - pandas
    - sklearn.preprocessing.StandardScaler
    - sklearn.model_selection.KFold
    - pathlib.Path

Modules

numpy
os
pandas

Classes



builtins.object
    Data
collections.UserDict(collections.abc.MutableMapping)
    LazyDataset

class Data(builtins.object)

    Data(values: Union[numpy.ndarray, pandas.core.frame.DataFrame], variables: List = None, variable_types: Dict = None)

Methods defined here:

__copy__(self)
Create a copy of the Data object.

Returns:
    Data: A copy of the Data object.

__getitem__(self, *args)
Retrieve data for the specified key(s).

Parameters:
    *args: Key(s) to retrieve data for.

Returns:
    Any: Data corresponding to the specified key(s).

__init__(self, values: Union[numpy.ndarray, pandas.core.frame.DataFrame], variables: List = None, variable_types: Dict = None)
Initialize the Data object with dataset values, variable names, and types.

Parameters:
    values (Union[np.ndarray, pd.DataFrame]): Dataset values.
    variables (List, optional): List of variable names. Required if values is a numpy array.
    variable_types (Dict, optional): Dictionary mapping variable names to their types.

Raises:
    Exception: If variable names are not supplied when values is a numpy array.
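The array-versus-DataFrame rule above can be illustrated with a minimal sketch. `build_frame` is a hypothetical helper, not part of this module. It only mirrors the documented requirement that variable names accompany a numpy array, and it assumes (this is not documented) that a DataFrame's own column labels act as the variable names.

```python
from typing import List, Optional, Union

import numpy as np
import pandas as pd


def build_frame(
    values: Union[np.ndarray, pd.DataFrame],
    variables: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical helper mirroring the documented rule for Data.__init__."""
    if isinstance(values, np.ndarray):
        # A bare array has no column labels, so the caller must name the variables.
        if variables is None:
            raise Exception("variables must be supplied when values is a numpy array")
        return pd.DataFrame(values, columns=variables)
    # Assumption: a DataFrame's own column labels serve as the variable names,
    # so `variables` is ignored for DataFrame input in this sketch.
    return values.copy()
```

With the real class, the documented call is `Data(np.zeros((3, 2)), variables=["x", "y"])`; omitting `variables` there raises the Exception described above.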

__len__(self)
Get the number of rows in the dataset.

Returns:
    int: Number of rows in the dataset.

k_fold(self, k=5, shuffle=True, seed=None)
Perform k-fold splitting of the dataset.

Parameters:
    k (int): Number of folds. Default is 5.
    shuffle (bool): Whether to shuffle the data before splitting. Default is True.
    seed (int, optional): Random seed for reproducibility.

Yields:
    Tuple[Data, Data]: Training and testing datasets for each fold.
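A NumPy-only sketch of the splitting that `k_fold` describes. The module lists `sklearn.model_selection.KFold` as a dependency, so the real method probably delegates to it. The hypothetical `k_fold_frames` below avoids that dependency, yields plain DataFrames instead of `Data` objects, and produces folds of the same sizes as `KFold`, though not the same shuffled row assignment for a given seed.

```python
from typing import Iterator, Optional, Tuple

import numpy as np
import pandas as pd


def k_fold_frames(
    frame: pd.DataFrame, k: int = 5, shuffle: bool = True, seed: Optional[int] = None
) -> Iterator[Tuple[pd.DataFrame, pd.DataFrame]]:
    """Yield one (train, test) pair of DataFrames per fold."""
    n = len(frame)
    if not 2 <= k <= n:
        raise ValueError("k must be at least 2 and at most the number of rows")
    order = np.arange(n)
    if shuffle:
        # A seeded generator makes the fold assignment reproducible.
        order = np.random.default_rng(seed).permutation(n)
    # array_split gives the first n % k folds one extra row, as KFold does.
    for test_positions in np.array_split(order, k):
        test_positions = np.sort(test_positions)  # keep original row order
        is_train = np.ones(n, dtype=bool)
        is_train[test_positions] = False
        yield frame.iloc[is_train], frame.iloc[test_positions]
```

Every row appears in exactly one test fold, and each training set is the complement of its test fold.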

min_max_scale(self, variables: List = None)
Scale the specified variables in the dataset to the range [0, 1].

Parameters:
    variables (List, optional): List of variable names to scale. Defaults to all continuous variables.

Returns:
    Data: A new Data object with scaled variables.
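Min-max scaling maps each value x of a column to (x - min) / (max - min). The sketch below assumes `variable_types` uses the string values of the class constants (for example `'continuous'`) and applies the documented default of scaling every continuous variable. The helper name `min_max_scale_frame` and the treatment of constant columns (mapped to 0) are assumptions, not taken from the source.

```python
from typing import Dict, List, Optional

import pandas as pd


def min_max_scale_frame(
    frame: pd.DataFrame,
    variable_types: Dict[str, str],
    variables: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: return a copy with the chosen columns rescaled to [0, 1]."""
    if variables is None:
        # Documented default: every variable typed as continuous.
        variables = [v for v, t in variable_types.items() if t == "continuous"]
    scaled = frame.copy()  # the input is left untouched; a new frame is returned
    for column in variables:
        col = scaled[column].astype(float)
        span = col.max() - col.min()
        # Assumption: a constant column maps to 0 instead of dividing by zero.
        scaled[column] = (col - col.min()) / span if span != 0 else 0.0
    return scaled
```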

normalise(self, variables: List = None)
Normalize the specified variables in the dataset.

Parameters:
    variables (List, optional): List of variable names to normalize. Defaults to all continuous variables.

Returns:
    Data: A new Data object with normalized variables.
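The docstring does not say which normalisation is applied. Because `sklearn.preprocessing.StandardScaler` is listed as a dependency, a plausible reading is z-score standardisation, (x - mean) / std, using the population standard deviation as `StandardScaler` does. The hypothetical `standardise_frame` below sketches that assumption without scikit-learn; constant columns become 0, which matches how `StandardScaler` handles zero variance.

```python
from typing import Dict, List, Optional

import pandas as pd


def standardise_frame(
    frame: pd.DataFrame,
    variable_types: Dict[str, str],
    variables: Optional[List[str]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: z-score the chosen columns in a copy of frame."""
    if variables is None:
        # Documented default: every variable typed as continuous.
        variables = [v for v, t in variable_types.items() if t == "continuous"]
    out = frame.copy()
    for column in variables:
        col = out[column].astype(float)
        std = col.std(ddof=0)  # population standard deviation, as StandardScaler uses
        out[column] = (col - col.mean()) / std if std != 0 else 0.0
    return out
```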

Readonly properties defined here:

columns

Get the list of variable names.

Returns:
    List: List of variable names.

shape

Get the shape of the dataset.

Returns:
    Tuple[int, int]: Shape of the dataset (rows, columns).

Data descriptors defined here:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

Data and other attributes defined here:

BINARY_TYPE = 'binary'

CATEGORICAL_TYPE = 'categorical'

CONTINUOUS_TYPE = 'continuous'

ORDINAL_TYPE = 'ordinal'
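These constants are the strings used as values of `variable_types`. The sketch below groups variables by type and rejects unknown type strings. `group_by_type` and its validation step are hypothetical, since the source does not say whether `Data` checks the types it receives; with the real class you would write `Data.CONTINUOUS_TYPE` and so on rather than redefining the strings.

```python
from collections import defaultdict
from typing import Dict, List

# String values of the Data class constants listed above.
BINARY_TYPE = "binary"
CATEGORICAL_TYPE = "categorical"
CONTINUOUS_TYPE = "continuous"
ORDINAL_TYPE = "ordinal"
KNOWN_TYPES = {BINARY_TYPE, CATEGORICAL_TYPE, CONTINUOUS_TYPE, ORDINAL_TYPE}


def group_by_type(variable_types: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical helper: invert variable_types into type -> variable names."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, kind in variable_types.items():
        if kind not in KNOWN_TYPES:
            raise ValueError(f"unknown type {kind!r} for variable {name!r}")
        groups[kind].append(name)
    return dict(groups)
```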

class LazyDataset(collections.UserDict)

    LazyDataset(dict=None, /, **kwargs)

Method resolution order:

LazyDataset

collections.UserDict

collections.abc.MutableMapping

collections.abc.Mapping

collections.abc.Collection

collections.abc.Sized

collections.abc.Iterable

collections.abc.Container

builtins.object

Methods defined here:

__getitem__(self, key: Any) -> Any
Retrieve the value associated with the given key. If the value is a file path, it will be loaded as a pandas DataFrame.

Parameters:
    key (Any): Key to retrieve the value for.

Returns:
    Any: The value associated with the key, or a pandas DataFrame if the value is a file path.
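A minimal sketch of the lazy loading described for `__getitem__`, assuming that a stored value counts as a file path when it names an existing file, and that the loaded frame replaces the path so later lookups skip the disk. `LazyDatasetSketch` is hypothetical. The suffix-based choice between `pandas.read_excel` and `pandas.read_csv` is also an assumption; note that the module's one registered dataset is an `.xls` file, which `read_excel` can only open when an engine such as `xlrd` is installed.

```python
from collections import UserDict
from pathlib import Path
from typing import Any

import pandas as pd


class LazyDatasetSketch(UserDict):
    """Hypothetical dict whose file-path values are read into DataFrames on access."""

    def __getitem__(self, key: Any) -> Any:
        value = self.data[key]  # raises KeyError for missing keys, like a dict
        if isinstance(value, (str, Path)) and Path(value).is_file():
            if Path(value).suffix.lower() in {".xls", ".xlsx"}:
                frame = pd.read_excel(value)  # needs an Excel engine (xlrd/openpyxl)
            else:
                frame = pd.read_csv(value)
            self.data[key] = frame  # assumption: cache so the file is read only once
            return frame
        return value  # non-path values are returned unchanged
```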

__setitem__(self, key: Any, item: Any) -> None

Data and other attributes defined here:

__abstractmethods__ = frozenset()

Methods inherited from collections.UserDict:

__contains__(self, key)
# Modify __contains__ and get() to work like dict
# does when __missing__ is present.

__copy__(self)

__delitem__(self, key)

__init__(self, dict=None, /, **kwargs)
Initialize self.  See help(type(self)) for accurate signature.

__ior__(self, other)

__iter__(self)

__len__(self)

__or__(self, other)
Return self|value.

__repr__(self)
Return repr(self).

__ror__(self, other)
Return value|self.

copy(self)

get(self, key, default=None)
D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.

Class methods inherited from collections.UserDict:

fromkeys(iterable, value=None)

Data descriptors inherited from collections.UserDict:

__dict__

dictionary for instance variables

__weakref__

list of weak references to the object

Methods inherited from collections.abc.MutableMapping:

clear(self)
D.clear() -> None.  Remove all items from D.

pop(self, key, default=<object object at 0x102c881c0>)
D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
If key is not found, d is returned if given, otherwise KeyError is raised.

popitem(self)
D.popitem() -> (k, v), remove and return some (key, value) pair
as a 2-tuple; but raise KeyError if D is empty.

setdefault(self, key, default=None)
D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D

update(self, other=(), /, **kwds)
D.update([E, ]**F) -> None.  Update D from mapping/iterable E and F.
If E present and has a .keys() method, does:     for k in E: D[k] = E[k]
If E present and lacks .keys() method, does:     for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v

Methods inherited from collections.abc.Mapping:

__eq__(self, other)
Return self==value.

items(self)
D.items() -> a set-like object providing a view on D's items

keys(self)
D.keys() -> a set-like object providing a view on D's keys

values(self)
D.values() -> an object providing a view on D's values

Data and other attributes inherited from collections.abc.Mapping:

__hash__ = None

__reversed__ = None

Class methods inherited from collections.abc.Collection:

__subclasshook__(C)
Abstract classes can override this to customize issubclass().

This is invoked early on by abc.ABCMeta.__subclasscheck__().
It should return True, False or NotImplemented.  If it returns
NotImplemented, the normal algorithm is used.  Otherwise, it
overrides the normal algorithm (and the outcome is cached).

Class methods inherited from collections.abc.Iterable:

__class_getitem__ = GenericAlias(...)
Represent a PEP 585 generic type

E.g. for t = list[int], t.__origin__ is list and t.__args__ is (int,).

Data

Dict = typing.Dict
List = typing.List
Union = typing.Union
datasets = {'sachs': 'datafiles/sachs/1. cd3cd28.xls'}
path = PosixPath('/Users/165421/Documents/code/structure_learning/src/structure_learning/data')
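`datasets` stores file locations relative to the package directory held in `path`. Assuming that is their purpose, the two might be joined before being handed to `LazyDataset`, as sketched below. `resolve_dataset_paths` is a hypothetical helper, and any base directory passed to it in place of `path` is a placeholder.

```python
from pathlib import Path
from typing import Dict


def resolve_dataset_paths(base: Path, datasets: Dict[str, str]) -> Dict[str, Path]:
    """Hypothetical helper: join each relative dataset entry onto base."""
    return {name: base / relative for name, relative in datasets.items()}
```

Under that assumption, `LazyDataset(resolve_dataset_paths(path, datasets))["sachs"]` would read the Sachs spreadsheet on first access.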
