graph2mat.core.data.processing

Core of the data processing.

Managing sparse matrix data in conjunction with graphs is not trivial:

Matrices are sparse.
Matrices are in a basis which is centered around the points in the graph. Therefore elements of the matrix correspond to nodes or edges of the graph.
Each point might have more than one basis function, so the matrix is divided into blocks (not just single elements) that correspond to nodes or edges of the graph.
Different point types might have different basis sizes, so the blocks in the matrix can have different shapes.
The different block sizes and the sparsity of the matrices pose an extra challenge when batching examples for machine learning.
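
As a rough illustration of the points above, the following numpy sketch computes the block shapes that arise when point types have different basis sizes. The basis sizes, point types and edges are made-up values, and none of the variable names are part of the graph2mat API.

```python
import numpy as np

# Hypothetical basis sizes per point type (illustrative values only).
basis_size = np.array([4, 9])          # type 0 has 4 basis functions, type 1 has 9
point_types = np.array([0, 1, 0])      # three points in the graph
edge_index = np.array([[0, 1, 0, 2],   # first point of each edge
                       [1, 0, 2, 0]])  # second point of each edge

# Number of basis functions on each point.
n_basis = basis_size[point_types]

# A node block couples a point with itself: shape (n_i, n_i).
node_block_shapes = np.stack([n_basis, n_basis], axis=1)

# An edge block couples two points: shape (n_i, n_j).
edge_block_shapes = np.stack(
    [n_basis[edge_index[0]], n_basis[edge_index[1]]], axis=1
)

# If all node blocks are flattened into one array, each block needs an offset.
node_block_sizes = node_block_shapes.prod(axis=1)
node_offsets = np.concatenate([[0], np.cumsum(node_block_sizes)[:-1]])

print(node_block_shapes)
print(edge_block_shapes)
print(node_offsets)
```

Because the blocks differ in shape, they cannot simply be stacked into a single dense array, which is why batching needs extra bookkeeping such as the offsets computed here.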

This module implements BasisMatrixData, a class that

The tools in this submodule are agnostic to the machine learning framework of choice. They are based purely on numpy, with an extra dependency on sisl to handle the sparse matrices. The sisl dependency could eventually be removed if needed.

Classes

`BasisMatrixData`([edge_index, neigh_isc, ...])	Version of `BasisMatrixDataBase` that stores data as numpy arrays.
`BasisMatrixDataBase`([edge_index, neigh_isc, ...])	Stores a graph with the preprocessed data for one or multiple configurations.
`MatrixDataProcessor`(basis_table, ...)	Data structure that contains all the parameters to interface the real world with the ML models.
`NumpyArraysProvider`(data)	Helper class to get attributes from the data object making sure they are numpy arrays.

class graph2mat.core.data.processing.BasisMatrixData(edge_index: ndarray | None = None, neigh_isc: ndarray | None = None, node_attrs: ndarray | None = None, positions: ndarray | None = None, shifts: ndarray | None = None, cell: ndarray | None = None, nsc: ndarray | None = None, point_labels: ndarray | None = None, edge_labels: ndarray | None = None, labels_point_filter: ndarray | None = None, labels_edge_filter: ndarray | None = None, point_types: ndarray | None = None, edge_types: ndarray | None = None, data_processor: MatrixDataProcessor = None, metadata: Dict[str, Any] | None = None, already_basis: bool = False)[source]

Bases: BasisMatrixDataBase[ndarray]

Version of BasisMatrixDataBase that stores data as numpy arrays.

See also

yield_from_batch: The more explicit option for batches, which returns a generator.
graph2mat.Formats: Class containing all the available formats which can be passed to the out_format argument.

node_attr_getters: List[Any]

one_hot_encode(point_types: ndarray) → ndarray[source]

One-hot encodes a vector of point types.

It takes into account the number of different point types in the basis table.

Parameters:

point_types (np.ndarray of shape (n_points,)) – Array of point types (their index in the basis table).

Returns:

One-hot encoded array of point types.

Return type:

np.ndarray of shape (n_points, n_classes)
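
The encoding can be pictured with a minimal numpy sketch. This is not the method's actual implementation: `one_hot_sketch` is a hypothetical helper, and it takes `n_classes` as an explicit argument, whereas the method obtains the number of point types from the basis table.

```python
import numpy as np

def one_hot_sketch(point_types: np.ndarray, n_classes: int) -> np.ndarray:
    """Return an (n_points, n_classes) array with a single 1 per row.

    Hypothetical helper for illustration, not part of graph2mat.
    """
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(n_classes)[point_types]

# Four points, three point types in the (assumed) basis table.
encoded = one_hot_sketch(np.array([0, 2, 1, 0]), n_classes=3)
print(encoded.shape)
```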

out_matrix: Literal['density_matrix', 'hamiltonian', 'energy_density_matrix', 'dynamical_matrix'] | None = None

static sort_edge_index(edge_index: ndarray, sc_shifts: ndarray, shifts: ndarray, edge_types: ndarray, isc_off: ndarray, inplace: bool = False) → Tuple[ndarray, ndarray, ndarray, ndarray][source]

Returns the sorted edge indices.

Edges are much easier to manipulate by the block producing routines if they are ordered properly.

This function orders edges so that both directions of the same edge are consecutive. Within each pair, the (lowest point type, highest point type) interaction always comes first; this is the direction with a positive edge type.

For the unit cell, the connection in different directions is simple to understand, as it is just a permutation of the points: edges (i, j) and (j, i) are the same connection in opposite directions. However, for connections between supercells (if there are periodic boundary conditions), this condition is not enough. The supercell shift of one direction must be the negative of the other direction, so only edges (i, j, x, y, z) and (j, i, -x, -y, -z) are the same connection in opposite directions. Note also that in supercell connections i and j can be the same index.

Parameters:

edge_index (np.ndarray of shape (2, n_edges)) – Pair of point indices for each edge.
sc_shifts (np.ndarray of shape (3, n_edges)) – For each edge, the number of cell boundaries the edge crosses in each lattice direction.
shifts (np.ndarray of shape (3, n_edges)) – For each edge, the cartesian shift induced by sc_shifts.
edge_types (np.ndarray of shape (n_edges, )) – For each edge, its type as an integer.
isc_off (np.ndarray of shape (nsc_x, nsc_y, nsc_z)) – Array that maps from sc_shifts to a single supercell index.
inplace (bool, optional) – If True, the output is written into the input arrays; otherwise new arrays are created.

Returns:

numpy arrays with the same shape as the inputs. If inplace=True, these are just the input arrays, now containing the outputs.

Return type:

edge_index, sc_shifts, shifts, edge_types
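
The pairing rule described for this method can be illustrated with a short sketch that finds, for each edge, the index of its opposite direction. It only demonstrates the (i, j, x, y, z) and (j, i, -x, -y, -z) matching condition and does not reproduce the sorting done by `sort_edge_index`; `find_reverse_edges` is a hypothetical helper, not part of graph2mat.

```python
import numpy as np

def find_reverse_edges(edge_index: np.ndarray, sc_shifts: np.ndarray) -> np.ndarray:
    """For each edge (i, j, x, y, z), return the index of (j, i, -x, -y, -z).

    Returns -1 where no reverse edge exists. Hypothetical helper for illustration.
    """
    # One row per edge: (i, j, x, y, z).
    keys = np.concatenate([edge_index, sc_shifts]).T
    lookup = {tuple(int(v) for v in key): n for n, key in enumerate(keys)}

    reverse = np.full(keys.shape[0], -1)
    for n, (i, j, x, y, z) in enumerate(keys):
        reverse_key = (int(j), int(i), int(-x), int(-y), int(-z))
        reverse[n] = lookup.get(reverse_key, -1)
    return reverse

# Two unit cell edges (0 <-> 1) and two supercell edges where i == j.
edge_index = np.array([[0, 1, 0, 0],
                       [1, 0, 0, 0]])
sc_shifts = np.array([[0, 0, 1, -1],
                      [0, 0, 0, 0],
                      [0, 0, 0, 0]])

reverse = find_reverse_edges(edge_index, sc_shifts)
print(reverse)
```

The last two edges connect point 0 to its own periodic images, so only the sign of the supercell shift tells the two directions apart.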

sub_point_matrix: bool = True

symmetric_matrix: bool = False

torch_predict(torch_model, geometry: Geometry)[source]

yield_from_batch(data: BasisMatrixData, predictions: Dict | None = None, threshold: float = 1e-08, as_matrix: bool = False, out_format: str | None = None) → Generator[source]

Yields matrices from a batch.

It takes into account the matrix class associated to the data processor to return the corresponding matrix type.

Parameters:

data – The batched data.
predictions –
Predictions for the matrix labels, with the keys:
- node_labels: matrix elements that belong to node blocks.
- edge_labels: matrix elements that belong to edge blocks.
If None, the labels from the data object are used.
threshold – Elements with a value below this number will be considered 0.
as_matrix – Whether to return a matrix or a BasisMatrixData object.