Data#

The geoh5 format allows storing data (values) on different parts of an Object. The data types currently supported by geoh5py are:

  • Float

  • Integer

  • Text

  • Colormap

  • Well log

import numpy as np

from geoh5py import Workspace
from geoh5py.objects import Curve


# Re-use the previous workspace
workspace = Workspace.create("my_project.geoh5")

# Create some curve object for demo
curve = Curve.create(workspace, vertices=np.c_[np.arange(100), np.zeros((100, 2))])

Float#

Numerical float data can be attached to the various elements making up an object. Data can be added to an Object entity using the add_data method.

curve.add_data(
    {
        "my_cell_values": {
            "association": "CELL",
            "values": np.random.randn(curve.n_cells),
        }
    }
)
<geoh5py.data.float_data.FloatData at 0x70c5769eece0>

The association can be one of:

  • OBJECT: Single element characterizing the parent object

  • VERTEX: Array of values associated with the parent object vertices

  • CELL: Array of values associated with the parent object cells

The length and order of the array of values must be consistent with the corresponding element of association. If the association argument is omitted, geoh5py will attempt to assign the data to the correct part based on the shape of the data values, either object.n_vertices or object.n_cells.
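The shape-based fallback can be pictured with a small sketch. The helper `infer_association` below is hypothetical and is not part of geoh5py; it only illustrates the idea of comparing the number of values against the vertex and cell counts, here assuming a curve with 100 vertices and 99 cells.

```python
import numpy as np


def infer_association(values, n_vertices, n_cells):
    """Hypothetical helper: guess the association from the number of values."""
    n_values = np.atleast_1d(values).shape[0]
    if n_values == n_vertices:
        return "VERTEX"
    if n_values == n_cells:
        return "CELL"
    raise ValueError(f"Cannot infer association for {n_values} values.")


# Assumed sizes for a curve with 100 vertices joined by 99 segments
vertex_guess = infer_association(np.zeros(100), n_vertices=100, n_cells=99)
cell_guess = infer_association(np.zeros(99), n_vertices=100, n_cells=99)
print(vertex_guess, cell_guess)
```

Passing the association explicitly, as in the example above, avoids any ambiguity when the value count matches neither size.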

# Add multiple data vectors in a single call
data = {}
for ii in range(8):
    data[f"Period:{ii}"] = {
        "association": "VERTEX",
        "values": (ii + 1)
        * np.cos(ii * curve.vertices[:, 0] * np.pi / curve.vertices[:, 0].max() / 4.0),
    }

data_list = curve.add_data(data)
print([obj.name for obj in data_list])
['Period:0', 'Period:1', 'Period:2', 'Period:3', 'Period:4', 'Period:5', 'Period:6', 'Period:7']

The newly created data is added directly to the project’s geoh5 file and is available for visualization.

Integer#

Integer data uses the same implementation as the Float data type, but with values provided as integers (int32).
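As a minimal sketch, integer values could be prepared with NumPy before being handed to add_data. The entry name `my_int_values` and the sample values are made up for illustration, and the explicit cast to int32 is an assumption about how best to match the stored type.

```python
import numpy as np

# Illustrative values only: one integer per vertex of a 100-vertex curve
int_values = np.arange(100).astype(np.int32)

# Same dictionary layout as for float data
int_data = {
    "my_int_values": {
        "association": "VERTEX",
        "values": int_values,
    }
}
print(int_data["my_int_values"]["values"].dtype)
```

The dictionary would then be passed to add_data in the same way as in the Float example.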

Text#

Text (string) data can only be associated with the object itself.

curve.add_data({"my_comment": {"association": "OBJECT", "values": "hello_world"}})
<geoh5py.data.text_data.TextData at 0x70c56c7fe5f0>

Colormap#

The colormap data type can be used to store or customize the color palette used by Geoscience ANALYST.

from geoh5py.data.color_map import ColorMap
from geoh5py.objects import Grid2D


# Create a Grid2D object to hold some data
grid = Grid2D.create(
    workspace,
    u_cell_size=2.5,
    v_cell_size=2.5,
    u_count=64,
    v_count=16,
)
# Add data
radius = grid.add_data({"radial": {"values": np.linalg.norm(grid.centroids, axis=1)}})

# Create a simple colormap that spans the data range
nc = 10
rgba = np.vstack(
    [
        np.linspace(radius.values.min(), radius.values.max(), nc),  # Values
        np.linspace(0, 255, nc),  # Red
        np.linspace(255, 0, nc),  # Green
        np.linspace(125, 15, nc),  # Blue
        np.ones(nc) * 255,  # Alpha
    ]
).T

We now have an array that contains red, green, blue and alpha (RGBA) values between 0 and 255 over the span of the data values. This array can be used to implicitly create a ColorMap from the EntityType.

# Assign the colormap to the data type
radius.entity_type.color_map = rgba

The resulting ColorMap stores the values to geoh5 as a numpy.recarray with fields for Value, Red, Green, Blue and Alpha.

radius.entity_type.color_map._values
rec.array([(  1.76776695,   0, 255, 125, 255),
           ( 19.72811601,  28, 226, 112, 255),
           ( 37.68846506,  56, 198, 100, 255),
           ( 55.64881412,  85, 170,  88, 255),
           ( 73.60916317, 113, 141,  76, 255),
           ( 91.56951223, 141, 113,  63, 255),
           (109.52986128, 170,  85,  51, 255),
           (127.49021034, 198,  56,  39, 255),
           (145.45055939, 226,  28,  27, 255),
           (163.41090845, 255,   0,  15, 255)],
          dtype=[('Value', '<f8'), ('Red', 'u1'), ('Green', 'u1'), ('Blue', 'u1'), ('Alpha', 'u1')])
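For readers unfamiliar with record arrays, the sketch below builds an array with the same five fields using NumPy alone. It is not geoh5py's implementation, and the data range (0 to 1) is a placeholder.

```python
import numpy as np

nc = 10
values = np.linspace(0.0, 1.0, nc)  # Placeholder data range
red = np.linspace(0, 255, nc).astype(np.uint8)
green = np.linspace(255, 0, nc).astype(np.uint8)
blue = np.linspace(125, 15, nc).astype(np.uint8)
alpha = np.full(nc, 255, dtype=np.uint8)

# One record per color stop, with named fields
color_stops = np.rec.fromarrays(
    [values, red, green, blue, alpha],
    names="Value,Red,Green,Blue,Alpha",
)

print(color_stops.dtype.names)
print(color_stops.Red[-1], color_stops.Green[-1])
```

Each field can be read by name (for example `color_stops.Red`), which is what makes the record layout convenient for color stops.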

Files#

Raw files can be added to groups and objects and stored as blob (bytes) data in geoh5.

with open("docs.txt", mode="w") as file:
    file.write("Hello world")

file_data = grid.add_file("docs.txt")

The information can easily be re-exported to disk with the save_file method.

import shutil

# Export the stored file to disk
file_data.save_file(path="./temp", name="exported.txt")

# Remove the temporary export folder
shutil.rmtree("./temp")
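To make the idea of blob (bytes) storage concrete, here is a standard-library sketch of the same round trip: a text file is read as raw bytes and written back out under a new name. It does not use geoh5py, and the temporary directory and variable names are only for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / "docs.txt"
    source.write_text("Hello world")

    # Read the raw content as bytes, the form in which a blob is kept
    blob = source.read_bytes()

    # Write the same bytes back out under a new name
    exported = Path(tmp_dir) / "exported.txt"
    exported.write_bytes(blob)
    restored_text = exported.read_text()

print(blob, restored_text)
```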

Well Data#

In the case of Drillhole objects, data are always stored as from-to interval values.

Depth Data#

Depth data are used to represent measurements recorded at discrete depths along the well path. A depth attribute is required on creation. Depth markers are converted internally to from-to intervals by adding a small depth value defined by the collocation_distance. If the Drillhole object already holds depth data at the same location, geoh5py will group the datasets under the same PropertyGroup.

from geoh5py.groups import DrillholeGroup
from geoh5py.objects import Drillhole


dh_group = DrillholeGroup.create(workspace)
well = Drillhole.create(workspace, collar=(0, 0, 0), parent=dh_group)

depths_A = np.arange(0, 50.0)  # First list of depths

# Second list, offset by 0.01 m from the first
depths_B = np.arange(0.01, 50.01)

# Add both sets of log data
well.add_data(
    {
        "my_log_values": {
            "depth": depths_A,
            "values": np.random.randn(depths_A.shape[0]),
        },
        "log_wt_tolerance": {
            "depth": depths_B,
            "values": np.random.randn(depths_B.shape[0]),
        },
    }
)
[<abc.ConcatenatedFloatData at 0x70c56c49d150>,
 <abc.ConcatenatedFloatData at 0x70c56c49c190>]
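The conversion from depth markers to from-to intervals described above can be sketched with NumPy. This is an illustration under the assumption that each TO value is simply the depth plus the collocation distance; the internal geoh5py logic may differ, and the variable names are illustrative.

```python
import numpy as np

collocation_distance = 1e-2  # Assumed tolerance, in meters
depths = np.arange(0, 50.0)

# Each depth marker becomes a thin [FROM, TO] interval
depth_intervals = np.c_[depths, depths + collocation_distance]
print(depth_intervals.shape)
```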

Interval (From-To) Data#

Interval data are defined by constant values bounded by a start (FROM) and an end (TO) depth. A from-to attribute, given as a numpy.ndarray of shape (nD, 2), is expected on creation. Subsequent data are appended to the same interval PropertyGroup if the from-to values match within the collocation distance parameter. Users can control the tolerance for matching intervals by supplying a collocation_distance argument in meters, or by setting the default on the drillhole entity (default_collocation_distance = 1e-2 meters).

# Define a from-to array
from_to = np.vstack([[0.25, 25.5], [30.1, 55.5], [56.5, 80.2]])

# Add some referenced data
well.add_data(
    {
        "interval_values": {
            "values": np.asarray([1, 2, 3]),
            "from-to": from_to,
            "value_map": {1: "Unit_A", 2: "Unit_B", 3: "Unit_C"},
            "type": "referenced",
        }
    }
)

# Add float data on the same intervals
well.add_data(
    {
        "random_values": {
            "values": np.random.randn(from_to.shape[0]),
            "from-to": from_to,
        }
    }
)
<abc.ConcatenatedFloatData at 0x70c56c49d5d0>
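The matching of intervals within a tolerance can be illustrated with a short NumPy sketch. The function `intervals_match` is a hypothetical helper, not a geoh5py call, and it assumes that two from-to arrays match when they have the same shape and every bound agrees within the collocation distance.

```python
import numpy as np


def intervals_match(from_to_a, from_to_b, collocation_distance=1e-2):
    """Hypothetical check: same shape and all bounds within the tolerance."""
    from_to_a = np.asarray(from_to_a, dtype=float)
    from_to_b = np.asarray(from_to_b, dtype=float)
    if from_to_a.shape != from_to_b.shape:
        return False
    return bool(
        np.allclose(from_to_a, from_to_b, rtol=0.0, atol=collocation_distance)
    )


reference = np.vstack([[0.25, 25.5], [30.1, 55.5], [56.5, 80.2]])
close_match = intervals_match(reference, reference + 0.005)
no_match = intervals_match(reference, reference + 0.5)
print(close_match, no_match)
```

Under this assumption, a 5 mm shift still counts as the same set of intervals, while a 0.5 m shift does not.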

Get data#

Just like any Entity, data can be retrieved from the Workspace using the get_entity method. For convenience, Objects also have get_data_list and get_data methods that focus only on their own child Data.

my_list = curve.get_data_list()
print(my_list, curve.get_data(my_list[0]))
['Period:0', 'Period:1', 'Period:2', 'Period:3', 'Period:4', 'Period:5', 'Period:6', 'Period:7', 'my_cell_values', 'my_comment'] [<geoh5py.data.float_data.FloatData object at 0x70c56c7fed40>]

Property Groups#

Data entities sharing the same parent Object and association can be linked within a property group and made available through profiling. This can be used to group data that would normally be stored as a 2D array.

# Group the previously created VERTEX data into a property group
curve.add_data_to_group([obj.uid for obj in data_list], "my_trig_group")
<geoh5py.groups.property_group.PropertyGroup at 0x70c56c49dfc0>
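To picture the 2D array that such a group stands in for, the sketch below rebuilds the eight "Period" vectors with NumPy only and stacks them as columns. It does not read anything from the workspace; the vertex coordinates are regenerated under the assumption of 100 vertices at x = 0 to 99, as in the curve created earlier.

```python
import numpy as np

# Rebuild the eight "Period" vectors from the earlier example with NumPy only
x = np.arange(100, dtype=float)
columns = [
    (ii + 1) * np.cos(ii * x * np.pi / x.max() / 4.0) for ii in range(8)
]

# One row per vertex, one column per grouped data entity
table = np.column_stack(columns)
print(table.shape)
```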

workspace.close()