Utilities

General Utilities

OrderedSignal

class lazyflow.utility.OrderedSignal(hide_cancellation_exceptions=False)[source]

A simple callback mechanism that ensures callbacks occur in the same order as subscription.

__call__(*args)[source]

Emit the signal. Calls each callback in the subscription list, in order, with the specified arguments.

subscribe(fn, **kwargs)[source]

Subscribe the given callable to be called when the signal is fired. If the callable is already subscribed to the signal, it is relocated to the end of the callback list.

Parameters:
  • fn – The callable to add to this signal’s list of callbacks. Must be hashable.
  • kwargs – DEPRECATED. Additional parameters to include when the signal calls the function. Instead of using this parameter, consider binding arguments to your callable with functools.partial or (better) ilastik.bind.
unsubscribe(fn)[source]

Unsubscribe the given function from the signal’s callback list. If the callable was not found in the list, this function returns silently.

Parameters: fn – The callable to remove from the subscription list.

Note

This relies on the callable’s __eq__ operator. Note that functools.partial objects do not implement special support for __eq__. If your callback is of that type, you must provide the exact instance when unsubscribing. Note that ilastik.bind objects ARE equality comparable. For those callables, it is not necessary to provide the exact instance of the callable that was used for subscription. An equivalent ilastik.bind object (same target and bound args) will suffice.
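
For illustration, here is a minimal usage sketch (not taken from the docstrings; the callback names are made up):

from functools import partial
from lazyflow.utility import OrderedSignal

def print_progress(percent, prefix=''):
    print('{}{}%'.format(prefix, percent))

signal = OrderedSignal()

# Callbacks fire in subscription order when the signal is emitted.
signal.subscribe(print_progress)
signal(50)   # __call__ forwards its arguments to every callback

# functools.partial objects are only equal to themselves, so keep the exact
# instance around if you intend to unsubscribe it later.
handler = partial(print_progress, prefix='task A: ')
signal.subscribe(handler)
signal(75)
signal.unsubscribe(handler)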

Trace Logging

class lazyflow.utility.Tracer(logger, level=10, msg='', determine_caller=True, caller_name='')[source]

Context manager to simplify function entry/exit logging trace statements.

Example Usage:

>>> # Create a TRACE logger
>>> import sys, logging
>>> traceLogger = logging.getLogger("TRACE.examplemodule1")
>>> traceLogger.addHandler( logging.StreamHandler(sys.stdout) )
>>> # Use the context manager
>>> def f():
...     with Tracer(traceLogger):
...     print("Function f is running...")
>>> # If TRACE logging isn't enabled, there's no extra output
>>> f()
Function f is running...
>>> # Enable TRACE logging to see enter/exit log statements.
>>> traceLogger.setLevel(logging.DEBUG)
>>> f()
(enter) f 
Function f is running...
(exit) f
>>> # Disable TRACE logging by setting the level above DEBUG.
>>> traceLogger.setLevel(logging.INFO)
lazyflow.utility.traceLogged(logger, level=10, msg='', caller_name='')[source]

Returns a decorator that logs the entry and exit of its target function. Uses the Tracer context manager internally.

Example Usage:

>>> # Create a TRACE logger
>>> import sys, logging
>>> traceLogger = logging.getLogger("TRACE.examplemodule2")
>>> traceLogger.addHandler( logging.StreamHandler(sys.stdout) )
>>> # Decorate a function to allow entry/exit trace logging.
>>> @traceLogged(traceLogger)
... def f():
...     print("Function f is running...")
>>> # If TRACE logging isn't enabled, there's no extra output
>>> f()
Function f is running...
>>> # Enable TRACE logging to see enter/exit log statements.
>>> traceLogger.setLevel(logging.DEBUG)
>>> f()
(enter) f 
Function f is running...
(exit) f
>>> # Disable TRACE logging by setting the level above DEBUG.
>>> traceLogger.setLevel(logging.INFO)

Path Manipulation

class lazyflow.utility.PathComponents(totalPath, cwd=None)[source]

Provides convenient access to the components of a combined external/internal path to a dataset. Each of the properties listed below is also writable; assigning to one of them updates ALL of the other properties accordingly.

__init__(totalPath, cwd=None)[source]

Initialize the path components.

Parameters:
  • totalPath – The entire path to the dataset, including any internal path (e.g. the path to an hdf5 dataset). For example, totalPath='/some/path/to/file.h5/with/internal/dataset'
  • cwd – If provided, relative paths will be converted to absolute paths using this argument as the working directory.
extension

Example: .h5

externalDirectory

Example: /some/path/to

externalPath

Example: /some/path/to/file.h5

filename

Example: file.h5

filenameBase

Example: file

internalDatasetName

Example: /dataset

internalDirectory

Example: /with/internal

internalPath

Example: /with/internal/dataset

totalPath()[source]

Return the (reconstructed) totalPath to the dataset.
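
A short sketch using the example path from above (illustrative, not part of the class docstring):

from lazyflow.utility import PathComponents

comp = PathComponents('/some/path/to/file.h5/with/internal/dataset')
print(comp.externalPath)     # /some/path/to/file.h5
print(comp.internalPath)     # /with/internal/dataset
print(comp.extension)        # .h5

# Properties are writable; assigning to one updates the others.
comp.filename = 'other.h5'
print(comp.totalPath())      # /some/path/to/other.h5/with/internal/dataset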

lazyflow.utility.getPathVariants(originalPath, workingDirectory)[source]

Take the given originalPath (which may be absolute or relative, and may include an internal path suffix), and return a tuple of the absolute and relative paths to the file.
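
For example (a sketch; the exact strings returned depend on the inputs):

from lazyflow.utility import getPathVariants

# Returns (absolutePath, relativePath) for the same file.
absPath, relPath = getPathVariants('file.h5/volume/data', '/some/working/dir')
print(absPath)    # e.g. /some/working/dir/file.h5/volume/data
print(relPath)    # e.g. file.h5/volume/data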

FileLock

Implementation of a simple cross-platform file locking mechanism. This is a modified version of code retrieved on 2013-01-01 from http://www.evanfosmark.com/2009/01/cross-platform-file-locking-support-in-python. (The original code was released under the BSD License. See below for details.)

Modifications in this version:
  • Tweak docstrings for sphinx.
  • Accept an absolute path for the protected file (instead of a file name relative to cwd).
  • Allow timeout to be None.
  • Fixed a bug that caused the original code to be NON-threadsafe when the same FileLock instance was shared by multiple threads in one process. (The original was safe for multiple processes, but not multiple threads in a single process. This version is safe for both cases.)
  • Added purge() function.
  • Added available() function.
  • Expanded API to mimic threading.Lock interface:
      - __enter__ always calls acquire(), and therefore blocks if acquire() was called previously.
      - __exit__ always calls release(). It is therefore a bug to call release() from within a context manager.
      - Added locked() function.
      - Added blocking parameter to acquire() method.
WARNINGS:
  • The locking mechanism used here may need to be changed to support old NFS filesystems: http://lwn.net/Articles/251004 (Newer versions of NFS should be okay, e.g. NFSv3 with Linux kernel 2.6. Check the open(2) man page for details about O_EXCL.)
  • This code has not been thoroughly tested on Windows, and there has been one report of incorrect results on Windows XP and Windows 7. The locking mechanism used in this class should (in theory) be cross-platform, but use at your own risk.

ORIGINAL LICENSE:

The original code did not properly include license text. (It merely said “License: BSD”.) Therefore, we’ll attach the following generic BSD License terms to this file. Those who extract this file from the lazyflow code base (LGPL) for their own use are therefore bound by the terms of both the Simplified BSD License below AND the LGPL.

Copyright (c) 2013, Evan Fosmark and others. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project.

class lazyflow.utility.fileLock.FileLock(protected_file_path, timeout=None, delay=1, lock_file_contents=None)[source]

A file locking mechanism with context-manager support, so you can use it in a with statement. This should be relatively cross-platform, as it doesn’t rely on msvcrt or fcntl for the locking.

__init__(protected_file_path, timeout=None, delay=1, lock_file_contents=None)[source]

Prepare the file locker. Specify the file to lock and optionally the maximum timeout and the delay between each attempt to lock.

acquire(blocking=True)[source]

Acquire the lock, if possible. If the lock is in use and blocking is False, return False. Otherwise, check again every self.delay seconds until the lock is acquired or the timeout (in seconds) is exceeded, in which case an exception is raised.

available()[source]

Returns True iff the file is currently available to be locked.

locked()[source]

Returns True iff the file is owned by THIS FileLock instance. (Even if this returns false, the file could be owned by another FileLock instance, possibly in a different thread or process).

purge()[source]

For debug purposes only. Removes the lock file from the hard disk.

release()[source]

Get rid of the lock by deleting the lockfile. When working in a with statement, this gets automatically called at the end.
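
A brief sketch of both usage styles (the file path and timing values are illustrative):

from lazyflow.utility.fileLock import FileLock

# Context-manager style: blocks until the lock is acquired, then
# releases it automatically on exit. Raises an exception if the
# timeout (in seconds) is exceeded while waiting.
with FileLock('/tmp/shared_output.txt', timeout=10, delay=0.5):
    pass  # ... read or write /tmp/shared_output.txt safely ...

# threading.Lock-style, non-blocking:
lock = FileLock('/tmp/shared_output.txt')
if lock.acquire(blocking=False):
    try:
        pass  # ... do work while holding the lock ...
    finally:
        lock.release()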

JSON Config Parsing

Some lazyflow components rely on a special JSON config file format. The JsonConfigParser class handles parsing such files.

class lazyflow.utility.jsonConfig.JsonConfigParser(fields)[source]

Parser for JSON config files that match a specific schema. Currently, only a very small subset of JSON is supported. The schema fields must be a dictionary of name : type (or pseudo-type) pairs.

A schema dict is also allowed as a pseudo-type value, which permits nested schemas.

>>> # Specify schema as a dict
>>> SchemaFields = {
...
...   "_schema_name" : "example-schema",
...   "_schema_version" : 1.0,
... 
...   "shoe size" : int,
...   "color" : str
... }
>>> 
>>> # Write a config file to disk for this example.
>>> example_file_str = \
... """
... {
...   "_schema_name" : "example-schema",
...   "_schema_version" : 1.0,
... 
...   "shoe size" : 12,
...   "color" : "red",
...   "ignored_field" : "Fields that are unrecognized by the schema are ignored."
... }
... """
>>> with open('/tmp/example_config.json', 'w') as f:
...   _ = f.write(example_file_str)
>>> 
>>> # Create a parser that understands your schema
>>> parser = JsonConfigParser( SchemaFields )
>>> 
>>> # Parse the config file
>>> parsedFields = parser.parseConfigFile('/tmp/example_config.json')
>>> print(parsedFields.color)
red
>>> # Whitespace in field names is replaced with underscores in the Namespace member.
>>> print(parsedFields.shoe_size)
12
__init__(fields)[source]
parseConfigFile(configFilePath)[source]

Parse the JSON file at the given path into a Namespace object that provides easy access to the config contents. Fields are converted from default JSON types into the types specified by the schema.

writeConfigFile(configFilePath, configNamespace)[source]

Write the given Namespace object to a JSON file as a dict, checking it for errors first by parsing each field against the schema.
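
Continuing the doctest above, a brief sketch of writing a modified config back to disk (the output path is illustrative):

from lazyflow.utility.jsonConfig import JsonConfigParser

# Reuse SchemaFields and the config file written in the doctest above.
parser = JsonConfigParser( SchemaFields )
parsedFields = parser.parseConfigFile('/tmp/example_config.json')

# Namespace attributes can be modified and written back out;
# writeConfigFile() re-validates each field against the schema first.
parsedFields.color = 'blue'
parser.writeConfigFile('/tmp/updated_config.json', parsedFields)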

class lazyflow.utility.jsonConfig.Namespace[source]

Provides the same functionality as:

class Namespace(object):
    pass

except that self.__dict__ is replaced with an instance of collections.OrderedDict.

Request Batching Utilities

These utilities provide convenient mechanisms for issuing a set of requests with controlled (specifically limited) parallelism and handling the result of each request in a serial callback.

BigRequestStreamer

Use the BigRequestStreamer if you want to retrieve a single large chunk of output data, but you want that big request broken up into many smaller blocks.

class lazyflow.utility.bigRequestStreamer.BigRequestStreamer(outputSlot, roi, blockshape=None, batchSize=None, blockAlignment='absolute', allowParallelResults=False)[source]

Execute a big request by breaking it up into smaller requests.

This class encapsulates the logic for dividing big rois into smaller ones to be executed separately. It relies on a RoiRequestBatch object, which is responsible for creating and scheduling the request for each roi.

Example:

>>> import sys
>>> import numpy
>>> import vigra
>>> from lazyflow.graph import Graph
>>> from lazyflow.operators.operators import OpArrayCache
>>> # Example data
>>> data = numpy.indices( (100,100) ).sum(0)
>>> data = vigra.taggedView( data, vigra.defaultAxistags('xy') )
>>> op = OpArrayCache( graph=Graph() )
>>> op.Input.setValue( data )
>>> total_roi = [(25, 65), (45, 95)]
>>> # Init with our output slot and roi to request.
>>> # batchSize indicates the number of requests to spawn in parallel.
>>> streamer = BigRequestStreamer( op.Output, total_roi, (10,10), batchSize=2, blockAlignment='relative' )
>>> # Use a callback to handle sub-results one at a time.
>>> result_count = [0]
>>> result_total_sum = [0]
>>> def handle_block_result(roi, result):
...     # No need for locking here, since allowParallelResults=False (the default).
...     result_count[0] += 1
...     result_total_sum[0] += result.sum()
>>> streamer.resultSignal.subscribe( handle_block_result )
>>> # Optional: Subscribe to progress updates
>>> def handle_progress(progress):
...     if progress == 0:
...         sys.stdout.write("Progress: ")
...     sys.stdout.write( "{} ".format( progress ) )
>>> streamer.progressSignal.subscribe( handle_progress )
>>> # Execute the batch of requests, and block for the result.
>>> streamer.execute()
Progress: 0 16 33 50 66 83 100 100 
>>> print("Processed {} result blocks with a total sum of: {}".format( result_count[0], result_total_sum[0] ))
Processed 6 result blocks with a total sum of: 68400
__init__(outputSlot, roi, blockshape=None, batchSize=None, blockAlignment='absolute', allowParallelResults=False)[source]

Constructor.

Parameters:
  • outputSlot – The slot to request data from.
  • roi – The roi (start, stop) of interest. Will be broken up and requested via smaller requests.
  • blockshape – The amount of data to request in each request. If omitted, a default blockshape is chosen by inspecting the metadata of the given slot.
  • batchSize – The maximum number of requests to launch in parallel. This should not be necessary if the blockshape is small enough that you won’t run out of RAM.
  • blockAlignment – Determines how the requested blocks are aligned. Choices are ‘absolute’ or ‘relative’.
  • allowParallelResults – If False, the resultSignal will not be called in parallel. In that case, your handler function has no need for locks.
execute()[source]

Request the data for the entire roi by breaking it up into many smaller requests, and wait for all of them to complete. A batch of N requests is launched, and subsequent requests are launched one-by-one as the earlier requests complete. Thus, there will be N requests executing in parallel at all times.

This method returns None. All results must be handled via the resultSignal.

progressSignal

Progress signal. Signature: f(progress_percent)

resultSignal

Results signal. Signature: f(roi, result). Guaranteed not to be called from multiple threads in parallel.

RoiRequestBatch

If you have an image-like slot and a set of rois you’re interested in retrieving, use RoiRequestBatch to request the whole set with a custom level of parallelism.

class lazyflow.utility.roiRequestBatch.RoiRequestBatch(outputSlot, roiIterator, totalVolume=None, batchSize=2, allowParallelResults=False)[source]

A simple utility for requesting a list of rois from an output slot. The number of rois requested in parallel is throttled by the batch size given to the constructor. The result of each requested roi is provided as a signal, which the user should subscribe() to.

Example usage:

>>> import sys
>>> import numpy
>>> import vigra
>>> from lazyflow.graph import Graph
>>> from lazyflow.operators.operators import OpArrayCache
>>> # Example data
>>> data = numpy.indices( (100,100) ).sum(0)
>>> data = vigra.taggedView( data, vigra.defaultAxistags('xy') )
>>> op = OpArrayCache( graph=Graph() )
>>> op.Input.setValue( data )
>>> # Create a list of rois to iterate through.
>>> # Typically you'll want to automate this
>>> #  with e.g. lazyflow.roi.getIntersectingBlocks
>>> rois = []
>>> rois.append( ( (0, 0), (10,10) ) )
>>> rois.append( ( (0,10), (10,20) ) )
>>> rois.append( ( (0,20), (10,30) ) )
>>> rois.append( ( (0,30), (10,40) ) )
>>> rois.append( ( (0,40), (10,50) ) )
>>> # Init with our output slot and list of rois to request.
>>> # `batchSize` indicates the number of requests to spawn in parallel.
>>> # Provide `totalVolume` if you want progress reporting.
>>> batch_requester = RoiRequestBatch( op.Output, iter(rois), totalVolume=500, batchSize=2 )
>>> # Use a callback to handle sub-results one at a time.
>>> result_count = [0]
>>> result_total_sum = [0]
>>> def handle_block_result(roi, result):
...     # No need for locking here, since allowParallelResults=False (the default).
...     result_count[0] += 1
...     result_total_sum[0] += result.sum()
>>> batch_requester.resultSignal.subscribe( handle_block_result )
>>> # Optional: Subscribe to progress updates
>>> def handle_progress(progress):
...     if progress == 0:
...         sys.stdout.write("Progress: ")
...     sys.stdout.write( "{} ".format( progress ) )
>>> batch_requester.progressSignal.subscribe( handle_progress )
>>> # Execute the batch of requests, and block for the result.
>>> batch_requester.execute()
Progress: 0 20 40 60 80 100 100 
>>> print("Processed {} result blocks with a total sum of: {}".format( result_count[0], result_total_sum[0] ))
Processed 5 result blocks with a total sum of: 14500
__init__(outputSlot, roiIterator, totalVolume=None, batchSize=2, allowParallelResults=False)[source]

Constructor.

Parameters:
  • outputSlot – The slot to request data from.
  • roiIterator – An iterator providing new rois.
  • totalVolume – The total volume to be processed. Used to provide the progress reporting signal. If not provided, then no intermediate progress will be signaled.
  • batchSize – The maximum number of requests to launch in parallel.
  • allowParallelResults – If False, the resultSignal will not be called in parallel. In that case, your handler function has no need for locks.
execute()[source]

Execute the batch of requests and wait for all of them to complete. A batch of N requests is launched, and subsequent requests are launched one-by-one as the earlier requests complete. Thus, there will be N requests executing in parallel at all times.

This method returns None. All results must be handled via the resultSignal.

progressSignal

Progress signal. Signature: f(progress_percent)

resultSignal

Results signal. Signature: f(roi, result). Guaranteed not to be called from multiple threads in parallel.

IO Utilities

These utilities provide access to special data formats supported by lazyflow.

Blockwise Data Format

For big datasets, lazyflow supports a special input/output format that is based on blocks of data stored as hdf5 files in a special directory tree structure. The dataset is described by a special json file.

A small example explains the basics. Consider a dataset with axes x-y-z and shape 300x100x100. Suppose it is stored on disk in blocks of size 100x50x50. Let’s start by inspecting the dataset description file:

$ ls
data_description_params.json  my_dataset_blocks
$
$ cat data_description_params.json
{
    "_schema_name" : "blockwise-fileset-description",
    "_schema_version" : 1.0,

    "name" : "example_data",
    "format" : "hdf5",
    "axes" : "xyz",
    "shape" : [300,100,100],
    "dtype" : "numpy.uint8",
    "block_shape" : [100, 50, 50],
    "dataset_root_dir" : "./my_dataset_blocks",
    "block_file_name_format" : "blockFile-{roiString}.h5/volume/data"
}

This listing shows how the directory tree is structured:

$ ls my_dataset_blocks/*/*/*/*.h5
my_dataset_blocks/x_00000000/y_00000000/z_00000000/blockFile-([0, 0, 0], [100, 50, 50]).h5
my_dataset_blocks/x_00000000/y_00000000/z_00000050/blockFile-([0, 0, 50], [100, 50, 100]).h5
my_dataset_blocks/x_00000000/y_00000050/z_00000000/blockFile-([0, 50, 0], [100, 100, 50]).h5
my_dataset_blocks/x_00000000/y_00000050/z_00000050/blockFile-([0, 50, 50], [100, 100, 100]).h5
my_dataset_blocks/x_00000100/y_00000000/z_00000000/blockFile-([100, 0, 0], [200, 50, 50]).h5
my_dataset_blocks/x_00000100/y_00000000/z_00000050/blockFile-([100, 0, 50], [200, 50, 100]).h5
my_dataset_blocks/x_00000100/y_00000050/z_00000000/blockFile-([100, 50, 0], [200, 100, 50]).h5
my_dataset_blocks/x_00000100/y_00000050/z_00000050/blockFile-([100, 50, 50], [200, 100, 100]).h5
my_dataset_blocks/x_00000200/y_00000000/z_00000000/blockFile-([200, 0, 0], [300, 50, 50]).h5
my_dataset_blocks/x_00000200/y_00000000/z_00000050/blockFile-([200, 0, 50], [300, 50, 100]).h5
my_dataset_blocks/x_00000200/y_00000050/z_00000000/blockFile-([200, 50, 0], [300, 100, 50]).h5
my_dataset_blocks/x_00000200/y_00000050/z_00000050/blockFile-([200, 50, 50], [300, 100, 100]).h5

But you shouldn’t really have to worry too much about how the data is stored. The BlockwiseFileset and RESTfulBlockwiseFileset classes provide a high-level API for reading and writing such datasets. See the documentation of those classes for details.
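
As a minimal reading sketch (assuming the description file shown above, and that the requested blocks have already been written; otherwise readData() raises BlockNotReadyError):

from lazyflow.utility.io_util import BlockwiseFileset

# Open the example dataset, read-only.
bfs = BlockwiseFileset('data_description_params.json', mode='r')

# A roi is a (start, stop) pair; it may span several block files.
roi = ((0, 0, 0), (150, 75, 75))
data = bfs.readData(roi)
print(data.shape)    # (150, 75, 75)

bfs.close()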

class lazyflow.utility.io_util.BlockwiseFileset(descriptionFilePath, mode='r', preparsedDescription=None)[source]

This class handles writing and reading a ‘blockwise file set’. A ‘blockwise file set’ is a directory with a particular structure, which contains the entire dataset broken up into blocks. Important parameters (e.g. shape, dtype, blockshape) are specified in a JSON file, which must match the schema given by BlockwiseFileset.DescriptionFields. The parent directory of the description file is considered to be the top-most directory in the blockwise dataset hierarchy.

  • Simultaneous reads are threadsafe.
  • NOT threadsafe for reading and writing simultaneously (or writing and writing).
  • NOT threadsafe for closing. Do not call close() while reading or writing.

Note

See the unit tests in tests/testBlockwiseFileset.py for example usage.

__init__(descriptionFilePath, mode='r', preparsedDescription=None)[source]

Constructor. Uses readDescription internally.

Parameters:
  • descriptionFilePath – The path to the .json file that describes the dataset.
  • mode – Set to 'r' if the fileset should be read-only.
  • preparsedDescription – (Optional) Provide pre-parsed description fields, in which case the provided description file will not be parsed.
exception BlockNotReadyError(block_start)[source]

This exception is raised if readData() is called for data that isn’t available on disk.

BlockwiseFileset.DescriptionFields = {
    '_schema_name'           : 'blockwise-fileset-description',
    '_schema_version'        : 1.1,
    'name'                   : str,
    'format'                 : str,
    'axes'                   : str,
    'shape'                  : <jsonConfig.AutoEval object>,
    'dtype'                  : <jsonConfig.AutoEval object>,
    'drange'                 : <jsonConfig.AutoEval object>,
    'chunks'                 : <jsonConfig.AutoEval object>,
    'compression'            : str,
    'compression_opts'       : <jsonConfig.AutoEval object>,
    'dataset_root_dir'       : str,
    'block_shape'            : <jsonConfig.AutoEval object>,
    'block_file_name_format' : <jsonConfig.FormattedField object>,
    'view_origin'            : <jsonConfig.AutoEval object>,
    'view_shape'             : <jsonConfig.AutoEval object>,
    'sub_block_shape'        : <jsonConfig.AutoEval object>,
    'hash_id'                : str}

These fields describe the schema of the description file. See the source code comments for a description of each field.

BlockwiseFileset.close()[source]

Close all open block files.

BlockwiseFileset.description

The jsonConfig.Namespace object that describes this dataset.

BlockwiseFileset.exportRoiToHdf5(roi, exportDirectory, use_view_coordinates=True)[source]

Export an arbitrary roi to a single hdf5 file. The file will be placed in the given exportDirectory, and will be named according to the exported roi.

Parameters:
  • roi – The roi to export
  • exportDirectory – The directory in which the result should be placed.
  • use_view_coordinates – If True, assume the roi was given relative to the view start. Otherwise, assume it was given relative to the on-disk coordinates.
BlockwiseFileset.exportSubset(roi, exportDirectory, use_view_coordinates=True)[source]

Create a new blockwise fileset by copying a subset of this blockwise fileset.

Parameters:
  • roi – The portion to export. Must be along block boundaries, in ABSOLUTE coordinates.
  • exportDirectory – The directory to copy the new blockwise fileset to.
BlockwiseFileset.getAllBlockRois()[source]

Return the list of rois for all VIEWED blocks in the dataset.

BlockwiseFileset.getBlockStatus(blockstart)[source]

Check a block’s status. (Just because a block file exists doesn’t mean that it has valid data.) Returns a status code of either BlockwiseFileset.BLOCK_AVAILABLE or BlockwiseFileset.BLOCK_NOT_AVAILABLE.

BlockwiseFileset.getDatasetDirectory(blockstart)[source]

Return the directory that contains the block that starts at the given coordinates.

BlockwiseFileset.getDatasetPathComponents(block_start)[source]

Return a PathComponents object for the block file that corresponds to the given block start coordinate.

BlockwiseFileset.getEntireBlockRoi(block_start)[source]

Return the roi for the entire block that starts at the given coordinate.

BlockwiseFileset.getOpenHdf5FileForBlock(block_start)[source]

Return a handle to the open hdf5 file containing the block that starts at the given coordinate.

BlockwiseFileset.isBlockLocked(blockstart)[source]

Return True if the block is locked for writing. Note that both ‘available’ and ‘not available’ blocks might be locked.

BlockwiseFileset.purgeAllLocks()[source]

Clears all .lock files from the local blockwise fileset. This may be necessary if previous processes crashed or were killed while some blocks were downloading. You must ensure that this is NOT called while more than one process (or thread) has access to the fileset. For example, in a master/worker situation, call this only from the master, before the workers have been started.

BlockwiseFileset.readData(roi, out_array=None)[source]

Read data from the fileset.

Parameters:
  • roi – The region of interest to read from the dataset. Must be a tuple of iterables: (start, stop).
  • out_array – The location to store the read data. Must be the correct size for the given roi. If not provided, an array is created for you.
Returns:

The requested data. If out_array was provided, returns out_array.

classmethod BlockwiseFileset.readDescription(descriptionFilePath)[source]

Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by BlockwiseFileset.DescriptionFields.

Parameters: descriptionFilePath – The path to the description file to parse.
BlockwiseFileset.setBlockStatus(blockstart, status)[source]

Set a block status on disk. We use a simple convention: If the status file exists, the block is available. Otherwise, it ain’t.

Parameters: status – Must be either BlockwiseFileset.BLOCK_AVAILABLE or BlockwiseFileset.BLOCK_NOT_AVAILABLE.
BlockwiseFileset.writeData(roi, data)[source]

Write data to the fileset.

Parameters:
  • roi – The region of interest to write the data to. Must be a tuple of iterables: (start, stop).
  • data – The data to write. Must be the correct size for the given roi.
classmethod BlockwiseFileset.writeDescription(descriptionFilePath, descriptionFields)[source]

Write a jsonConfig.Namespace object to the given path.

Parameters:
  • descriptionFilePath – The path to overwrite with the description fields.
  • descriptionFields – The fields to write.
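
Putting the write-related methods above together, a brief sketch under the same example description (the writable mode is assumed to be 'a' here):

import numpy
from lazyflow.utility.io_util import BlockwiseFileset

bfs = BlockwiseFileset('data_description_params.json', mode='a')

# Write one whole block, then mark it available so readers see it.
block_start = (100, 50, 0)
block_roi = bfs.getEntireBlockRoi(block_start)   # ((100, 50, 0), (200, 100, 50))

block_data = numpy.zeros((100, 50, 50), dtype=numpy.uint8)
bfs.writeData(block_roi, block_data)
bfs.setBlockStatus(block_start, BlockwiseFileset.BLOCK_AVAILABLE)

bfs.close()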

Remote Volumes

class lazyflow.utility.io_util.RESTfulVolume(descriptionFilePath=None, preparsedDescription=None)[source]

This class provides access to data obtained via a RESTful API (e.g. from http://openconnecto.me). A description of the remote volume must be provided via a JSON file, whose schema is specified by RESTfulVolume.DescriptionFields.

Note

This class does not keep track of the data you’ve already downloaded. Every call to downloadSubVolume() results in a new download. For automatic blockwise local caching of remote datasets, see RESTfulBlockwiseFileset.

Note

See the unit tests in tests/testRESTfulVolume.py for example usage.

__init__(descriptionFilePath=None, preparsedDescription=None)[source]

Constructor. Uses readDescription internally.

Parameters:
  • descriptionFilePath – The path to the .json file that describes the remote volume.
  • preparsedDescription – (Optional) Provide pre-parsed description fields, in which case the provided description file will not be parsed.
DescriptionFields = {
    '_schema_name'    : 'RESTful-volume-description',
    '_schema_version' : 1.0,
    'name'            : str,
    'format'          : str,
    'axes'            : str,
    'shape'           : <jsonConfig.AutoEval object>,
    'dtype'           : <jsonConfig.AutoEval object>,
    'bounds'          : <jsonConfig.AutoEval object>,
    'origin_offset'   : <jsonConfig.AutoEval object>,
    'url_format'      : <jsonConfig.FormattedField object>,
    'hdf5_dataset'    : str}

These fields describe the schema of the description file. See the source code comments for a description of each field.

downloadSubVolume(roi, outputDatasetPath)[source]

Download a cutout volume from the remote dataset.

Parameters:
  • roi – The subset of the volume to download, specified as a tuple of coordinates: (start, stop)
  • outputDatasetPath – The path to overwrite with the downloaded hdf5 file.
classmethod readDescription(descriptionFilePath)[source]

Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by RESTfulVolume.DescriptionFields. Any optional parameters not provided by the user are filled in automatically.

Parameters: descriptionFilePath – The path to the description file to parse.
classmethod updateDescription(description)[source]

Some description fields are optional. If they aren’t provided in the description JSON file, then this function provides them with default values, based on the other description fields.

classmethod writeDescription(descriptionFilePath, descriptionFields)[source]

Write a jsonConfig.Namespace object to the given path.

Parameters:
  • descriptionFilePath – The path to overwrite with the description fields.
  • descriptionFields – The fields to write.
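
Putting it together, a download sketch (the description file name and output path are illustrative; including the internal hdf5 dataset name in the output path is an assumption here):

from lazyflow.utility.io_util import RESTfulVolume

volume = RESTfulVolume('remote_volume_description.json')

# Download a small cutout into a local hdf5 file.
roi = ((0, 0, 0), (32, 256, 256))
volume.downloadSubVolume(roi, '/tmp/cutout.h5/cube')
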
class lazyflow.utility.io_util.RESTfulBlockwiseFileset(compositeDescriptionPath)[source]

This class combines the functionality of RESTfulVolume and BlockwiseFileset to provide access to a remote dataset (e.g. from http://openconnecto.me), with all downloaded data cached locally as blocks stored in a directory tree of hdf5 files.

This class must be constructed with a description of both the remote dataset and the local storage format, provided in a JSON file with a composite schema specified by RESTfulBlockwiseFileset.DescriptionFields.

Note

See the unit tests in tests/testRESTfulBlockwiseFileset.py for example usage.

Here’s an example description file.

{
    "_schema_name" : "RESTful-blockwise-fileset-description",
    "_schema_version" : 1.0,

    "remote_description" : 
    {
        "_schema_name" : "RESTful-volume-description",
        "_schema_version" : 1.0,
    
        "name" : "Bock11-level0",
        "format" : "hdf5",
        "axes" : "zyx",

        "## NOTE": "The origin offset determines how coordinates are translated when converted to a url.",
        "## NOTE": "The origin_offset for the bock11 dataset must be at least 2917, because for some reason that's where it starts.",
        "origin_offset" : [2917, 0, 0],

        "## NOTE": "The website says that the data goes up to plane 4156, but it actually errors out past 4150",
        "bounds" : [4150, 135424, 119808],
        "dtype" : "numpy.uint8",
        "url_format" : "http://openconnecto.me/emca/bock11/hdf5/0/{x_start},{x_stop}/{y_start},{y_stop}/{z_start},{z_stop}/",
        "hdf5_dataset" : "cube"
    },

    "local_description" :
    {
        "_schema_name" : "blockwise-fileset-description",
        "_schema_version" : 1.0,

        "name" : "bock11-blocks",
        "format" : "hdf5",
        "axes" : "zyx",
        "shape" : "[ 4150-2917, 135424, 119808 ]",
        "dtype" : "numpy.uint8",
        "block_shape" : [32, 256, 256],
        "block_file_name_format" : "block-{roiString}.h5/cube",
        "dataset_root_dir" : "blocks-256x256x32",

        "## NOTE":"These optional parameters tell ilastik to view only a portion of the on-disk dataset.",
        "## NOTE":"view_origin MUST be aligned to a block start corner.",
        "## NOTE":"view_shape is optional, but recommended because volumina slows down when there are 1000s of tiles.",
        "view_origin" : "[0, 50*1024, 50*1024]",
        "view_shape" : "[4150-2917, 10*256, 10*256]"
    }
}
__init__(compositeDescriptionPath)[source]

Constructor. Uses readDescription internally.

Parameters: compositeDescriptionPath – The path to a JSON file that describes both the remote volume and local storage structure. The JSON file schema is specified by RESTfulBlockwiseFileset.DescriptionFields.
DescriptionFields = {
    '_schema_name'       : 'RESTful-blockwise-fileset-description',
    '_schema_version'    : 1.0,
    'remote_description' : <jsonConfig.JsonConfigParser object>,
    'local_description'  : <jsonConfig.JsonConfigParser object>}

This member specifies the schema of the description file. It is merely a composite of two nested schemas: one that describes the remote volume, and another that describes the local storage format. See the source code to see the field names.

downloadAllBlocks(max_parallel, skip_preparation=False)[source]

Download all blocks in the local view. This is used in utility scripts for downloading an entire volume at once. This function is NOT intended to be used by multiple threads in parallel (i.e. it doesn’t protect against downloading the same block twice.)

readData(roi, out_array=None)[source]

Read data from the fileset. If any of the requested data is not yet available locally, download it first.

Parameters:
  • roi – The region of interest to read from the dataset. Must be a tuple of iterables: (start, stop).
  • out_array – The location to store the read data. Must be the correct size for the given roi. If not provided, an array is created for you.
Returns:

The requested data. If out_array was provided, returns out_array.

classmethod readDescription(descriptionFilePath)[source]

Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by RESTfulBlockwiseFileset.DescriptionFields. Any optional parameters not provided by the user are filled in automatically.

Parameters: descriptionFilePath – The path to the description file to parse.
classmethod writeDescription(descriptionFilePath, descriptionFields)[source]

Write a jsonConfig.Namespace object to the given path.

Parameters:
  • descriptionFilePath – The path to overwrite with the description fields.
  • descriptionFields – The fields to write.
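
Finally, a minimal reading sketch (the description file name is illustrative, and the roi is assumed to be given in local view coordinates):

from lazyflow.utility.io_util import RESTfulBlockwiseFileset

fileset = RESTfulBlockwiseFileset('composite_description.json')

# Blocks that aren't cached locally yet are downloaded on demand.
roi = ((0, 0, 0), (32, 256, 256))
data = fileset.readData(roi)
print(data.shape)    # (32, 256, 256)

# Alternatively, prefetch the entire local view up front:
# fileset.downloadAllBlocks(max_parallel=4)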