Utilities¶
General Utilities¶
OrderedSignal¶
class lazyflow.utility.OrderedSignal(hide_cancellation_exceptions=False)[source]¶
A simple callback mechanism that ensures callbacks occur in the same order as subscription.
__call__(*args)[source]¶
Emit the signal. Calls each callback in the subscription list, in order, with the specified arguments.
subscribe(fn, **kwargs)[source]¶
Subscribe the given callable to be called when the signal is fired. If the callable is already subscribed to the signal, it is relocated to the end of the callback list.
Parameters:
- fn – The callable to add to this signal's list of callbacks. Must be hashable.
- kwargs – DEPRECATED. Additional parameters to include when the signal calls the function. Instead of using this parameter, consider binding arguments to your callable with functools.partial or (better) ilastik.bind.
unsubscribe(fn)[source]¶
Unsubscribe the given function from the signal's callback list. If the callable was not found in the list, this function returns silently.
Parameters: fn – The callable to remove from the subscription list.
Note
This relies on the callable's __eq__ operator. Note that functools.partial objects do not implement special support for __eq__. If your callback is of that type, you must provide the exact instance when unsubscribing. Note that ilastik.bind objects ARE equality comparable. For those callables, it is not necessary to provide the exact instance of the callable that was used for subscription. An equivalent ilastik.bind object (same target and bound args) will suffice.
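A minimal usage sketch (not part of the original reference; the callback functions are illustrative only):

from lazyflow.utility import OrderedSignal

results = []

def on_first(x):
    results.append(('first', x))

def on_second(x):
    results.append(('second', x))

sig = OrderedSignal()
sig.subscribe(on_first)
sig.subscribe(on_second)
sig(42)                              # callbacks fire in subscription order
assert results == [('first', 42), ('second', 42)]

sig.unsubscribe(on_first)
sig(43)                              # only on_second remains subscribed
assert results[-1] == ('second', 43)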
Trace Logging¶
class lazyflow.utility.Tracer(logger, level=10, msg='', determine_caller=True, caller_name='')[source]¶
Context manager to simplify function entry/exit logging trace statements.
Example Usage:
>>> # Create a TRACE logger
>>> import sys, logging
>>> traceLogger = logging.getLogger("TRACE.examplemodule1")
>>> traceLogger.addHandler( logging.StreamHandler(sys.stdout) )

>>> # Use the context manager
>>> def f():
...     with Tracer(traceLogger):
...         print "Function f is running..."

>>> # If TRACE logging isn't enabled, there's no extra output
>>> f()
Function f is running...

>>> # Enable TRACE logging to see enter/exit log statements.
>>> traceLogger.setLevel(logging.DEBUG)
>>> f()
(enter) f
Function f is running...
(exit) f

>>> # Disable TRACE logging by setting the level above DEBUG.
>>> traceLogger.setLevel(logging.INFO)
lazyflow.utility.traceLogged(logger, level=10, msg='', caller_name='')[source]¶
Returns a decorator that logs the entry and exit of its target function. Uses the Tracer context manager internally.
Example Usage:
>>> # Create a TRACE logger
>>> import sys, logging
>>> traceLogger = logging.getLogger("TRACE.examplemodule2")
>>> traceLogger.addHandler( logging.StreamHandler(sys.stdout) )

>>> # Decorate a function to allow entry/exit trace logging.
>>> @traceLogged(traceLogger)
... def f():
...     print "Function f is running..."

>>> # If TRACE logging isn't enabled, there's no extra output
>>> f()
Function f is running...

>>> # Enable TRACE logging to see enter/exit log statements.
>>> traceLogger.setLevel(logging.DEBUG)
>>> f()
(enter) f
Function f is running...
(exit) f

>>> # Disable TRACE logging by setting the level above DEBUG.
>>> traceLogger.setLevel(logging.INFO)
Path Manipulation¶
class lazyflow.utility.PathComponents(totalPath, cwd=None)[source]¶
Provides convenient access to the path components of a combined external/internal path to a dataset. Each of the properties listed below is also writable, in which case ALL properties are updated accordingly.
__init__(totalPath, cwd=None)[source]¶
Initialize the path components.
Parameters:
- totalPath – The entire path to the dataset, including any internal path (e.g. the path to an hdf5 dataset). For example, totalPath='/some/path/to/file.h5/with/internal/dataset'
- cwd – If provided, relative paths will be converted to absolute paths using this arg as the working directory.
extension¶ Example: .h5
externalDirectory¶ Example: /some/path/to
externalPath¶ Example: /some/path/to/file.h5
filename¶ Example: file.h5
filenameBase¶ Example: file
internalDatasetName¶ Example: /dataset
internalDirectory¶ Example: /with/internal
internalPath¶ Example: /with/internal/dataset
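A quick sketch (not from the original reference) showing those properties for the example path above:

from lazyflow.utility import PathComponents

comp = PathComponents('/some/path/to/file.h5/with/internal/dataset')
assert comp.externalPath == '/some/path/to/file.h5'
assert comp.externalDirectory == '/some/path/to'
assert comp.filename == 'file.h5'
assert comp.extension == '.h5'
assert comp.internalPath == '/with/internal/dataset'
assert comp.internalDatasetName == '/dataset'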
FileLock¶
Implementation of a simple cross-platform file locking mechanism. This is a modified version of code retrieved on 2013-01-01 from http://www.evanfosmark.com/2009/01/cross-platform-file-locking-support-in-python. (The original code was released under the BSD License. See below for details.)
Modifications in this version:
- Tweak docstrings for sphinx.
- Accept an absolute path for the protected file (instead of a file name relative to cwd).
- Allow timeout to be None.
- Fixed a bug that caused the original code to be NON-threadsafe when the same FileLock instance was shared by multiple threads in one process. (The original was safe for multiple processes, but not multiple threads in a single process. This version is safe for both cases.)
- Added purge() function.
- Added available() function.
- Expanded API to mimic the threading.Lock interface:
  - __enter__ always calls acquire(), and therefore blocks if acquire() was called previously.
  - __exit__ always calls release(). It is therefore a bug to call release() from within a context manager.
  - Added locked() function.
  - Added blocking parameter to acquire() method.
WARNINGS:
- The locking mechanism used here may need to be changed to support old NFS filesystems: http://lwn.net/Articles/251004 (Newer versions of NFS should be okay, e.g. NFSv3 with Linux kernel 2.6. Check the open(2) man page for details about O_EXCL.)
- This code has not been thoroughly tested on Windows, and there has been one report of incorrect results on Windows XP and Windows 7. The locking mechanism used in this class should (in theory) be cross-platform, but use at your own risk.
ORIGINAL LICENSE:
The original code did not properly include license text. (It merely said “License: BSD”.) Therefore, we’ll attach the following generic BSD License terms to this file. Those who extract this file from the lazyflow code base (LGPL) for their own use are therefore bound by the terms of both the Simplified BSD License below AND the LGPL.
Copyright (c) 2013, Evan Fosmark and others. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project.
class lazyflow.utility.fileLock.FileLock(protected_file_path, timeout=None, delay=1, lock_file_contents=None)[source]¶
A file locking mechanism that has context-manager support so you can use it in a with statement. This should be relatively cross-platform, as it doesn't rely on msvcrt or fcntl for the locking.
__init__(protected_file_path, timeout=None, delay=1, lock_file_contents=None)[source]¶
Prepare the file locker. Specify the file to lock and optionally the maximum timeout and the delay between each attempt to lock.
acquire(blocking=True)[source]¶
Acquire the lock, if possible. If the lock is in use and blocking is False, return False. Otherwise, check again every self.delay seconds until the lock is acquired or the timeout is exceeded, in which case an exception is raised.
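A minimal usage sketch (not from the original reference); the protected file path is hypothetical:

from lazyflow.utility.fileLock import FileLock

# Context-manager form: blocks in __enter__ until the lock is acquired,
# releases automatically in __exit__.
with FileLock('/tmp/shared_output.txt', timeout=10):
    with open('/tmp/shared_output.txt', 'a') as f:
        f.write("appended while holding the lock\n")

# Non-blocking form, using the threading.Lock-style API described above.
lock = FileLock('/tmp/shared_output.txt')
if lock.acquire(blocking=False):
    try:
        pass  # ... do protected work here ...
    finally:
        lock.release()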
JSON Config Parsing¶
Some lazyflow components rely on a special JSON config file format. The JsonConfigParser class handles parsing such files.
class lazyflow.utility.jsonConfig.JsonConfigParser(fields)[source]¶
Parser for JSON config files that match a specific schema. Currently, only a small subset of JSON is supported. The schema fields must be a dictionary of name : type (or pseudo-type) pairs.
A schema dict is also allowed as a pseudo-type value, which permits nested schemas (see the sketch after the example below).
>>> # Specify schema as a dict
>>> SchemaFields = {
...
...   "_schema_name" : "example-schema",
...   "_schema_version" : 1.0,
...
...   "shoe size" : int,
...   "color" : str
... }
>>>
>>> # Write a config file to disk for this example.
>>> example_file_str = \
... """
... {
...   "_schema_name" : "example-schema",
...   "_schema_version" : 1.0,
...
...   "shoe size" : 12,
...   "color" : "red",
...   "ignored_field" : "Fields that are unrecognized by the schema are ignored."
... }
... """
>>> with open('/tmp/example_config.json', 'w') as f:
...     f.write(example_file_str)
>>>
>>> # Create a parser that understands your schema
>>> parser = JsonConfigParser( SchemaFields )
>>>
>>> # Parse the config file
>>> parsedFields = parser.parseConfigFile('/tmp/example_config.json')
>>> print parsedFields.color
red
>>> # Whitespace in field names is replaced with underscores in the Namespace member.
>>> print parsedFields.shoe_size
12
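Since a schema dict may itself appear as a pseudo-type value, schemas can be nested. A brief sketch (not from the original reference; the field names are hypothetical):

from lazyflow.utility.jsonConfig import JsonConfigParser

AddressFields = {
    "_schema_name" : "address-schema",
    "_schema_version" : 1.0,
    "street" : str,
    "zip code" : str
}

PersonFields = {
    "_schema_name" : "person-schema",
    "_schema_version" : 1.0,
    "name" : str,
    "address" : AddressFields     # nested schema supplied as a pseudo-type
}

parser = JsonConfigParser( PersonFields )
# parsedFields = parser.parseConfigFile('/tmp/person_config.json')
# Nested fields would then be reachable as e.g. parsedFields.address.street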
Request Batching Utilities¶
These utilities provide convenient mechanisms for issuing a set of requests with controlled (specifically limited) parallelism and handling the result of each request in a serial callback.
BigRequestStreamer¶
Use the BigRequestStreamer if you want to retrieve a single large chunk of output data, but you want that big request broken up into many smaller blocks.
class lazyflow.utility.bigRequestStreamer.BigRequestStreamer(outputSlot, roi, blockshape=None, batchSize=None, blockAlignment='absolute', allowParallelResults=False)[source]¶
Execute a big request by breaking it up into smaller requests.
This class encapsulates the logic for dividing big rois into smaller ones to be executed separately. It relies on a RoiRequestBatch object, which is responsible for creating and scheduling the request for each roi.
Example:
>>> import sys
>>> import numpy
>>> import vigra
>>> from lazyflow.graph import Graph
>>> from lazyflow.operators.operators import OpArrayCache

>>> # Example data
>>> data = numpy.indices( (100,100) ).sum(0)
>>> data = vigra.taggedView( data, vigra.defaultAxistags('xy') )

>>> op = OpArrayCache( graph=Graph() )
>>> op.Input.setValue( data )

>>> total_roi = [(25, 65), (45, 95)]

>>> # Init with our output slot and roi to request.
>>> # batchSize indicates the number of requests to spawn in parallel.
>>> streamer = BigRequestStreamer( op.Output, total_roi, (10,10), batchSize=2, blockAlignment='relative' )

>>> # Use a callback to handle sub-results one at a time.
>>> result_count = [0]
>>> result_total_sum = [0]
>>> def handle_block_result(roi, result):
...     # No locking needed here, since allowParallelResults=False (the default).
...     result_count[0] += 1
...     result_total_sum[0] += result.sum()
>>> streamer.resultSignal.subscribe( handle_block_result )

>>> # Optional: Subscribe to progress updates
>>> def handle_progress(progress):
...     if progress == 0:
...         sys.stdout.write("Progress: ")
...     sys.stdout.write( "{} ".format( progress ) )
>>> streamer.progressSignal.subscribe( handle_progress )

>>> # Execute the batch of requests, and block for the result.
>>> streamer.execute()
Progress: 0 16 33 50 66 83 100 100
>>> print "Processed {} result blocks with a total sum of: {}".format( result_count[0], result_total_sum[0] )
Processed 6 result blocks with a total sum of: 68400
__init__(outputSlot, roi, blockshape=None, batchSize=None, blockAlignment='absolute', allowParallelResults=False)[source]¶
Constructor.
Parameters:
- outputSlot – The slot to request data from.
- roi – The roi (start, stop) of interest. Will be broken up and requested via smaller requests.
- blockshape – The amount of data to request in each request. If omitted, a default blockshape is chosen by inspecting the metadata of the given slot.
- batchSize – The maximum number of requests to launch in parallel. This should not be necessary if the blockshape is small enough that you won't run out of RAM.
- blockAlignment – Determines how the requests are blocked. Choices are 'absolute' or 'relative'.
- allowParallelResults – If False, the resultSignal will not be called in parallel. In that case, your handler function has no need for locks.
execute()[source]¶
Request the data for the entire roi by breaking it up into many smaller requests, and wait for all of them to complete. A batch of N requests is launched, and subsequent requests are launched one-by-one as the earlier requests complete. Thus, there will be N requests executing in parallel at all times.
This method returns None. All results must be handled via the resultSignal.
progressSignal¶
Progress signal. Signature: f(progress_percent)
resultSignal¶
Results signal. Signature: f(roi, result). Guaranteed not to be called from multiple threads in parallel.
RoiRequestBatch¶
If you have an image-like slot and a set of rois you're interested in retrieving, use RoiRequestBatch to request the whole set with a custom level of parallelism.
class lazyflow.utility.roiRequestBatch.RoiRequestBatch(outputSlot, roiIterator, totalVolume=None, batchSize=2, allowParallelResults=False)[source]¶
A simple utility for requesting a list of rois from an output slot. The number of rois requested in parallel is throttled by the batch size given to the constructor. The result of each requested roi is provided as a signal, which the user should subscribe() to.
Example usage:
>>> import sys
>>> import numpy
>>> import vigra
>>> from lazyflow.graph import Graph
>>> from lazyflow.operators.operators import OpArrayCache

>>> # Example data
>>> data = numpy.indices( (100,100) ).sum(0)
>>> data = vigra.taggedView( data, vigra.defaultAxistags('xy') )

>>> op = OpArrayCache( graph=Graph() )
>>> op.Input.setValue( data )

>>> # Create a list of rois to iterate through.
>>> # Typically you'll want to automate this
>>> # with e.g. lazyflow.roi.getIntersectingBlocks
>>> rois = []
>>> rois.append( ( (0, 0), (10,10) ) )
>>> rois.append( ( (0,10), (10,20) ) )
>>> rois.append( ( (0,20), (10,30) ) )
>>> rois.append( ( (0,30), (10,40) ) )
>>> rois.append( ( (0,40), (10,50) ) )

>>> # Init with our output slot and list of rois to request.
>>> # `batchSize` indicates the number of requests to spawn in parallel.
>>> # Provide `totalVolume` if you want progress reporting.
>>> batch_requester = RoiRequestBatch( op.Output, iter(rois), totalVolume=500, batchSize=2 )

>>> # Use a callback to handle sub-results one at a time.
>>> result_count = [0]
>>> result_total_sum = [0]
>>> def handle_block_result(roi, result):
...     # No locking needed here, since allowParallelResults=False (the default).
...     result_count[0] += 1
...     result_total_sum[0] += result.sum()
>>> batch_requester.resultSignal.subscribe( handle_block_result )

>>> # Optional: Subscribe to progress updates
>>> def handle_progress(progress):
...     if progress == 0:
...         sys.stdout.write("Progress: ")
...     sys.stdout.write( "{} ".format( progress ) )
>>> batch_requester.progressSignal.subscribe( handle_progress )

>>> # Execute the batch of requests, and block for the result.
>>> batch_requester.execute()
Progress: 0 20 40 60 80 100 100
>>> print "Processed {} result blocks with a total sum of: {}".format( result_count[0], result_total_sum[0] )
Processed 5 result blocks with a total sum of: 14500
__init__(outputSlot, roiIterator, totalVolume=None, batchSize=2, allowParallelResults=False)[source]¶
Constructor.
Parameters:
- outputSlot – The slot to request data from.
- roiIterator – An iterator providing new rois.
- totalVolume – The total volume to be processed. Used to provide the progress reporting signal. If not provided, then no intermediate progress will be signaled.
- batchSize – The maximum number of requests to launch in parallel.
- allowParallelResults – If False, the resultSignal will not be called in parallel. In that case, your handler function has no need for locks.
execute()[source]¶
Execute the batch of requests and wait for all of them to complete. A batch of N requests is launched, and subsequent requests are launched one-by-one as the earlier requests complete. Thus, there will be N requests executing in parallel at all times.
This method returns None. All results must be handled via the resultSignal.
progressSignal¶
Progress signal. Signature: f(progress_percent)
resultSignal¶
Results signal. Signature: f(roi, result). Guaranteed not to be called from multiple threads in parallel.
IO Utilities¶
These utilities provide access to special data formats supported by lazyflow.
Blockwise Data Format¶
For big datasets, lazyflow supports a special input/output format that is based on blocks of data stored as hdf5 files in a special directory tree structure. The dataset is described by a special json file.
A small example explains the basics. Consider a dataset with axes x-y-z and shape 300x100x100. Suppose it is stored on disk in blocks of size 100x50x50. Let’s start by inspecting the dataset description file:
$ ls
data_description_params.json my_dataset_blocks
$
$ cat data_description_params.json
{
"_schema_name" : "blockwise-fileset-description",
"_schema_version" : 1.0,
"name" : "example_data",
"format" : "hdf5",
"axes" : "xyz",
"shape" : [300,100,100],
"dtype" : "numpy.uint8",
"block_shape" : [100, 50, 50],
"dataset_root_dir" : "./my_dataset_blocks",
"block_file_name_format" : "blockFile-{roiString}.h5/volume/data"
}
This listing shows how the directory tree is structured:
$ ls my_dataset_blocks/*/*/*/*.h5
my_dataset_blocks/x_00000000/y_00000000/z_00000000/blockFile-([0, 0, 0], [100, 50, 50]).h5
my_dataset_blocks/x_00000000/y_00000000/z_00000050/blockFile-([0, 0, 50], [100, 50, 100]).h5
my_dataset_blocks/x_00000000/y_00000050/z_00000000/blockFile-([0, 50, 0], [100, 100, 50]).h5
my_dataset_blocks/x_00000000/y_00000050/z_00000050/blockFile-([0, 50, 50], [100, 100, 100]).h5
my_dataset_blocks/x_00000100/y_00000000/z_00000000/blockFile-([100, 0, 0], [200, 50, 50]).h5
my_dataset_blocks/x_00000100/y_00000000/z_00000050/blockFile-([100, 0, 50], [200, 50, 100]).h5
my_dataset_blocks/x_00000100/y_00000050/z_00000000/blockFile-([100, 50, 0], [200, 100, 50]).h5
my_dataset_blocks/x_00000100/y_00000050/z_00000050/blockFile-([100, 50, 50], [200, 100, 100]).h5
my_dataset_blocks/x_00000200/y_00000000/z_00000000/blockFile-([200, 0, 0], [300, 50, 50]).h5
my_dataset_blocks/x_00000200/y_00000000/z_00000050/blockFile-([200, 0, 50], [300, 50, 100]).h5
my_dataset_blocks/x_00000200/y_00000050/z_00000000/blockFile-([200, 50, 0], [300, 100, 50]).h5
my_dataset_blocks/x_00000200/y_00000050/z_00000050/blockFile-([200, 50, 50], [300, 100, 100]).h5
But you shouldn’t really have to worry too much about how the data is stored.
The BlockwiseFileset and RESTfulBlockwiseFileset classes provide a high-level API for reading and writing such datasets. See the documentation of those classes for details.
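For instance, reading the first block of the example dataset above might look like the following sketch (not from the original reference):

from lazyflow.utility.io_util import BlockwiseFileset

# Open the dataset described by the JSON file shown above, read-only.
bfs = BlockwiseFileset('data_description_params.json', mode='r')
roi = ([0, 0, 0], [100, 50, 50])   # (start, stop) of the first 100x50x50 block
block_data = bfs.readData(roi)     # numpy array covering the requested roi
bfs.close()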
class lazyflow.utility.io_util.BlockwiseFileset(descriptionFilePath, mode='r', preparsedDescription=None)[source]¶
This class handles writing and reading a 'blockwise file set'. A 'blockwise file set' is a directory with a particular structure, which contains the entire dataset broken up into blocks. Important parameters (e.g. shape, dtype, blockshape) are specified in a JSON file, which must match the schema given by BlockwiseFileset.DescriptionFields. The parent directory of the description file is considered to be the top-most directory in the blockwise dataset hierarchy.
- Simultaneous reads are threadsafe.
- NOT threadsafe for reading and writing simultaneously (or writing and writing).
- NOT threadsafe for closing. Do not call close() while reading or writing.
Note
See the unit tests in tests/testBlockwiseFileset.py for example usage.
__init__
(descriptionFilePath, mode='r', preparsedDescription=None)[source]¶ Constructor. Uses readDescription interally.
Parameters: - descriptionFilePath – The path to the .json file that describes the dataset.
- mode – Set to
'r'
if the fileset should be read-only. - preparsedDescription – (Optional) Provide pre-parsed description fields, in which case the provided description file will not be parsed.
exception BlockNotReadyError(block_start)[source]¶
This exception is raised if readData() is called for data that isn't available on disk.
BlockwiseFileset.DescriptionFields = {'view_origin': <lazyflow.utility.jsonConfig.AutoEval object>, 'name': <type 'str'>, 'block_file_name_format': <lazyflow.utility.jsonConfig.FormattedField object>, 'format': <type 'str'>, 'dtype': <lazyflow.utility.jsonConfig.AutoEval object>, 'axes': <type 'str'>, 'drange': <lazyflow.utility.jsonConfig.AutoEval object>, 'dataset_root_dir': <type 'str'>, '_schema_version': 1.1, 'shape': <lazyflow.utility.jsonConfig.AutoEval object>, '_schema_name': 'blockwise-fileset-description', 'block_shape': <lazyflow.utility.jsonConfig.AutoEval object>, 'view_shape': <lazyflow.utility.jsonConfig.AutoEval object>, 'chunks': <lazyflow.utility.jsonConfig.AutoEval object>, 'hash_id': <type 'str'>, 'compression_opts': <lazyflow.utility.jsonConfig.AutoEval object>, 'sub_block_shape': <lazyflow.utility.jsonConfig.AutoEval object>, 'compression': <type 'str'>}¶
These fields describe the schema of the description file. See the source code comments for a description of each field.
BlockwiseFileset.description¶
The jsonConfig.Namespace object that describes this dataset.
BlockwiseFileset.exportRoiToHdf5(roi, exportDirectory, use_view_coordinates=True)[source]¶
Export an arbitrary roi to a single hdf5 file. The file will be placed in the given exportDirectory, and will be named according to the exported roi.
Parameters:
- roi – The roi to export.
- exportDirectory – The directory in which the result should be placed.
- use_view_coordinates – If True, assume the roi was given relative to the view start. Otherwise, assume it was given relative to the on-disk coordinates.
BlockwiseFileset.exportSubset(roi, exportDirectory, use_view_coordinates=True)[source]¶
Create a new blockwise fileset by copying a subset of this blockwise fileset.
Parameters:
- roi – The portion to export. Must be along block boundaries, in ABSOLUTE coordinates.
- exportDirectory – The directory to copy the new blockwise fileset to.
BlockwiseFileset.getAllBlockRois()[source]¶
Return the list of rois for all VIEWED blocks in the dataset.
BlockwiseFileset.getBlockStatus(blockstart)[source]¶
Check a block's status. (Just because a block file exists doesn't mean that it has valid data.) Returns a status code of either BlockwiseFileset.BLOCK_AVAILABLE or BlockwiseFileset.BLOCK_NOT_AVAILABLE.
BlockwiseFileset.getDatasetDirectory(blockstart)[source]¶
Return the directory that contains the block that starts at the given coordinates.
BlockwiseFileset.getDatasetPathComponents(block_start)[source]¶
Return a PathComponents object for the block file that corresponds to the given block start coordinate.
BlockwiseFileset.getEntireBlockRoi(block_start)[source]¶
Return the roi for the entire block that starts at the given coordinate.
BlockwiseFileset.getOpenHdf5FileForBlock(block_start)[source]¶
Returns a handle to a file in this dataset.
BlockwiseFileset.isBlockLocked(blockstart)[source]¶
Return True if the block is locked for writing. Note that both 'available' and 'not available' blocks might be locked.
BlockwiseFileset.purgeAllLocks()[source]¶
Clears all .lock files from the local blockwise fileset. This may be necessary if previous processes crashed or were killed while some blocks were downloading. You must ensure that this is NOT called while more than one process (or thread) has access to the fileset. For example, in a master/worker situation, call this only from the master, before the workers have been started.
BlockwiseFileset.readData(roi, out_array=None)[source]¶
Read data from the fileset.
Parameters:
- roi – The region of interest to read from the dataset. Must be a tuple of iterables: (start, stop).
- out_array – The location to store the read data. Must be the correct size for the given roi. If not provided, an array is created for you.
Returns: The requested data. If out_array was provided, returns out_array.
classmethod BlockwiseFileset.readDescription(descriptionFilePath)[source]¶
Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by BlockwiseFileset.DescriptionFields.
Parameters: descriptionFilePath – The path to the description file to parse.
BlockwiseFileset.setBlockStatus(blockstart, status)[source]¶
Set a block status on disk. We use a simple convention: If the status file exists, the block is available. Otherwise, it ain't.
Parameters: status – Must be either BlockwiseFileset.BLOCK_AVAILABLE or BlockwiseFileset.BLOCK_NOT_AVAILABLE.
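A sketch (not from the original reference) that visits every viewed block and reads only those whose data is already valid on disk:

from lazyflow.utility.io_util import BlockwiseFileset

bfs = BlockwiseFileset('data_description_params.json', mode='r')
for roi in bfs.getAllBlockRois():
    block_start = roi[0]
    if bfs.getBlockStatus(block_start) == BlockwiseFileset.BLOCK_AVAILABLE:
        block_data = bfs.readData(roi)
        # ... process block_data here ...
bfs.close()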
Remote Volumes¶
class lazyflow.utility.io_util.RESTfulVolume(descriptionFilePath=None, preparsedDescription=None)[source]¶
This class provides access to data obtained via a RESTful API (e.g. from http://openconnecto.me). A description of the remote volume must be provided via a JSON file, whose schema is specified by RESTfulVolume.DescriptionFields.
Note
This class does not keep track of the data you've already downloaded. Every call to downloadSubVolume() results in a new download. For automatic blockwise local caching of remote datasets, see RESTfulBlockwiseFileset.
Note
See the unit tests in tests/testRESTfulVolume.py for example usage.
__init__(descriptionFilePath=None, preparsedDescription=None)[source]¶
Constructor. Uses readDescription internally.
Parameters:
- descriptionFilePath – The path to the .json file that describes the remote volume.
- preparsedDescription – (Optional) Provide pre-parsed description fields, in which case the provided description file will not be parsed.
DescriptionFields = {'format': <type 'str'>, 'dtype': <lazyflow.utility.jsonConfig.AutoEval object>, 'hdf5_dataset': <type 'str'>, '_schema_version': 1.0, 'shape': <lazyflow.utility.jsonConfig.AutoEval object>, 'name': <type 'str'>, 'axes': <type 'str'>, 'bounds': <lazyflow.utility.jsonConfig.AutoEval object>, '_schema_name': 'RESTful-volume-description', 'origin_offset': <lazyflow.utility.jsonConfig.AutoEval object>, 'url_format': <lazyflow.utility.jsonConfig.FormattedField object>}¶
These fields describe the schema of the description file. See the source code comments for a description of each field.
downloadSubVolume(roi, outputDatasetPath)[source]¶
Download a cutout volume from the remote dataset.
Parameters:
- roi – The subset of the volume to download, specified as a tuple of coordinates: (start, stop)
- outputDatasetPath – The path to overwrite with the downloaded hdf5 file.
classmethod readDescription(descriptionFilePath)[source]¶
Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by RESTfulVolume.DescriptionFields. Any optional parameters not provided by the user are filled in automatically.
Parameters: descriptionFilePath – The path to the description file to parse.
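A sketch (not from the original reference) of downloading a single cutout; the description path, roi, and output path are all hypothetical:

from lazyflow.utility.io_util import RESTfulVolume

remote_volume = RESTfulVolume('/path/to/remote_volume_description.json')
roi = ( (0, 0, 0), (32, 256, 256) )   # (start, stop), hypothetical zyx coordinates
remote_volume.downloadSubVolume( roi, '/tmp/cutout.h5/cube' )   # hypothetical hdf5 output path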
class lazyflow.utility.io_util.RESTfulBlockwiseFileset(compositeDescriptionPath)[source]¶
This class combines the functionality of RESTfulVolume and BlockwiseFileset to provide access to a remote dataset (e.g. from http://openconnecto.me), with all downloaded data cached locally as blocks stored in a directory tree of hdf5 files.
This class must be constructed with a description of both the remote dataset and the local storage format, provided in a JSON file with a composite schema specified by RESTfulBlockwiseFileset.DescriptionFields.
Note
See the unit tests in tests/testRESTfulBlockwiseFileset.py for example usage.
Here's an example description file.
{ "_schema_name" : "RESTful-blockwise-fileset-description", "_schema_version" : 1.0, "remote_description" : { "_schema_name" : "RESTful-volume-description", "_schema_version" : 1.0, "name" : "Bock11-level0", "format" : "hdf5", "axes" : "zyx", "## NOTE": "The origin offset determines how coordinates are translated when converted to a url.", "## NOTE": "The origin_offset for the bock11 dataset must be at least 2917, because for some reason that's where it starts.", "origin_offset" : [2917, 0, 0], "## NOTE": "The website says that the data goes up to plane 4156, but it actually errors out past 4150", "bounds" : [4150, 135424, 119808], "dtype" : "numpy.uint8", "url_format" : "http://openconnecto.me/emca/bock11/hdf5/0/{x_start},{x_stop}/{y_start},{y_stop}/{z_start},{z_stop}/", "hdf5_dataset" : "cube" }, "local_description" : { "_schema_name" : "blockwise-fileset-description", "_schema_version" : 1.0, "name" : "bock11-blocks", "format" : "hdf5", "axes" : "zyx", "shape" : "[ 4150-2917, 135424, 119808 ]", "dtype" : "numpy.uint8", "block_shape" : [32, 256, 256], "block_file_name_format" : "block-{roiString}.h5/cube", "dataset_root_dir" : "blocks-256x256x32", "## NOTE":"These optional parameters tell ilastik to view only a portion of the on-disk dataset.", "## NOTE":"view_origin MUST be aligned to a block start corner.", "## NOTE":"view_shape is optional, but recommended because volumina slows down when there are 1000s of tiles.", "view_origin" : "[0, 50*1024, 50*1024]", "view_shape" : "[4150-2917, 10*256, 10*256]" } }
__init__(compositeDescriptionPath)[source]¶
Constructor. Uses readDescription internally.
Parameters: compositeDescriptionPath – The path to a JSON file that describes both the remote volume and local storage structure. The JSON file schema is specified by RESTfulBlockwiseFileset.DescriptionFields.
DescriptionFields = {'_schema_version': 1.0, 'local_description': <lazyflow.utility.jsonConfig.JsonConfigParser object>, '_schema_name': 'RESTful-blockwise-fileset-description', 'remote_description': <lazyflow.utility.jsonConfig.JsonConfigParser object>}¶
This member specifies the schema of the description file. It is merely a composite of two nested schemas: one that describes the remote volume, and another that describes the local storage format. See the source code to see the field names.
downloadAllBlocks(max_parallel, skip_preparation=False)[source]¶
Download all blocks in the local view. This is used in utility scripts for downloading an entire volume at once. This function is NOT intended to be used by multiple threads in parallel (i.e. it doesn't protect against downloading the same block twice).
readData(roi, out_array=None)[source]¶
Read data from the fileset. If any of the requested data is not yet available locally, download it first.
Parameters:
- roi – The region of interest to read from the dataset. Must be a tuple of iterables: (start, stop).
- out_array – The location to store the read data. Must be the correct size for the given roi. If not provided, an array is created for you.
Returns: The requested data. If out_array was provided, returns out_array.
classmethod readDescription(descriptionFilePath)[source]¶
Parse the description file at the given path and return a jsonConfig.Namespace object with the description parameters. The file will be parsed according to the schema given by RESTfulBlockwiseFileset.DescriptionFields. Any optional parameters not provided by the user are filled in automatically.
Parameters: descriptionFilePath – The path to the description file to parse.
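A sketch (not from the original reference) of reading through the local block cache, downloading any missing blocks on demand; the description path and roi are hypothetical:

from lazyflow.utility.io_util import RESTfulBlockwiseFileset

cached_volume = RESTfulBlockwiseFileset('/path/to/composite_description.json')
data = cached_volume.readData( ([0, 0, 0], [32, 256, 256]) )   # downloads missing blocks, then reads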