skais_mapper.data
Image/map data readers and (HDF5) writers.
Classes:
Name | Description |
---|---|
Img2H5Buffer | Parse images (incrementally or all at once) and write to HDF5 files. |
ImgRead | Flexible image reader for multiple formats. |
Img2H5Buffer
Img2H5Buffer(
path: str | Path = None,
target: str | Path = None,
data: np.ndarray | dict = None,
size: int | float | str = "1G",
)
Parse images (incrementally or all at once) and write to HDF5 files.
The directory structure of a dataset should be as follows:
- /path/to/dataset/root
- image class as a subdirectory in the dataset
- image file: {npy | jpg | png | etc.}
E.g. file paths of the following structure: /path/to/dataset/root//image_class//423120.npy
HDF5 files end up being: /image_class/dataset
Note: by default, the entire dataset is loaded into cache.
Constructor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path | Path to a data directory where the source files are located. | None |
target | str \| Path | Filename of the HDF5 file to be written. | None |
data | np.ndarray \| dict | Alternative input format to | None |
size | int \| float \| str | Buffer cache size in bytes or passed as string. | '1G' |
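The size parameter accepts either a byte count or a string such as '1G'. The following is a minimal sketch of how such a value could be normalised to bytes. The helper name `parse_size` and the use of binary multiples (1K = 1024 bytes) are assumptions for illustration; the library's own parsing may differ.

```python
import re

# Assumption: binary multiples (1K = 1024 bytes); the library may use another convention.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(size):
    """Convert an int, float, or string such as '1G' into a number of bytes.

    Hypothetical helper, not part of skais_mapper.
    """
    if isinstance(size, (int, float)):
        return int(size)
    # Number, optional unit prefix, optional trailing "B" (e.g. "512 MB")
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?)B?\s*", size.upper())
    if match is None:
        raise ValueError(f"Cannot parse size: {size!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


one_gib = parse_size("1G")
plain_bytes = parse_size(2048)
```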
Methods:
Name | Description |
---|---|
configure_rdcc | Automatically configure HDF5 data chunking for optimal writing. |
flush | Send all data pages from the buffer queue. |
glob_path | Glob path recursively for files. |
inc_write | Incrementally (append mode) write the buffer to HDF5 file. |
send | Grab first data page from the buffer queue. |
store | Insert data into the buffer queue. |
write | Write all files in buffer to a new HDF5 file. |
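The store, send, and flush methods describe a first-in, first-out queue of data pages. The toy model below illustrates that vocabulary with a plain `collections.deque`. The class name `PageQueue` is hypothetical, and the assumption that `clear=True` removes the page from the queue is an interpretation of the signature, not confirmed behaviour of Img2H5Buffer.

```python
from collections import deque


class PageQueue:
    """Toy FIFO buffer mimicking store/send/flush (not the library class)."""

    def __init__(self):
        self._queue = deque()

    def store(self, data):
        # Append a page at the end of the queue; return self to allow chaining
        self._queue.append(data)
        return self

    def send(self, clear=True):
        # Return the first page; assumed: clear=True also removes it from the queue
        if not self._queue:
            return None
        return self._queue.popleft() if clear else self._queue[0]

    def flush(self):
        # Return all remaining pages and empty the queue
        pages = list(self._queue)
        self._queue.clear()
        return pages or None


buf = PageQueue().store("page-a").store("page-b").store("page-c")
peeked = buf.send(clear=False)  # first page stays queued
first = buf.send()              # first page is removed
rest = buf.flush()              # everything that is left
```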
Attributes:
Name | Type | Description |
---|---|---|
n_files | int | Number of files to be parsed. |
nbytes | list[nbytes] | List of the number of bytes for each buffer file. |
page | np.ndarray \| dict \| None | Buffer page ready to be written to file. |
total_nbytes | list[nbytes] | Total number of bytes for buffer. |
Source code in skais_mapper/data.py
n_files
property
n_files: int
Number of files to be parsed.
nbytes
property
nbytes: list[nbytes]
List of the number of bytes for each buffer file.
page
property
page: np.ndarray | dict | None
Buffer page ready to be written to file.
total_nbytes
property
total_nbytes: list[nbytes]
Total number of bytes for buffer.
configure_rdcc
configure_rdcc(
cache_size: int | float | str | None = None,
f: int = 10,
verbose: bool = False,
**kwargs,
) -> dict
Automatically configure HDF5 data chunking for optimal writing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cache_size | int \| float \| str \| None | Cache of the entire buffer. | None |
f | int | Factor with which to increase the number of slots. | 10 |
verbose | bool | Print additional information to stdout. | False |
**kwargs | | Additional keyword arguments such as | {} |
Returns:
Type | Description |
---|---|
dict | |
flush
flush() -> (
np.ndarray | dict | list[np.ndarray | dict] | None
)
Send all data pages from the buffer queue.
glob_path
staticmethod
glob_path(
path: str | Path | list[str] | list[Path],
extensions: str | list[str] = None,
) -> list[Path]
Glob path recursively for files.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path \| list[str] \| list[Path] | Filename, path, or list of paths; may contain wildcards. | required |
extensions | str \| list[str] | File extension(s) to look for. | None |
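As an illustration of recursive globbing with an extension filter, here is a minimal sketch built on `pathlib`. The function `glob_files` and the directory names are hypothetical; unlike glob_path, this sketch does not expand wildcards in the input path.

```python
import tempfile
from pathlib import Path


def glob_files(path, extensions=None):
    """Recursively collect files below one or more paths (hypothetical helper)."""
    paths = path if isinstance(path, (list, tuple)) else [path]
    if isinstance(extensions, str):
        extensions = [extensions]
    # Normalise "npy" and ".npy" to the same form
    wanted = None if extensions is None else {"." + e.lstrip(".") for e in extensions}
    found = []
    for root in map(Path, paths):
        candidates = [root] if root.is_file() else root.rglob("*")
        for item in candidates:
            if item.is_file() and (wanted is None or item.suffix in wanted):
                found.append(item)
    return sorted(found)


# Usage on a throwaway directory tree
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gas/1.npy", "gas/2.png", "stars/3.npy"):
        file = Path(tmp) / name
        file.parent.mkdir(parents=True, exist_ok=True)
        file.touch()
    npy_names = [p.name for p in glob_files(tmp, extensions="npy")]
    all_names = [p.name for p in glob_files(tmp)]
```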
inc_write
inc_write(
path: str | Path | None = None,
group: str = "images",
data: np.ndarray | dict | None = None,
expand_dim: bool = True,
axis: int = 0,
overwrite: bool | int | None = None,
verbose: bool = False,
**kwargs,
)
Incrementally (append mode) write the buffer to HDF5 file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path \| None | Filename of the HDF5 file and optionally the path of the HDF5 group where the dataset is saved. | None |
group | str | HDF5 group in which to save the dataset. If it does not exist, it is created. | 'images' |
data | np.ndarray \| dict \| None | Data to be written to the HDF5 file. If None, all files in the buffer are written to the HDF5 file. | None |
expand_dim | bool | Expand dimension of data array for stacking. | True |
axis | int | Axis of the n-dimensional array along which to append. | 0 |
overwrite | bool \| int \| None | If data should overwrite indices in a pre-existing HDF5 dataset, set to the index. | None |
verbose | bool | Print additional information to stdout. | False |
kwargs | | Additional keyword arguments for | {} |
send
send(clear: bool = True) -> np.ndarray | dict | None
Grab first data page from the buffer queue.
store
store(
data: np.ndarray | dict | str | Path | list[str | Path],
squash: bool = True,
) -> Img2H5Buffer
Insert data into the buffer queue.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | np.ndarray \| dict \| str \| Path \| list[str \| Path] | Data to be stored in buffer. | required |
squash | bool | Squash data dimensions if buffer data is compatible. | True |
write
write(
path: str | Path | None = None,
group: str = "images",
data: np.ndarray | dict | None = None,
verbose: bool = False,
**kwargs,
)
Write all files in buffer to a new HDF5 file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str \| Path \| None | Filename of the HDF5 file and optionally the path of the HDF5 group where the dataset is saved, separated by a colon, e.g. | None |
group | str | HDF5 group in which to save the dataset. If it does not exist, it is created. | 'images' |
data | np.ndarray \| dict \| None | Data to be written to the HDF5 file. | None |
verbose | bool | Print additional information to stdout. | False |
kwargs | | Additional keyword arguments for | {} |
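The path parameter may carry both a filename and an HDF5 group, separated by a colon. The sketch below shows one way such a string could be split. The helper `split_target` and the example filename are hypothetical, and falling back to the default group when no colon is present is an assumption.

```python
def split_target(path, default_group="images"):
    """Split 'file.hdf5:/group' into (filename, group); hypothetical helper.

    Note: a Windows drive letter such as 'C:' is not handled by this sketch.
    """
    filename, sep, group = str(path).partition(":")
    # Fall back to the default group when no (non-empty) group was given
    return filename, (group if sep and group else default_group)


with_group = split_target("maps.hdf5:/images/gas")
without_group = split_target("maps.hdf5")
```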
ImgRead
Flexible image reader for multiple formats.
Methods:
Name | Description |
---|---|
__call__ | Automatically determine file type and read data appropriately. |
__call__
__call__(
paths: str | Path | list[str | Path] | None = None,
squash: bool = True,
pad_val: int | float = 0,
**kwargs,
) -> np.ndarray
Automatically determine file type and read data appropriately.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
paths | str \| Path \| list[str \| Path] \| None | File path(s) to the image(s) to be read. | None |
squash | bool | If multiple paths are passed, merge and squash arrays. | True |
pad_val | int \| float | Padding value used for shape expansion if multiple paths are passed and the images have different shapes. | 0 |
**kwargs | | Additional keyword arguments for parser functions: | {} |
Returns:
Type | Description |
---|---|
np.ndarray | Numpy ndarray of the image data. |
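When several images of different shapes are read together, pad_val fills the gaps so they can be merged into one array. The sketch below illustrates the idea on nested lists to stay dependency-free, whereas the reader itself returns an np.ndarray. The helper `pad_to_common_shape` is hypothetical, and padding at the trailing edges (rather than, say, centring) is an assumption.

```python
def pad_to_common_shape(images, pad_val=0):
    """Pad 2D images (lists of rows) to the largest height and width.

    Hypothetical helper; padding is appended at the right and bottom edges.
    """
    height = max(len(img) for img in images)
    width = max(len(row) for img in images for row in img)
    padded = []
    for img in images:
        # Extend each row to the common width
        rows = [list(row) + [pad_val] * (width - len(row)) for row in img]
        # Append full padding rows until the common height is reached
        rows += [[pad_val] * width for _ in range(height - len(rows))]
        padded.append(rows)
    return padded


stack = pad_to_common_shape([[[1, 2], [3, 4]], [[5]]], pad_val=0)
```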