receipt module

A program to make sense of pentaplex’s outputs for receipt analysis

@author: phdenzel

class receipt.Receipt(file_id, total=None, market=None, date=None, time=None, auto=False)

Bases: object

Class that encompasses pentaplex’s receipt analysis

check_img_id()

Check if file_id is found in any pictures of imgs/

Args/Kwargs:
None
Return:
image: str; name string of the original image in imgs/ matching file_id
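
The id lookup described above can be pictured as a substring search over the file names in a directory. This is only a minimal sketch: `find_by_id` is a hypothetical helper, not part of the receipt module, and the real method may match ids differently.

```python
import os
import tempfile


def find_by_id(directory, file_id):
    """Return the first file name in `directory` containing `file_id`, else None."""
    # Sort for a deterministic result when several names match (an assumption)
    for name in sorted(os.listdir(directory)):
        if file_id in name:
            return name
    return None


# Demonstrate on a throwaway directory with one made-up image name
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "IMG_0042.jpg"), "w").close()
    found = find_by_id(demo_dir, "0042")
    missing = find_by_id(demo_dir, "9999")

print(found, missing)
```
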
check_ocr_id()

Check if file_id is found in prp/

Args/Kwargs:
None
Return:
prepd, text: str, str; name strings of the preprocessed file and the OCR txt file
check_scanner_id()

Check if file_id is found in prp/ or tmp/

Args/Kwargs:
None
Return:
dst: str; name string of the scanned image in prp/ or tmp/ matching
file_id
clean_ocr(data)

Clean the output of the OCR

Args:
data: output of the OCR to be cleaned
Kwargs:
None
Return:
text: list(str); OCR text with newline characters and other artifacts removed
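
What such a cleaning step might look like is sketched below. It assumes the OCR output arrives as a single string; `clean_lines` is a hypothetical stand-in, and the actual clean_ocr may remove other artifacts.

```python
def clean_lines(raw_text):
    """Split raw OCR text into lines, normalize whitespace, and drop empty lines."""
    lines = []
    for line in raw_text.splitlines():
        # Collapse tabs and repeated spaces into single spaces and trim the ends
        line = " ".join(line.split())
        if line:
            lines.append(line)
    return lines


# Made-up OCR output with blank lines, stray whitespace and a form feed
sample = "MIGROS\n\n  Total   CHF\t12.50 \n\x0c"
cleaned = clean_lines(sample)
print(cleaned)
```
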
classmethod empty()

Constructor for an empty receipt instance

Return:
instance: Receipt
file_id

Property file_id specifying a receipt

find_file(filetype)

Find a file of the given type. If the file is not found, the preprocessing scripts are executed automatically, which is why the checks are run beforehand.

Args:
filetype: str; either ‘original’, ‘scan’, ‘preprocessed’, ‘config’,
or ‘txt’
Kwargs:
None
Return:
f: str; path to specific file

Fuzzy search OCR output for a keyword and its possible value

Args:
keyword: str; the keyword to fuzzy search for
Kwargs:
accuracy: float; accuracy parameter for the fuzzy search algorithm
Return:
line: list(str); the line of the closest fuzzy search match
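
A fuzzy keyword search of this kind could be sketched with the standard library's `difflib`. The matching algorithm used by the module is not documented here, so `fuzzy_find_line` and its use of `SequenceMatcher.ratio()` as the accuracy measure are assumptions for illustration only.

```python
import difflib


def fuzzy_find_line(lines, keyword, accuracy=0.6):
    """Return the tokens of the line whose best word matches `keyword`, or []."""
    best_ratio, best_line = 0.0, []
    for line in lines:
        tokens = line.split()
        for token in tokens:
            # Similarity in [0, 1]; 1.0 means the strings are identical
            ratio = difflib.SequenceMatcher(
                None, keyword.lower(), token.lower()
            ).ratio()
            if ratio >= accuracy and ratio > best_ratio:
                best_ratio, best_line = ratio, tokens
    return best_line


# OCR often confuses 'O' and '0'; the fuzzy match still finds the line
ocr_lines = ["MIGROS Zürich", "T0TAL CHF 12.50", "Danke"]
match = fuzzy_find_line(ocr_lines, "total")
print(match)
```

Returning the whole line as a token list keeps the value next to the keyword, so a later step can pick the amount out of the same line.
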
imgd = '.../pentaplex/imgs/'
static load_configs(config_path)

Load a yaml config file and return an objectified dictionary

Args:
config_path: str; path string to the yaml config file
Kwargs:
None
Return:
config: objectify instance; the read configurations
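
To illustrate what "objectified" could mean, the sketch below turns a nested dictionary into an object with attribute access. It is an assumption, not the module's objectify implementation; the dictionary would normally come from a YAML parser (for example PyYAML's `yaml.safe_load`), and the keys shown are made up.

```python
from types import SimpleNamespace


def objectify(obj):
    """Recursively convert dicts into attribute-accessible namespaces."""
    if isinstance(obj, dict):
        return SimpleNamespace(
            **{key: objectify(value) for key, value in obj.items()}
        )
    if isinstance(obj, list):
        return [objectify(item) for item in obj]
    return obj


# Stand-in for a parsed YAML config; the structure is hypothetical
parsed = {"market": {"name": "migros", "keywords": ["total", "summe"]}}
config = objectify(parsed)
print(config.market.name)
```
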
parse_date(date)

Parse for the date on the receipt

Args:
date: str; argument to overwrite results
Kwargs:
None
Return:
date: str; matched date on the receipt
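
The parse methods share one pattern: an explicit argument overrides whatever is found in the OCR text. A rough sketch for the date case follows; `find_date` and the date pattern (day, month, year separated by `.`, `/` or `-`) are assumptions, since the formats the module actually recognizes are not documented here.

```python
import re

# Hypothetical pattern for dates such as 12.03.2018 or 1/2/18
DATE_PATTERN = re.compile(r"\b\d{1,2}[./-]\d{1,2}[./-]\d{2,4}\b")


def find_date(lines, date=None):
    """Return `date` if given, else the first date-like string in `lines`."""
    if date is not None:
        # An explicit argument overrides the parsed result
        return date
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            return match.group(0)
    return None


ocr_lines = ["MIGROS", "12.03.2018 14:52", "Total 12.50"]
print(find_date(ocr_lines))
print(find_date(ocr_lines, date="01.01.2020"))
```
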
parse_market(market)

Parse for the market the receipt is from

Args:
market: str; argument to overwrite results
Kwargs:
None
Return:
market: str; matched market
parse_time(time)

Parse for the time on the receipt

Args:
time: str; argument to overwrite results
Kwargs:
None
Return:
time: str; matched time on the receipt
parse_total(total)

Parse for the total on the receipt

Args:
total: str; argument to overwrite results
Kwargs:
None
Return:
total: str; matched total on the receipt
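
For the total, one plausible approach is to take the line matched for a keyword such as "total" and pick the last amount-like token from it. The sketch below assumes the line is a list of tokens and that amounts have two decimals; `find_total` is hypothetical and not the module's method.

```python
import re

# Hypothetical amount format: digits, a '.' or ',' separator, two decimals
AMOUNT_PATTERN = re.compile(r"\d+[.,]\d{2}")


def find_total(line_tokens, total=None):
    """Return `total` if given, else the last amount-like token of the line."""
    if total is not None:
        # An explicit argument overrides the parsed result
        return total
    amounts = [tok for tok in line_tokens if AMOUNT_PATTERN.fullmatch(tok)]
    # Normalize a decimal comma to a decimal point
    return amounts[-1].replace(",", ".") if amounts else None


print(find_total(["T0TAL", "CHF", "12,50"]))
```
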
print_properties()

Send properties to stdout

Args/Kwargs/Return:
None
print_text()

Send text to stdout

Args/Kwargs/Return:
None
prpd = '.../pentaplex/prp/'
read_files(files)

Read all files associated with the receipt

Args:
files: the files associated with the receipt
Kwargs:
None
Return:
data: dict; file contents, with keys analogous to those of files
root = '.../pentaplex/'
run_ocr()

Run the ocr.sh script

Args/Kwargs/Return:
None
run_scanner()

Run the scanner.py script

Args/Kwargs/Return:
None
tmpd = '.../pentaplex/tmp/'
txtd = '.../pentaplex/txt/'
receipt.imread(filename[, flags]) → retval