receipt module
A program to make sense of pentaplex’s outputs for receipt analysis
@author: phdenzel
-
class receipt.Receipt(file_id, total=None, market=None, date=None, time=None, auto=False)
Bases: object
Class that encompasses pentaplex’s receipt analysis
-
check_img_id()
Check if file_id is found in any picture in imgs/
- Args/Kwargs:
- None
- Return:
- image: str; name of the original image in imgs/ matching file_id
-
check_ocr_id()
Check if file_id is found in prp/
- Args/Kwargs:
- None
- Return:
- prepd, text: str, str; names of the preprocessed file and the OCR txt file
-
check_scanner_id()
Check if file_id is found in prp/ or tmp/
- Args/Kwargs:
- None
- Return:
- dst: str; name of the scanned image in prp/ or tmp/ matching file_id
-
clean_ocr(data)
Clean the output of the OCR
- Args:
- data: the raw OCR output to be cleaned
- Kwargs:
- None
- Return:
- text: list(str); the OCR text with newline characters and similar clutter removed
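A minimal sketch of what such a cleaning step could look like. It assumes the OCR output arrives as a single string. The function name clean_ocr_sketch and the exact cleaning rules (dropping empty lines, collapsing whitespace) are illustrative assumptions, not the module's actual implementation.

```python
def clean_ocr_sketch(data):
    """Hypothetical stand-in for Receipt.clean_ocr (not the module's code)."""
    lines = []
    for raw in data.splitlines():
        # Collapse runs of whitespace and trim both ends of the line
        line = " ".join(raw.split())
        # Skip lines that are empty after trimming
        if line:
            lines.append(line)
    return lines


sample = "MIGROS  Zurich\n\n  TOTAL   12.50 \r\n\n"
cleaned = clean_ocr_sketch(sample)
print(cleaned)
```

Returning a list of lines, as the documented return type suggests, keeps later line-based searches simple.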
-
classmethod empty()
Constructor for an empty receipt instance
- Return:
- instance: Receipt
-
file_id
Property file_id specifying a receipt
-
find_file(filetype)
Find a file of the given type. If it is not found, the preprocessing scripts are executed automatically, hence all the checks beforehand.
- Args:
- filetype: str; either ‘original’, ‘scan’, ‘preprocessed’, ‘config’, or ‘txt’
- Kwargs:
- None
- Return:
- f: str; path to specific file
-
fuzzy_search(keyword, accuracy=0.6)
Fuzzy search the OCR output for a keyword and its possible value
- Args:
- keyword: str; the keyword to fuzzy search for
- Kwargs:
- accuracy: float; accuracy parameter for the fuzzy search algorithm
- Return:
- line: list(str); the line of the closest fuzzy search match
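The following sketch shows one way a fuzzy search with an accuracy threshold could behave. The source does not say which algorithm fuzzy_search uses. Using difflib.SequenceMatcher from the standard library is an assumption here, as are the helper name fuzzy_search_sketch and the explicit lines argument.

```python
import difflib


def fuzzy_search_sketch(lines, keyword, accuracy=0.6):
    """Hypothetical illustration: return the tokens of the best-matching line.

    lines is assumed to be a list of cleaned OCR text lines.
    Returns [] if no token reaches the accuracy threshold.
    """
    best_ratio, best_line = 0.0, []
    for line in lines:
        tokens = line.split()
        for token in tokens:
            # Similarity in [0, 1] between the keyword and this token
            ratio = difflib.SequenceMatcher(
                None, keyword.lower(), token.lower()).ratio()
            if ratio >= accuracy and ratio > best_ratio:
                best_ratio, best_line = ratio, tokens
    return best_line


ocr_lines = ["MIGROS Zurich", "Brot 3.20", "T0TAL 12.50"]
match = fuzzy_search_sketch(ocr_lines, "total")
print(match)
```

A threshold below 1.0 tolerates typical OCR misreadings such as a zero in place of the letter O, while still rejecting unrelated words.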
-
imgd = '.../pentaplex/imgs/'
-
static load_configs(config_path)
Load a YAML config file and return an objectified dictionary
- Args:
- config_path: str; path string to the yaml config file
- Kwargs:
- None
- Return:
- config: objectify instance; the read configurations
-
parse_date(date)
Parse the receipt for its date
- Args:
- date: str; value that overwrites the parsed result
- Kwargs:
- None
- Return:
- date: str; matched date on the receipt
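As a rough illustration of the override behaviour described above, the sketch below returns the passed value if there is one and otherwise searches the text for a date. The day.month.year pattern, the helper name parse_date_sketch, and the explicit lines argument are assumptions for this example. The formats the module actually recognises are not stated in the source.

```python
import re

# Assumed format: two-digit day and month, two- or four-digit year,
# separated by '.', '/' or '-'
DATE_PATTERN = re.compile(r"\b\d{2}[./-]\d{2}[./-](?:\d{4}|\d{2})\b")


def parse_date_sketch(lines, date=None):
    """Hypothetical illustration of a parse step with a manual override."""
    if date is not None:
        # An explicitly passed value takes precedence over the parsed one
        return date
    for line in lines:
        found = DATE_PATTERN.search(line)
        if found:
            return found.group(0)
    return None


ocr_lines = ["MIGROS Zurich", "14.03.2017 18:42", "TOTAL 12.50"]
parsed = parse_date_sketch(ocr_lines)
overridden = parse_date_sketch(ocr_lines, date="01.01.2000")
print(parsed, overridden)
```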
-
parse_market(market)
Parse the receipt for the market it is from
- Args:
- market: str; value that overwrites the parsed result
- Kwargs:
- None
- Return:
- market: str; matched market
-
parse_time(time)
Parse the receipt for its time
- Args:
- time: str; value that overwrites the parsed result
- Kwargs:
- None
- Return:
- time: str; matched time on the receipt
-
parse_total(total)
Parse the receipt for its total
- Args:
- total: str; value that overwrites the parsed result
- Kwargs:
- None
- Return:
- total: str; matched total on the receipt
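One plausible way to obtain the total is to take a line that matched a keyword such as "total" and pick out the amount on it. The sketch below does this for a list of tokens. The helper name parse_total_sketch, the amount pattern, and the normalisation of a decimal comma are assumptions, not the module's documented behaviour.

```python
import re

# Assumed amount format: digits, a decimal point or comma, two decimals
AMOUNT = re.compile(r"^\d+[.,]\d{2}$")


def parse_total_sketch(line, total=None):
    """Hypothetical illustration: extract an amount from a tokenised line."""
    if total is not None:
        # An explicitly passed value takes precedence over the parsed one
        return total
    # Scan from the right, since the amount usually ends the line
    for token in reversed(line):
        if AMOUNT.match(token):
            return token.replace(",", ".")
    return None


amount = parse_total_sketch(["T0TAL", "CHF", "12,50"])
print(amount)
```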
-
print_properties()
Send properties to stdout
- Args/Kwargs/Return:
- None
-
print_text()
Send text to stdout
- Args/Kwargs/Return:
- None
-
prpd = '.../pentaplex/prp/'
-
read_files(files)
Read all files associated with the receipt
- Args:
- files: the files associated with the receipt
- Kwargs:
- None
- Return:
- data: dict; file contents, keyed analogously to files
-
root = '.../pentaplex/'
-
run_ocr()
Run the ocr.sh script
- Args/Kwargs/Return:
- None
-
run_scanner()
Run the scanner.py script
- Args/Kwargs/Return:
- None
-
tmpd = '.../pentaplex/tmp/'
-
txtd = '.../pentaplex/txt/'
-
-
receipt.imread(filename[, flags]) → retval