receipt module
A program to make sense of pentaplex’s outputs for receipt analysis
@author: phdenzel
-
class receipt.Receipt(file_id, total=None, market=None, date=None, time=None, auto=False)
Bases: object
Class that encompasses pentaplex’s receipt analysis
-
check_img_id()
Check if file_id is found in any pictures of imgs/
- Args/Kwargs:
- None
- Return:
- image: str; name of the original image in imgs/ matching file_id
-
check_ocr_id()
Check if file_id is found in prp/
- Args/Kwargs:
- None
- Return:
- prepd, text: str, str; names of the preprocessed file and the OCR txt file
-
check_scanner_id()
Check if file_id is found in prp/ or tmp/
- Args/Kwargs:
- None
- Return:
- dst: str; name of the scanned image in prp/ or tmp/ matching file_id
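The lookup described above can be sketched as a search over several directories in order. This is a minimal illustration, not the module's implementation: the helper name find_by_id, the substring matching rule, and the None return for a miss are all assumptions.

```python
from pathlib import Path


def find_by_id(file_id, directories):
    """Return the name of the first file whose name contains file_id.

    Hypothetical helper: directories are searched in the given order,
    and None is returned if no file matches.
    """
    for directory in directories:
        folder = Path(directory)
        if not folder.is_dir():
            continue
        # sorted() makes the result deterministic across file systems
        for path in sorted(folder.iterdir()):
            if path.is_file() and file_id in path.name:
                return path.name
    return None
```

Passing the directories as [prp, tmp] gives prp/ precedence over tmp/; whether the module uses the same order is not stated in the documentation.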
-
clean_ocr(data)
Clean the output of the OCR
- Args:
- data: the raw OCR output to clean
- Kwargs:
- None
- Return:
- text: list(str); OCR text with newline characters and other clutter removed
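A minimal sketch of this kind of cleaning, assuming the raw OCR output arrives as a single string. The helper name clean_lines and the exact rules (strip whitespace, collapse repeated spaces, drop empty lines) are illustrative assumptions; the module may clean differently.

```python
def clean_lines(raw_text):
    """Split raw OCR text into stripped, non-empty lines (hypothetical helper)."""
    lines = []
    # splitlines() also treats \r\n and form feeds as line boundaries
    for line in raw_text.splitlines():
        # collapse runs of whitespace (tabs, double spaces) into one space
        line = " ".join(line.split())
        if line:
            lines.append(line)
    return lines
```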
-
classmethod empty()
Constructor for an empty receipt instance
- Return:
- instance: Receipt
-
file_id
Property file_id specifying a receipt
-
find_file(filetype)
Find a file of the given type. If the file is not found, the preprocessing scripts are executed automatically, which is why the checks are run beforehand.
- Args:
- filetype: str; either ‘original’, ‘scan’, ‘preprocessed’, ‘config’, or ‘txt’
- Kwargs:
- None
- Return:
- f: str; path to specific file
-
fuzzy_search(keyword, accuracy=0.6)
Fuzzy search OCR output for a keyword and its possible value
- Args:
- keyword: str; the keyword to fuzzy search for
- Kwargs:
- accuracy: float; accuracy parameter for the fuzzy search algorithm
- Return:
- line: list(str); the line of the closest fuzzy search match
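To illustrate what a fuzzy search with an accuracy threshold can look like, here is a sketch built on the standard library's difflib.SequenceMatcher. It is an assumption that the module works along these lines; the helper name fuzzy_find_line, the word-by-word comparison, and the empty-list result for a miss are invented for the example.

```python
from difflib import SequenceMatcher


def fuzzy_find_line(lines, keyword, accuracy=0.6):
    """Return the tokens of the line containing the best fuzzy match.

    Hypothetical helper: an empty list is returned when no word reaches
    the accuracy threshold.
    """
    best_score, best_line = 0.0, []
    for line in lines:
        words = line.split()
        for word in words:
            # ratio() is 1.0 for identical strings and 0.0 for no overlap
            score = SequenceMatcher(None, keyword.lower(), word.lower()).ratio()
            if score >= accuracy and score > best_score:
                best_score, best_line = score, words
    return best_line
```

With OCR output such as "T0TAL CHF 12.50", searching for "total" still finds the line, because a single misread character leaves the similarity ratio well above 0.6.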
-
imgd = '.../pentaplex/imgs/'
-
static load_configs(config_path)
Load a YAML config file and return an objectified dictionary
- Args:
- config_path: str; path string to the yaml config file
- Kwargs:
- None
- Return:
- config: objectify instance; the read configurations
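The idea of an objectified dictionary can be sketched with types.SimpleNamespace. This is an illustration only: the function below is not the module's objectify class, the config keys are invented, and the YAML parsing step is replaced by a plain dict so the example runs without third-party packages (in practice a parser such as PyYAML's yaml.safe_load would produce it).

```python
from types import SimpleNamespace


def objectify(value):
    """Recursively turn dicts into attribute-accessible namespaces.

    Hypothetical helper; dict keys must be strings.
    """
    if isinstance(value, dict):
        return SimpleNamespace(**{k: objectify(v) for k, v in value.items()})
    if isinstance(value, list):
        return [objectify(item) for item in value]
    return value


# Stand-in for the dictionary a YAML parser would return
# (the keys below are invented for illustration)
parsed = {"ocr": {"language": "deu", "dpi": 300}, "markets": ["A", "B"]}
config = objectify(parsed)
```

Nested settings can then be read as attributes, for example config.ocr.dpi, instead of config["ocr"]["dpi"].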
-
parse_date(date)
Parse for the date on the receipt
- Args:
- date: str; if given, overrides the parsed result
- Kwargs:
- None
- Return:
- date: str; matched date on the receipt
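The override-or-parse behaviour can be sketched with a regular expression. The date format (day.month.year), the helper name find_date, and the None result for a miss are assumptions made for this example; the formats actually handled by the module are not documented here.

```python
import re

# Assumed format: day.month.year with a 2- or 4-digit year, e.g. 24.12.2017
DATE_PATTERN = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4}|\d{2})\b")


def find_date(lines, date=None):
    """Return the override if given, else the first date-like match.

    Hypothetical helper; returns None when nothing matches.
    """
    if date is not None:
        return date
    for line in lines:
        match = DATE_PATTERN.search(line)
        if match:
            return match.group(0)
    return None
```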
-
parse_market(market)
Parse for the market the receipt is from
- Args:
- market: str; if given, overrides the parsed result
- Kwargs:
- None
- Return:
- market: str; matched market
-
parse_time(time)
Parse for the time on the receipt
- Args:
- time: str; if given, overrides the parsed result
- Kwargs:
- None
- Return:
- time: str; matched time on the receipt
-
parse_total(total)
Parse for the total on the receipt
- Args:
- total: str; if given, overrides the parsed result
- Kwargs:
- None
- Return:
- total: str; matched total on the receipt
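Since fuzzy_search returns a line as a list of strings, extracting the total from such a line could look like the sketch below. The amount format (two decimals, dot or comma separator), the choice of the last amount on the line, and the helper name find_total are assumptions, not documented behaviour.

```python
import re

# Assumed amount format: digits, a dot or comma, and exactly two decimals
AMOUNT_PATTERN = re.compile(r"(\d+)[.,](\d{2})(?!\d)")


def find_total(line_tokens, total=None):
    """Return the override if given, else the last amount in the tokens.

    Hypothetical helper; amounts are normalised to a dot separator and
    None is returned when no token looks like an amount.
    """
    if total is not None:
        return total
    amounts = []
    for token in line_tokens:
        match = AMOUNT_PATTERN.search(token)
        if match:
            amounts.append(f"{match.group(1)}.{match.group(2)}")
    return amounts[-1] if amounts else None
```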
-
print_properties()
Send properties to stdout
- Args/Kwargs/Return:
- None
-
print_text()
Send the receipt text to stdout
- Args/Kwargs/Return:
- None
-
prpd = '.../pentaplex/prp/'
-
read_files(files)
Read all files associated with the receipt
- Args:
- files: the files associated with the receipt
- Kwargs:
- None
- Return:
- data: dict; file contents, with keys analogous to files
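One way to read a return value with keys analogous to files is sketched below, assuming files is a dict mapping labels to paths of text files. That structure, the helper name read_all, and the UTF-8 encoding are assumptions; image files would need a different reader.

```python
from pathlib import Path


def read_all(files):
    """Read every path in files; the result uses the same keys.

    Hypothetical helper: files is assumed to map labels to text file paths.
    """
    data = {}
    for key, path in files.items():
        data[key] = Path(path).read_text(encoding="utf-8")
    return data
```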
-
root = '.../pentaplex/'
-
run_ocr()
Run the ocr.sh script
- Args/Kwargs/Return:
- None
-
run_scanner()
Run the scanner.py script
- Args/Kwargs/Return:
- None
-
tmpd = '.../pentaplex/tmp/'
-
txtd = '.../pentaplex/txt/'
-
-
receipt.imread(filename[, flags]) → retval