Spending Tracker

Using Optical Character Recognition technology to track and store user spending and present it in scatter plot form.

Statement

This project is a revision of our original project, in which we sought to model the correlation between hydroelectric power generation and rainfall in the Pacific Northwest. We decided to make the project more useful and interesting by combining several technologies that allow a user to track their own spending habits. Our solution uses optical character recognition to extract text from receipt images. The text is parsed and the data is stored in a MongoDB collection. Lastly, the data is plotted based on a filter query, e.g. the last 7 days or the last month of spending.

Analysis

Approaches from class that will be used in the project include using map to create a list of numbers from a list of strings, state modification, closures, and recursion.
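As a minimal sketch of two of these techniques, map and recursion, the snippet below converts a list of strings to numbers and then sums them recursively. The sample strings and the sum-list helper are hypothetical and are not taken from the project code.

```racket
#lang racket

;; Hypothetical sample input: dollar amounts captured as strings.
(define amount-strings '("12.50" "3.25" "20"))

;; map applies string->number to each element and returns a new list.
(define amounts (map string->number amount-strings))

;; A simple recursive sum over the resulting list.
(define (sum-list lst)
  (if (null? lst)
      0
      (+ (car lst) (sum-list (cdr lst)))))

(sum-list amounts) ; => 35.75
```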

Data set or other source materials

After competing in the UMass Lowell Hawkathon, I (Cullin) was introduced to the Haven On Demand API, which provides many machine learning APIs that are free to use with an API key. For this project I thought it would be interesting to use their Optical Character Recognition API, which extracts text from images.

Example Request

To use the API, we send an HTTP request to the appropriate URL with our parameters. Our sample image:

image

"curl -X POST --form \"[email protected]\" --form \"apikey=82833b89-515e-4727-97ff-d8af21d53be3\" https://api.havenondemand.com/1/api/sync/ocrdocument/v1"

Example Response

{
  "text_block": [
    {
      "text": "1- / • ' Q\n/ ! ; ! ; - -\n7 *\n!\n$\n,\nN\nX X\n;/\n, -;t\n! .\nA\nÉ ' . V tx: ; "4 ( X M. Craig Parker\nEN N, Installation Services Man£*8€1'\ngi;' X ,N>\nl gael 908 Boston Turnpike\nUnit 1\nShrewsbury, MA 01545 # . *\n{\nCell 508-797-7623\nOffice 774-275-2189\nFax 608-845-6076 N\nToll Free 877-903-3768\n[email protected]\n! 0\n1\n1 6/\n!\nl\n£\n""Nr\n*> ; "\nw *\n**8 4 $ • ; XM X r\n!\n' ! , #\n* %\nl" l ! ; , '\n* •\n; . . ! A (\n• • 4 • it'\n@• • 0\nI /",
      "left": 0,
      "top": 0,
      "width": 1080,
      "height": 720
    }
  ]
}
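A response of this shape can be decoded with Racket's json library. The sketch below is only an illustration: the sample-response string is a shortened, hypothetical stand-in for a real API response, and response->text is a helper name invented here.

```racket
#lang racket
(require json)

;; Hypothetical, shortened response in the same shape as the one above.
(define sample-response
  (string-append
   "{\"text_block\": [{\"text\": \"Subtotal $4.50\\nTotal $4.77\", "
   "\"left\": 0, \"top\": 0, \"width\": 1080, \"height\": 720}]}"))

;; Decode the JSON and join the text of every block with newlines.
;; string->jsexpr turns objects into hashes with symbol keys
;; and arrays into lists.
(define (response->text response-string)
  (define js (string->jsexpr response-string))
  (string-join
   (for/list ([block (hash-ref js 'text_block)])
     (hash-ref block 'text))
   "\n"))
```

Calling (response->text sample-response) yields the recognized text as a single string, ready for the regular-expression step.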

With this data, we plan to parse the response string as JSON and use regular expressions to extract the final dollar amount from the image text, which serves as the total amount of money spent on that purchase. This data will be stored in our MongoDB database with the following schema:

{
  date_created: number(ms since unix epoch)
  total: number(total dollar value spent)
}
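The extraction step could look roughly like the sketch below. It assumes that amounts appear as digits with exactly two decimal places, optionally preceded by a dollar sign, and that the last such amount on the receipt is the total. The names extract-total and make-purchase are hypothetical, and the record is a plain hash that mirrors the schema above rather than a call into the database library.

```racket
#lang racket

;; Return the last dollar amount found in text as a number, or #f if
;; there is none. Assumes amounts look like 12.34 with an optional $.
(define (extract-total text)
  (define matches
    (regexp-match* #px"\\$?(\\d+\\.\\d{2})" text #:match-select cadr))
  (and (pair? matches)
       (string->number (last matches))))

;; Build a record shaped like the schema above. The timestamp is the
;; current time in whole milliseconds since the Unix epoch.
(define (make-purchase total)
  (hasheq 'date_created (inexact->exact (floor (current-inexact-milliseconds)))
          'total total))

(extract-total "Subtotal $4.50\nTax 0.27\nTotal $4.77") ; => 4.77
```

Taking the last match is a heuristic: it works when the total is printed after the line items, but noisy OCR output may need extra filtering.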

Deliverable and Demonstration

The program takes an image file of a purchase receipt as input and inserts the transaction into the database. It can also plot graphs from the data available in the database, with date and time on the x axis and the total amount spent on that day on the y axis.

Evaluation of Results

This project will be successful if the program can extract the dollar amount from an image and insert it into the database. In addition, the program should be able to produce a plot from the most recent data points in the database.

Architecture Diagram

ScreenShot

Status Milestone 1 (Fri Apr 15)

We accomplished more than we set out to do for Milestone 1. In addition to configuring an HTTP request that sends binary image files to the Haven On Demand OCR API and creating a function to parse dollar values from the response, we were also able to set up a MongoDB server and create a purchases collection. Using the mongo-db Racket library, we created functions that insert records into the database, query them, and filter them by date. We have thus finished quite a bit and are looking to implement additional features and get the plot working for Milestone 2.

Status Milestone 2 (Fri Apr 22)

As the project currently stands, we have cleanly joined our files into a single main.rkt file, from which we plan to run our demo. We created two functions: ocr-insert, which performs optical character recognition on an image and inserts the total dollar value into the database, and graph-week, which graphs the last 7 days of spending as a histogram. Unfortunately, we were unable to reach our goal of producing a scatter plot of a month's worth of purchases. We hope to accomplish this by our demo date.
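One way to prepare data for a weekly histogram is to sum purchases into one bucket per day. The sketch below is an assumption about how that aggregation could be done, not the actual body of graph-week. It assumes records are (timestamp-ms . amount) pairs with integer timestamps, and daily-totals is a hypothetical helper name.

```racket
#lang racket

(define day-ms 86400000) ; milliseconds in one day

;; records: list of (timestamp-ms . amount) pairs.
;; now: reference time in milliseconds.
;; Returns a list of 7 totals. Index 0 covers the most recent 24 hours
;; and index 6 covers the oldest day in the window.
(define (daily-totals records now)
  (for/list ([day (in-range 7)])
    (for/sum ([r records]
              #:when (and (<= (car r) now)
                          (= day (quotient (- now (car r)) day-ms))))
      (cdr r))))
```

Each total could then be paired with a day label and handed to a bar-style renderer such as discrete-histogram from Racket's plot library.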

Test 1

grocery

Output

graph1

Test 2

parking

Output

graph2

As you can see, the y value in graph 2 increased by 2 dollars.

Final Status (Wed Apr 27)

We were able to make a scatter plot of a month's worth of data. Unlike the histogram, the scatter plot uses date and time as a continuous x axis, so we do not aggregate cost sums per day; instead, we plot each purchase as an independent point.

month
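The per-purchase points described above could be built as in the following sketch. It is not the actual body of graph-month: it assumes a 30-day month and records stored as (timestamp-ms . amount) pairs, and month-points is a hypothetical helper name.

```racket
#lang racket

(define month-ms (* 30 86400000)) ; assumes a 30-day month

;; Keep the (timestamp-ms . amount) pairs from the last 30 days relative
;; to now, and convert each one to a #(x y) vector, one point per purchase.
(define (month-points records now)
  (for/list ([r records]
             #:when (<= 0 (- now (car r)) month-ms))
    (vector (car r) (cdr r))))
```

A list of two-element vectors like this is the shape that the points renderer in Racket's plot library accepts, so no per-day aggregation is needed.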

For our presentation, we will prepare sample images and also attempt a live OCR image insert.

Schedule

First Milestone (Fri Apr 15)

  • Configure HTTP request to send a binary image file for OCR analysis.
  • Create function to extract dollar value from JSON response

Second Milestone (Fri Apr 22)

  • Make Month Spending Plot
  • Make Week Spending Plot
  • Possibly make Day Spending Plot

Final Presentation (last week of semester)

  • Make Month Spending Plot
  • Add labels and titles to Graphs

Group Responsibilities

Cullin Lam -

  • Configure HTTP request to take binary file.
  • Set Up MongoDB server
  • Create insert/query functions

John Kuczynski -

  • Create function to parse JSON response
  • Generate plots

Favorite Racket expressions

Cullin Lam

(define week-ms 604800000) ; milliseconds in 7 days

;; Fetch every purchase as a (date_created . value) pair.
(define (query-purchases)
  (let ([res (mongo-collection-find purchases (make-hasheq '()))])
    (for/list ([e res])
      (cons (hash-ref e 'date_created) (hash-ref e 'value)))))

;; Keep only the records created within the last week.
(define (find-week records)
  (filter (lambda (record)
            (<= (- (current-milliseconds) (car record)) week-ms))
          records))

This code snippet shows the power of filter: we can easily write a Racket expression to filter our result set rather than working with a poorly documented API.

How to Download and Run

This git repo can be cloned by using:

git clone https://github.com/oplS16projects/SpendingTrackerRacket.git

To run, open the main.rkt file in DrRacket to access the REPL.

To perform OCR on an image of a receipt and insert the result, call:

(ocr-insert img-file-path) 

To plot week spending call:

(graph-week) 

To plot month spending call:

(graph-month)