
Remember Lenny


How I built a REST endpoint based Computer Vision task using Flask

December 31, 2017 by rememberlenny


This is a follow-up on my process of developing familiarity with computer vision and machine learning techniques. As a web developer (read: “Rails developer”), I find this growing sphere exciting, but I don’t work with these technologies day to day. This is month three of a two-year journey to explore this field. If you haven’t already, you can read Part 1 here: From webdev to computer vision and geo, and Part 2 here: Two months exploring deep learning and computer vision.

Overall Thoughts

Rails developers are good at quickly building out web applications with very little effort. Between scaffolds, clear model-view-controller logic, and the plethora of ruby gems at your disposal, Rails applications with complex logic can be spun up in a short amount of time. For example, I wouldn’t blink at building something that requires user accounts, file uploads, and various feeds of data. I could even make it highly testable with great documentation. Between Devise, Carrierwave (or the many other file upload gems), Sidekiq, and all the other accessible gems, I would be up and running on Heroku within 15 minutes.

Now, add a computer vision or machine learning task and I would have no idea where to go. Even as I explore this space, I still struggle to find practical applications for machine learning concepts (neural nets and deep learning) aside from word association or image analysis. That being said, the interesting ideas (which I have yet to find practical applications for) are around trend detection and generative adversarial networks.

Google search for “how to train a neural network”

As a software engineer, I have found it hard to understand the practical values of machine learning in the applications I build. There is a lot of writing around models (in the machine learning sense, rather than the web application/database sense), neural net architecture, and research, but I haven’t seen as much around the practical applications for a web developer like myself. As a result, I decided to build out a small part of a project I’ve been thinking about for a while.

The project was meant to detect good graffiti on Instagram. The original idea was to use machine learning to qualify what “good graffiti” looked like, and then run the machine learning model to detect and collect images. Conceptually, the idea sounded great, but I had no idea how to “train a machine learning model”, and very little sense of where to start.

I started building out a simple part of the project with the understanding that I would need to “train” my “model” on good graffiti. I picked a few Instagram accounts of good graffiti artists, where I knew I could find high-quality images. After crawling the Instagram accounts (which took much longer than expected due to Instagram’s API restrictions) and analyzing the pictures, I realized there was a big problem. The selected accounts were great, but had many non-graffiti images, mainly of people. To get the “good graffiti” images, I first needed to filter out the images of people.

The application I built to crawl Instagram created a frontend that displayed graffiti.

By reviewing the pictures, I found that as many as four out of every ten images were of a person or had a person in them. As a result, before even starting the task of “training” a “good graffiti” “model”, I needed to get a set of pictures that didn’t contain any people.

(Side note for non-machine learning people: I’m using quotations around certain words because you and I probably have an equal understanding of what those words actually mean.)

Rather than building a complicated machine learning application that did some neural network-deep learning-artificial intelligence-stochastic gradient descent-linear regression-Bayesian machine learning magic, I decided to simplify the project into building something that detected humans in a picture and flagged them. I realized that many machine learning tutorials I had already read showed how to do this, so it was just a matter of making those tutorials actually useful.

—

The application (with links to code)

I was using Ruby on Rails for the web application that managed the database and rendered content. I did most of the Instagram image crawling in Ruby, via Sidekiq, a Redis-backed background job library that makes running delayed tasks easy.

The PyImageSearch article used as reference is great and can be found at https://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/

For the machine learning logic, I used an OpenCV object detection code example from a PyImageSearch.com tutorial. The example wasn’t tailored to my problem: it used a pre-trained model that recognizes 20 object classes, one of them being people, and drew a box around each detected object. I modified the example slightly and placed it inside a simple web application built with Flask.
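As a rough sketch of the post-processing step, the function below turns raw detector output into labeled results. It assumes the 20-class PASCAL VOC MobileNet SSD used in the PyImageSearch tutorial, whose `net.forward()` call returns an array of shape (1, 1, N, 7) where each row is `[image_id, class_id, confidence, x1, y1, x2, y2]` with normalized box coordinates. The function name `parse_detections` and the 0.2 default cutoff are illustrative assumptions, not the author's code; plain nested lists stand in for the NumPy array so the sketch runs without OpenCV installed.

```python
# Labels for the 20-class PASCAL VOC MobileNet SSD (index 0 is "background").
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]


def parse_detections(detections, min_confidence=0.2):
    """Convert raw SSD output of shape (1, 1, N, 7) into labeled results.

    Each row is [image_id, class_id, confidence, x1, y1, x2, y2]. Works on a
    NumPy array or on equivalent nested lists. The 0.2 cutoff is an assumed
    default. Returns a list of (label, confidence, box) tuples.
    """
    results = []
    for row in detections[0][0]:
        confidence = float(row[2])
        if confidence < min_confidence:
            continue  # drop weak detections
        label = CLASSES[int(row[1])]
        box = tuple(float(v) for v in row[3:7])  # normalized corner coords
        results.append((label, confidence, box))
    return results
```

Keeping the label lookup separate from the model call means the "is there a person?" check can be a plain membership test on the returned labels.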

Link to GitHub: The main magic of the app

I made a Flask application with an endpoint that accepted a JSON blob containing an image URL. The application downloaded the image from that URL and ran it through the detection code, which drew a bounding box around each detected object. I only cared about detecting people, so I added a basic condition that returned one response when a person was detected and a generic response for everything else.
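The request/response contract described above can be sketched as a framework-free function, so it runs without Flask or OpenCV installed. The JSON field names (`url`, `person`, `labels`) and the hooks `download(url) -> bytes` and `detect(image_bytes) -> list[str]` are hypothetical stand-ins for the real image fetch and model call; this is not the code in the linked repository.

```python
import json


def handle_detect_request(body, download, detect):
    """Validate a JSON body like {"url": "..."} and build the JSON response.

    `download` and `detect` are hypothetical hooks: one fetches the image
    bytes, the other returns the labels the detector found in them.
    Returns a (response_dict, http_status) pair.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return {"error": "body must be JSON"}, 400

    url = payload.get("url") if isinstance(payload, dict) else None
    if not isinstance(url, str) or not url:
        return {"error": "missing 'url'"}, 400

    labels = detect(download(url))
    if "person" in labels:
        # The one case the crawler cares about: flag the image.
        return {"url": url, "person": True}, 200
    # Generic response for everything else.
    return {"url": url, "person": False, "labels": labels}, 200
```

Inside Flask, a POST route (the path is up to you) could call `handle_detect_request(request.get_data(), ...)` and `return jsonify(response), status`; `download` might simply be `urllib.request.urlopen(url).read()`. Keeping the logic out of the route makes it testable without a running server.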

This simple endpoint was the machine learning magic at work. Sadly, it was also the first time I’d seen a practical, usable example of how the complicated machine learning “stuff” integrates with the rest of a web application.

For those interested, the code is below.

https://github.com/rememberlenny/Flask-Person-Detector

—

Concluding Realizations

I was surprised that I hadn’t seen a simple Flask-based implementation of a deep neural network before. Based on this implementation, I also feel that when training a model isn’t involved, adding machine learning to an application is just like using a library with a useful function. I assume that in the future, the separation between models and the libraries that use them will be simplified, similar to how a library is “imported” or added using a bundler. My guess is that some of these tools already exist, but I am not deep enough yet to know about them.

https://www.tensorflow.org/serving/

While reviewing how to access the object detection logic, I found a few services that seemed relevant but weren’t quite what I needed. Specifically, there is a tool called TensorFlow Serving, which seems like it should be a simple web server for TensorFlow, but isn’t quite simple enough. It may be what I need, but setting up a server or web application that solely runs TensorFlow is quite difficult.

Web-service-based machine learning

A lot of the machine learning examples that I find online are very self-contained. The examples start with the problem, then provide the code to run the example locally. Often the input is an image file path passed via the command line, and the output is a Python-generated window that displays the manipulated image. This isn’t very useful in a web application, so making a REST endpoint seems like a basic next step.

Building the machine learning logic into a REST endpoint is not hard, but there are some things to consider. In my case, the server ran on a desktop computer with enough CPU and memory to process requests quickly. That won’t always be the case, so a future endpoint might need to run tasks asynchronously using something like a Redis-backed job queue. Otherwise, an HTTP request for a slow image would likely hang and possibly time out, so some basic microservice logic would need to be considered for slow queries.
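A minimal sketch of that asynchronous pattern, assuming an in-process `queue.Queue` and one worker thread as a stand-in for Redis: the endpoint would call `submit` and return the job id immediately, and the client would poll `status` until the result is ready. The class name `JobQueue` and its methods are hypothetical; a real deployment would use a persistent queue so jobs survive restarts.

```python
import queue
import threading
import uuid


class JobQueue:
    """In-process stand-in for a Redis-backed queue (hypothetical design).

    submit() returns a job id right away; a background worker runs the slow
    `process` function and stores the result for later polling.
    """

    def __init__(self, process):
        self._process = process  # slow function, e.g. download + detect
        self._jobs = queue.Queue()
        self._results = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._work, daemon=True).start()

    def submit(self, item):
        job_id = uuid.uuid4().hex
        with self._lock:
            self._results[job_id] = {"status": "pending"}
        self._jobs.put((job_id, item))
        return job_id

    def status(self, job_id):
        with self._lock:
            return dict(self._results.get(job_id, {"status": "unknown"}))

    def join(self):
        """Block until every submitted job has finished (useful in tests)."""
        self._jobs.join()

    def _work(self):
        while True:
            job_id, item = self._jobs.get()
            try:
                result = {"status": "done", "result": self._process(item)}
            except Exception as exc:  # record failures instead of crashing
                result = {"status": "failed", "error": str(exc)}
            with self._lock:
                self._results[job_id] = result
            self._jobs.task_done()
```

The HTTP request now only costs a dictionary write, so slow images can no longer cause timeouts; the trade-off is that clients must poll (or be notified) for results.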

Binary expectations and machine learning brands

A big problem with the final application was that graffiti images were sometimes falsely flagged as containing people. When a painting had features that looked like a person, such as a face or body, the object classifier flagged it. Conversely, some pictures of people were not flagged as containing people.
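One way to put numbers on those two failure modes is to hand-label a sample of images and compare against the classifier's flags. The sketch below computes precision (how many flagged images really had people) and recall (how many images with people got flagged); the function names are illustrative, and any input lists are hypothetical hand labels, not data from this project.

```python
def confusion_counts(predicted, actual):
    """Count outcomes for a binary "contains a person" flag.

    `predicted` and `actual` are parallel lists of booleans: the classifier's
    flag and a human's hand label for the same images.
    Returns (true_pos, false_pos, false_neg, true_neg).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # person, correctly flagged
    fp = sum(p and not a for p, a in pairs)      # graffiti flagged as person
    fn = sum(a and not p for p, a in pairs)      # person missed
    tn = sum(not p and not a for p, a in pairs)  # graffiti correctly passed
    return tp, fp, fn, tn


def precision_recall(predicted, actual):
    """Precision and recall, returning 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(predicted, actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For a graffiti filter, false positives (murals with faces) cost you good images, while false negatives let people slip into the training set, so which number to optimize depends on what the filtered set is for.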

[GRAFFITI ONLY] List of images that were noted to not have people. Note the images with the backs of people.

Web applications require binary conclusions to take action. An image classifier provides a confidence score for whether a detected object is present. In larger object detection models, the classifier will return more than one object as potentially detected. For example, there may be a 90% chance of a person being in the photo, a 76% chance of an airplane, and a 43% chance of a giant banana. This isn’t very useful when the application processing the responses just needs to know whether or not something is present.
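Turning those scores into the yes/no answer an application needs usually comes down to picking a threshold per label. This is a minimal sketch using the example numbers above; the 0.5 default and the per-label override dictionary are assumptions, and choosing the actual cutoffs is exactly the hard judgment call.

```python
def is_present(scores, label, threshold=0.5):
    """Collapse a label's confidence score into a yes/no answer.

    Labels the classifier did not report count as 0.0. The 0.5 default is
    an assumption, not a recommended value.
    """
    return scores.get(label, 0.0) >= threshold


def flags(scores, thresholds, default=0.5):
    """Return the set of labels whose score clears that label's threshold."""
    return {label for label, score in scores.items()
            if score >= thresholds.get(label, default)}


scores = {"person": 0.90, "airplane": 0.76, "giant banana": 0.43}
print(is_present(scores, "person"))       # True
print(flags(scores, {}))                  # person and airplane clear 0.5
print(flags(scores, {"airplane": 0.8}))   # a stricter airplane cutoff drops it
```

Raising a label's threshold trades false positives for false negatives, which is why a single global cutoff rarely fits every class.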

[PEOPLE ONLY] List of images that were classified as people. Note the last one is a giant mural with features of a face.

This brings up the importance of quality in any machine-learning-based process. Given that very few object classifiers or image-based processes are 100% correct, the quality of an API is hard to gauge. For commercial object classifier APIs, a service’s brand will be shaped largely by how it handles a few edge-case requests. Because machine learning itself is so opaque, the brands of the service providers will be all the more important in determining how trustworthy these services are.

Conversely, because the quality of machine learning results varies so greatly, a brand may struggle to showcase its value to a user. When each yes/no answer from a machine learning service is pegged to a dollar amount, for example per API request, the ability to get it for free will be appealing. From the perspective of price, rolling your own free object classifier will be better than using a third-party service. The branded machine learning service market still has a long way to go before becoming clearly preferred over self-hosted implementations.

Specificity in object classification is very important

Finally, when it comes to any machine learning task, specificity is your friend. With graffiti in particular, it’s hard to qualify something that varies so much in form. Graffiti itself is a category that encompasses a huge range of visual compositions. Even a person may struggle to say what is or isn’t graffiti. Compared with detecting a face or a fruit, where the category is narrowly defined, graffiti lacks that specificity, and that matters.

The brilliance of WordNet and ImageNet is the strength of their categorical specificity. By classifying the world through words and their relationships to one another, there is a way to qualify the similarities and differences of images. For example, a pigeon is a type of bird, but different from a hawk. At the same time, it’s completely different from an airplane or a bee. The relationships between those things allow for clearly classifying what they are. No such specificity exists for graffiti, but it is needed to properly improve an object classifier.
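To make the "is a type of" idea concrete, here is a tiny hand-written is-a hierarchy in the spirit of WordNet hypernyms. The entries are illustrative, not real WordNet data (NLTK's WordNet interface offers the real thing, for example `Synset.lowest_common_hypernyms`). The most specific shared ancestor is a rough measure of how similar two categories are.

```python
# Toy is-a hierarchy (child -> parent). Illustrative only, not WordNet data.
PARENT = {
    "pigeon": "bird",
    "hawk": "bird",
    "bird": "animal",
    "bee": "insect",
    "insect": "animal",
    "animal": "entity",
    "airplane": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}


def ancestors(term):
    """Return the chain from a term up to the root, including the term."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def lowest_common_ancestor(a, b):
    """Most specific category both terms share, or None if unrelated."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(lowest_common_ancestor("pigeon", "hawk"))      # bird
print(lowest_common_ancestor("pigeon", "bee"))       # animal
print(lowest_common_ancestor("pigeon", "airplane"))  # entity
print(lowest_common_ancestor("pigeon", "graffiti"))  # None
```

The last line is the graffiti problem in miniature: a term with no place in the hierarchy shares nothing with anything, so there is no structure for a classifier to lean on.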

Final final

Overall, the application works and was very helpful. Making it removed more of the mystery around how machine learning and image recognition services work. As I noted above, this process also made me much more aware of the shortfalls of these services and the places where this field is not yet defined. I definitely think this is something that all software engineers should learn how to do. Before the available tools become simple to use, I imagine there will be a long period of navigating a complicated ecosystem. Similar to the browser wars before web standards were formed, there is going to be a lot of vying for market share amongst the machine learning providers. You can already see it between services from larger companies like Amazon, Google and Apple. At the hardware and software level, this is also very apparent between Nvidia’s CUDA and AMD’s price appeal.

More to come!

Filed Under: Uncategorized Tagged With: Computer Vision, Graffiti, Machine Learning, Programming, Python
