
Programming

Why is video editing so horrible today?

September 15, 2020 by rememberlenny

In the last three months, I have done more video post-production than I have done in the past 12 years. Surprisingly, in all those years, nothing seems to have changed. Considering how much media is now machine-analyzable, both audio and visual, I'm surprised there aren't more patterns that make navigating and arranging video content faster. Beyond that, I'm surprised there isn't more tooling for programmatically composing video in a polished way that complements the existing manual methods of arranging.

In 1918, when the video camera was created, if you filmed something and wanted to edit it, you took your footage, cut it and arranged it according to how you wanted it to look. Today, if you want to edit a video, you have to import the source assets into a specialty program (such as Adobe Premiere), and then manually view each item to watch/listen for the portion that you want. Once you have the sections of each imported asset, you have to manually arrange each item on a timeline. Of course a ton has changed, but the general workflow feels the same.

Real life photo of me navigating my Premiere assets folders

How has video production and editing not developed digital-first methods of creation? Computing power has skyrocketed, storage is effectively infinite, and our computers are networked around the world. How is it that the workflow of import, edit, and export still takes so long?

The consumerization of video editing has simplified certain elements by abstracting away seemingly important but complicated components, such as the linearity of time. Tools like TikTok seem to be the most dramatic shift in video creation, in that the workflow moves toward immediate review and reshooting of clips. Over the years, tools like iMovie have moved timelines from a horizontal representation of elapsed time toward general blocks of "scenes" or clips. This simplification through abstraction is important for the general consumer, but it reduces the attention to detail. It creates an aesthetic of its own, which seems to be a result of the changing tools.

Where are all the things I take for granted in developer tools, like autocomplete or class-method search, in the video equivalent? What does autocomplete look like when editing a video clip? Where are the repeatable "patterns" I can write once and reuse everywhere? Why does each item on a video canvas seem to live in isolation, with no awareness of other elements or any ability to interact with them?

My code editor searches my files and tries to "import" the methods when I start typing.

As someone who studied film and animation exclusively for multiple years, I'm generally surprised that the overall ways of producing content are largely the same as they were 10 years ago, and seemingly the same as they have been for the past 100.

I understand that the areas of complexity have become more niche, such as VFX or multimedia. I have no direct experience with complicated 3D rendering, and I haven't tried visual editing for non-traditional video displays, so it's a stretch to say film hasn't changed at all. I've barely scratched the surface of new video innovation, but all things considered, I wish some basic things were much easier.

For one, when it comes to visual layout, I would love something like Figma's "auto layout" functionality. If I have multiple items on a canvas, I'd like them to self-arrange based on some kind of box model. There should be a way to assign the equivalent of styles as "classes", as with CSS, and multiple text elements should be able to inherit or share padding and margin definitions. Things like flexbox and relative/absolute positioning would make visual templates significantly easier and faster to develop for fresh video content.

Currently I make visual frames in Figma, then export them, because it's so much easier than fumbling through the 2D translations in Premiere.

I would love a "smarter" timeline that can surface "cues" I might want to hook into for visual changes. The cues could make use of machine-analyzable features in the audio and video, based on what is detected in the available content. This is filled with lots of hairy areas, and it definitely sounds nicer than it might be in actuality. As a basic example, the timeline could look at the audio or a transcript and know when a certain speaker is talking. There are already services, such as Descript, that make seamless use of speaker detection. That should find some expression in video editing software. Even if the software itself doesn't detect this information, it should make use of the metadata from other software.
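To make the idea concrete, here's a rough sketch of what a transcript-driven cue could look like: walk a speaker-labeled transcript (the kind a tool like Descript can export) and emit a timeline marker whenever the speaker changes. The segment layout and field names below are made up for illustration, not any real tool's schema.

```python
# Sketch: derive timeline cue points from a speaker-labeled transcript.
# The segment layout below is a hypothetical example, not a real export format.
transcript = [
    {"speaker": "Host",  "start": 0.0,  "text": "Welcome everyone."},
    {"speaker": "Guest", "start": 4.2,  "text": "Thanks for having me."},
    {"speaker": "Guest", "start": 9.8,  "text": "I've been working on..."},
    {"speaker": "Host",  "start": 31.5, "text": "Let's switch topics."},
]

def speaker_change_cues(segments):
    """Return (timestamp, speaker) pairs wherever the active speaker changes."""
    cues = []
    previous_speaker = None
    for segment in segments:
        if segment["speaker"] != previous_speaker:
            cues.append((segment["start"], segment["speaker"]))
            previous_speaker = segment["speaker"]
    return cues

for timestamp, speaker in speaker_change_cues(transcript):
    # In an editor, each cue could drive a cut, a lower third, or a camera switch.
    print(f"{timestamp:6.1f}s  cue: {speaker} starts talking")
```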

The two basic views in Zoom. Grid or speaker.

More advanced would be knowing when certain exchanges between multiple people form a self-contained "point". Identifying when an "exchange" takes place, or when a "question" is "answered", would be useful for title slides or lower thirds with complementary text.

Descript will identify speakers and color code the transcript.

If there are multiple shots of the same take, it would be nice to have the clips note where each begins and ends by lining up the audio. Reviewing content shouldn't have to be done in a linear fashion if there are ways to distinguish the content of a video or audio clip and compare it to itself or to other clips.
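One plausible way to do that alignment is to cross-correlate the audio of two takes and treat the correlation peak as the offset between them. Here's a minimal sketch with NumPy, assuming the audio has already been extracted as mono arrays at a shared sample rate; the toy data at the bottom just delays one copy by a quarter second.

```python
import numpy as np

def estimate_offset_seconds(audio_a, audio_b, sample_rate):
    """Return the lag (seconds) such that audio_a[t + lag] best matches audio_b[t].

    A negative result means audio_b carries the same content later than audio_a.
    """
    # Normalize so loudness differences between recordings matter less.
    a = (audio_a - audio_a.mean()) / (audio_a.std() + 1e-9)
    b = (audio_b - audio_b.mean()) / (audio_b.std() + 1e-9)

    # Full cross-correlation; the index of the peak gives the best alignment.
    correlation = np.correlate(a, b, mode="full")
    lag = np.argmax(correlation) - (len(b) - 1)
    return lag / sample_rate

# Toy check: b is a copy of a that starts a quarter second later.
rate = 8_000
a = np.random.randn(rate)  # one second of fake audio
b = np.concatenate([np.zeros(rate // 4), a])[:rate]
print(f"estimated offset: {estimate_offset_seconds(a, b, rate):.2f}s")  # about -0.25
```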

In line with "cues", I would like to "search" my video in a much more comprehensive way. My iPhone's Photos app lets me search by face or location. How about that in my video editor? All the video clips with a certain face or background?

Also, it would be nice to generate these "features" with some ease. I personally don't know what it would take to train a feature detector by viewing some parts of a clip, labeling them, and then using the labeled examples to find other instances of similar visual content. I do know it's possible, and it would be very useful for speeding up the editing process.

In my own use case, I'm seeing a lot of video recordings of Zoom calls and webinars. This is another example of video content that generally looks the "same" and could be analyzed for certain content types. I would be able to navigate through clips much more quickly if I could filter video by when the screen shows many faces at once versus when only one speaker is featured.
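As a crude sketch of that kind of filter: sample one frame per second, count faces with OpenCV's bundled Haar cascade, and label each sample as a "grid" of faces, a single "speaker", or something else. The four-face threshold and the file name are arbitrary assumptions, and a Haar cascade is a blunt instrument, but it shows the shape of the idea.

```python
import cv2

# Bundled OpenCV Haar cascade for frontal faces (crude, but nothing extra to download).
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def label_zoom_frames(video_path, sample_every_seconds=1.0):
    """Yield (timestamp, label) pairs for sampled frames of a recording."""
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(fps * sample_every_seconds))
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_index % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            count = len(faces)
            label = "grid" if count >= 4 else "speaker" if count == 1 else "other"
            yield frame_index / fps, label
        frame_index += 1
    capture.release()

# Placeholder file name; in practice this would be a Zoom or webinar recording.
for timestamp, label in label_zoom_frames("webinar-recording.mp4"):
    print(f"{timestamp:8.1f}s  {label}")
```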

All of this to say: there are a lot of gaps in the tools available at the moment.

Filed Under: video Tagged With: film, post production, Programming, video editing

How I built a REST endpoint based Computer Vision task using Flask

December 31, 2017 by rememberlenny


This is a follow-up on my process of developing familiarity with computer vision and machine learning techniques. As a web developer (read as "Rails developer"), I found this growing sphere exciting, but I don't work with these technologies day-to-day. This is month three of a two-year journey to explore this field. If you haven't read them already, you can see Part 1 here: From webdev to computer vision and geo, and Part 2 here: Two months exploring deep learning and computer vision.

Overall Thoughts

Rails developers are good at quickly building out web applications with very little effort. Between scaffolds, clear model-view-controller logic, and the plethora of ruby gems at your disposal, Rails applications with complex logic can be spun up in a short amount of time. For example, I wouldn’t blink at building something that requires user accounts, file uploads, and various feeds of data. I could even make it highly testable with great documentation. Between Devise, Carrierwave (or the many other file upload gems), Sidekiq, and all the other accessible gems, I would be up and running on Heroku within 15 minutes.

Now, add a computer vision or machine learning task and I would have no idea where to go. Even as I explore this space, I still struggle to find practical applications for machine learning concepts (neural nets and deep learning) aside from word association or image analysis. That being said, the interesting ideas (which I have yet to find practical applications for) are around trend detection and generative adversarial networks.

Google search for "how to train a neural network"

As a software engineer, I have found it hard to understand the practical values of machine learning in the applications I build. There is a lot of writing around models (in the machine learning sense, rather than the web application/database sense), neural net architecture, and research, but I haven’t seen as much around the practical applications for a web developer like myself. As a result, I decided to build out a small part of a project I’ve been thinking about for a while.

The project was meant to detect good graffiti on Instagram. The original idea was to use machine learning to qualify what "good graffiti" looked like, and then run the machine learning model to detect and collect images. Conceptually, the idea sounds great, but I have no idea how to "train a machine learning model", and I have very little sense of where to start.

I started building out a simple part of the project with the understanding that I would need to "train" my "model" on good graffiti. I picked a few Instagram accounts of good graffiti artists where I knew I could find high-quality images. After crawling the Instagram accounts (which took much longer than expected due to Instagram's API restrictions) and analyzing the pictures, I realized there was a big problem at hand. The selected accounts were great, but had many non-graffiti images, mainly of people. To get the "good graffiti" images, I was first going to need to filter out the images of people.

The application I built to crawl Instagram created a frontend that displayed graffiti.

By reviewing the pictures, I found that as many as four out of every ten images were of a person or had a person in them. As a result, before even starting the task of "training" a "good graffiti" "model", I needed to get a set of pictures that didn't contain any people.

(Side note for non-machine learning people: I’m using quotations around certain words because you and I probably have an equal understanding of what those words actually mean.)

Rather than having a complicated machine learning application that did some complicated neural network-deep learning-artificial intelligence-stochastic gradient descent-linear regression-bayesian machine learning magic, I decided to simplify the project into building something that detected humans in a picture and flagged them. I realized that many examples of machine learning tutorials I had read before showed me how to do this, so it was a matter of making those tutorials actually useful.

—

The application (with links to code)

I was using Ruby on Rails for the web application that managed the database and rendered content. I did most of the Instagram image crawling in Ruby, via Sidekiq, a Redis-backed background job library that makes running delayed tasks easy.

The PyImageSearch article used as reference is great and can be found at https://www.pyimagesearch.com/2017/09/11/object-detection-with-deep-learning-and-opencv/

For the machine learning logic, I had a code example for object detection, using OpenCV, from a PyImageSearch.com tutorial. The code example wasn't a complete solution for my needs: it detected any one of 30 different items in the trained image model, one of them being people, and drew a box around the detected object. In my case, I slightly modified the example and placed it inside a simple web application based on Flask.

Link to Github: The main magic of the app

I made a Flask application with an endpoint that accepted a JSON blob containing an image URL. The application downloaded the image and ran it through the code example, which drew a bounding box around any detected object. I only cared about detecting people, so I added a basic condition to return one response when a person was detected and a generic response for everything else.
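The full code is linked further down; as a rough illustration of the shape of that endpoint, here is a minimal sketch using OpenCV's DNN module with a MobileNet-SSD Caffe model like the one the PyImageSearch tutorial uses. The model file names are placeholders and the person class index assumes the PASCAL VOC class list, so treat this as a sketch rather than the exact code in the repo.

```python
# Minimal sketch of a person-detection endpoint (not the exact code in the linked repo).
import cv2
import numpy as np
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

# Pretrained MobileNet-SSD files downloaded separately; paths are placeholders.
net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")
PERSON_CLASS_ID = 15  # index of "person" in the PASCAL VOC class list used by this model
CONFIDENCE_THRESHOLD = 0.5

@app.route("/detect", methods=["POST"])
def detect():
    # Expect a JSON blob like {"url": "https://example.com/image.jpg"}.
    image_url = request.get_json(force=True)["url"]
    raw = requests.get(image_url, timeout=10).content
    image = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)

    # Resize to the network's expected input and run one forward pass.
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    # Flag the image if any detection is a sufficiently confident "person".
    person_found = any(
        detections[0, 0, i, 2] > CONFIDENCE_THRESHOLD
        and int(detections[0, 0, i, 1]) == PERSON_CLASS_ID
        for i in range(detections.shape[2])
    )
    return jsonify({"person_detected": bool(person_found)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```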

This simple endpoint was the machine learning magic at work. Sadly, it was also the first time I'd seen a practical, usable example of how the complicated machine learning "stuff" integrates with the rest of a web application.

For those who are interested, the code for these are below.

https://github.com/rememberlenny/Flask-Person-Detector

—

Concluding Realizations

I was surprised that I hadn't seen a simple Flask-based implementation of a deep neural network before. Based on this implementation, I also feel that when training a model isn't involved, applying machine learning to an application is just like using a library with a useful function. I'm assuming that in the future, the separation between models and the libraries for using them will be simplified, similar to how a library is "imported" or added with a bundler. My guess is that some of these tools exist, but I am not deep enough into the field yet to know about them.

https://www.tensorflow.org/serving/

Through reviewing how to access the object detection logic, I found a few services that seemed relevant but ultimately weren't quite what I needed. Specifically, there is a tool called TensorFlow Serving, which seems like it should be a simple web server for TensorFlow but isn't quite simple enough. It may be what I need, but a server or web application that solely runs TensorFlow is quite difficult to set up.

Web service based machine learning

A lot of the machine learning examples I find online are very self-contained. They start with the problem, then provide the code to run the example locally. Often the input image is provided as a file path via a command line interface, and the output is a Python-generated window that displays a manipulated image. This isn't very useful as a web application, so making a REST endpoint seems like a basic next step.

Building the machine learning logic into a REST endpoint is not hard, but there are some things to consider. In my case, the server was running on a desktop computer with enough CPU and memory to process requests quickly. This might not always be the case, so a future endpoint might need to run tasks asynchronously using something like Redis. An HTTP request here would most likely hang and possibly time out, so some basic micro-service logic needs to be considered for slow queries.
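For example, here is a minimal sketch of that asynchronous shape using RQ, a Redis-backed job queue: the endpoint queues the slow detection work and returns a job id, and the client polls for the result. The detect_person import is a hypothetical wrapper around detection logic like the sketch above.

```python
# Sketch of an asynchronous variant using RQ (Redis Queue).
from flask import Flask, jsonify, request
from redis import Redis
from rq import Queue
from rq.job import Job

from detector import detect_person  # hypothetical module wrapping the OpenCV logic

app = Flask(__name__)
redis_conn = Redis()
queue = Queue("detections", connection=redis_conn)

@app.route("/detect", methods=["POST"])
def enqueue_detection():
    # Queue the slow detection work and return immediately with a job id.
    image_url = request.get_json(force=True)["url"]
    job = queue.enqueue(detect_person, image_url)
    return jsonify({"job_id": job.get_id()}), 202

@app.route("/detect/<job_id>", methods=["GET"])
def detection_status(job_id):
    # Clients poll this endpoint until a worker has finished the job.
    job = Job.fetch(job_id, connection=redis_conn)
    if job.is_finished:
        return jsonify({"status": "done", "person_detected": job.result})
    return jsonify({"status": job.get_status()})
```

A separate worker process (started with something like `rq worker detections`) picks up and runs the queued jobs.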

Binary expectations and machine learning brands

A big problem with the final application was that processed graffiti images were sometimes falsely flagged as containing people. When a painting contained features that looked like a person, such as a face or a body, the object classifier falsely flagged it. Conversely, there were times when pictures of people were not flagged as containing people.

[GRAFFITI ONLY] List of images that were noted to not have people. Note the images with the backs of people.

Web applications require binary conclusions to take action. An image classifier provides a percentage rating of whether a detected object is present. Larger object detection models will suggest more than one potentially detected object. For example, there is a 90% chance of a person being in the photo, a 76% chance of an airplane, and a 43% chance of a giant banana. This isn't very useful when the application processing the responses just needs to know whether or not something is present.

[PEOPLE ONLY] List of images that were classified as people. Note the last one is a giant mural with features of a face.

This brings up the importance of quality in any machine learning based process. Given that very few object classifiers or image-based processes are 100% correct, the quality of an API is hard to gauge. When it comes to commercial implementations of these object classifier APIs, the brands of the services will be largely shaped by the edge cases of a few requests. Because machine learning itself is so opaque, the brands of the service providers become all the more important in determining how trustworthy these services are.

Conversely, because the quality of machine learning tasks varies so greatly, a brand may struggle to showcase its value to a user. When solving a machine learning task is pegged to a dollar amount, for example per API request, the ability to do the same thing for free will be appealing. From the perspective of price, rolling your own free object classifier beats using a third-party service. The branded machine learning service market still has a long way to go before becoming clearly preferable to self-hosted implementations.

Specificity in object classification is very important

Finally, when it comes to any machine learning task, specificity is your friend. With graffiti specifically, it's hard to qualify something that varies so much in form. Graffiti is a category that encompasses a huge range of visual compositions. Even a person may struggle to qualify what is or isn't graffiti. Compared to detecting a face or a fruit, the specificity of the category matters a great deal.

The brilliance of WordNet and ImageNet lies in the strength of their categorical specificity. By classifying the world through words and their relationships to one another, there is a way to qualify the similarities and differences of images. For example, a pigeon is a type of bird, but different from a hawk, and both are completely different from an airplane or a bee. The relationships between those things allow them to be clearly classified. No such specificity exists for graffiti, but it is needed to properly improve an object classifier.
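For a feel of what that word-level structure looks like, here is a small sketch using NLTK's WordNet interface; it assumes the WordNet corpus has already been downloaded with nltk.download("wordnet").

```python
# Small sketch of WordNet's category structure using NLTK.
from nltk.corpus import wordnet as wn

pigeon = wn.synset("pigeon.n.01")
hawk = wn.synset("hawk.n.01")
airplane = wn.synset("airplane.n.01")

# The lowest shared ancestor category: pigeon and hawk should meet at a bird synset.
print(pigeon.lowest_common_hypernyms(hawk))

# Rough similarity scores based on distance in the hypernym tree.
print("pigeon vs hawk:    ", pigeon.path_similarity(hawk))
print("pigeon vs airplane:", pigeon.path_similarity(airplane))
```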

Final final

Overall, the application works and was very helpful. Building it removed more of the mystery around how machine learning and image recognition services work. As I noted above, the process also made me much more aware of the shortfalls of these services and the places where this field is not yet defined. I definitely think this is something all software engineers should learn how to do. Before the tools become simple to use, I imagine there will be a good period of navigating a complicated ecosystem. Similar to the browser wars before web standards were formed, there is going to be a lot of vying for market share amongst the machine learning providers. You can already see it between services from larger companies like Amazon, Google, and Apple. At the hardware and software level, it is also very apparent between Nvidia's CUDA and AMD's price appeal.

More to come!

Filed Under: Uncategorized Tagged With: Computer Vision, Graffiti, Machine Learning, Programming, Python

Two months exploring deep learning and computer vision

December 20, 2017 by rememberlenny

I've been reading and taking notes using an iPad Pro and LiquidText.

I decided to develop familiarity with computer vision and machine learning techniques. As a web developer, I found this growing sphere exciting, but did not have any contextual experience working with these technologies. I am embarking on a two year journey to explore this field. If you haven’t read it already, you can see Part 1 here: From webdev to computer vision and geo.

—

Iļø ended up getting myself moving by exploring any opportunity Iļø had to excite myself with learning. I wasn’t initially stuck on studying about machine learning, but I wanted to get back in the groove of being excited about a subject. Iļø kicked off my search by attending a day-long academic conference on cryptocurrencies, and by the time the afternoon sessions began, I realized machine learning and computer vision was much more interesting to me.

Getting started

Iļø kick-started my explorations right around the time a great book on the cross section of deep learning and computer vision was published. The author, Adrian Rosebrock from PyImageSearch.com, compiled a three volume masterpiece on the high level ideas and low level applications of computer vision and deep learning. While exploring deep learning, Iļø encountered numerous explanations of linear regression, Naive Bayesian applications (Iļø realize now that Iļø have heard this name pronounced so many different ways), random forest/decision tree learning, and all the other things I’m butchering.

My new book, #DeepLearning for Computer Vision with #Python, has been OFFICIALLY released! Grab your copy here: https://t.co/rQgpAflp52 pic.twitter.com/fKEyf8i2fR

— PyImageSearch (@PyImageSearch) October 17, 2017

Iļø spent a few weeks reading the book and came away feeling like Iļø could connect all the disparate blog posts I have read up to now to the the array of mathematical concepts, abstract ideas, and practical programming applications. I read through the book quickly, and came away with a better sense of how to approach the field as a whole. My biggest takeaway was coming to the conclusion that Iļø wanted to solidify my own tools and hardware for building computer vision software.

Hardware implementation

Iļø was inspired to get a Raspberry Pi and RPI camera that Iļø would be able to use to analyze streams of video. Little did I know that setting up the Raspberry Pi would take painfully long. Initially, Iļø expected to simply get up and running with a video stream and process the video on my computer. Iļø struggled with getting the Raspberry Pi operating system to work. Then, once Iļø realized what was wrong, Iļø accidentally installed the wrong image drivers and unexpectedly installed conflicting software. The process that Iļø initially thought would be filled with processing camera images, ended up becoming a multi hour debugging nightmare.

So far, Iļø have realized that this is a huge part getting started with machine learning and computer vision ā€œstuffā€ is about debugging.

Step 1. Get an idea.
Step 2. Start looking for the tools to do the thing.
Step 3. Install the software needed.
Step 4. Drown in conflicts and unexpected package version issues.

https://aiyprojects.withgoogle.com/vision#list-of-materials

My original inspiration behind the Raspberry Pi was the idea of setting up a simple device with a camera and a GPS signal. The idea came from thinking about how many vehicles in the future, autonomous or fleet, will need many cameras for navigation. Whether for insurance purposes or basic functionality, I imagine a ton of video footage will be created and used. In the process, there will be huge repositories of media that go unused and become a rich data source for understanding the world.

Iļø ended up exploring the Raspberry Pi’s computer vision abilities, but never successfully got anything interesting working as I’d hoped. Iļø discovered that there are numerous cheaper Raspberry Pi-like devices, that had both the interconnectivity and the camera functionality in a smaller PCB board than a full size Raspberry Pi. Then Iļø realized that rather than going the hardware route, Iļø might as well have used an old iPhone and developed some software.

My brief attempt at exploring a hardware component of deep learning made me realize I should stick to software where possible. Including a new variable when the software part isn’t solved just adds to the complexity.

Open source tools

In the first month of looking around for machine learning resources, I found many open source tools that make getting up and running very easy. I knew there were many proprietary services provided by the FANG tech companies, but I wasn't sure how they competed with the open source alternatives. The image recognition and OCR tools offered as SaaS by IBM, Google, Amazon, and Microsoft are very easy to use. To my surprise, there are great open source alternatives that are worth configuring to avoid unnecessary service dependence.


For example, a few years ago, I launched an iOS application to collect and share graffiti photos. I was indexing geotagged images from publicly available APIs, such as Instagram and Flickr. Using these sources, I relied on basic features, such as hashtags and location data, to distinguish whether images were actually graffiti. Initially, I began pulling thousands of photos a week, and soon scaled to hundreds of thousands a month. I quickly noticed that many of the images I indexed were not graffiti and instead were images that would be destructive to the community I was trying to foster. I couldn't prevent low-quality photos of people taking selfies or poorly tagged images that were not safe for work from loading in people's feeds. As a result, I decided to shut down the overall project.

#graffiti results on instagram

Now, with machine learning services and open source implementations for object detection and nudity detection, I can roll my own service that checks each photo that gets indexed. Previously, if I had paid a service to do that quality checking, I would have racked up hundreds if not thousands of dollars in API charges. Instead, I can now grab a "data science" AMI for an AWS box and create my own API for checking for undesired image content. This was out of reach for me even just two years ago.

Overview

On a high level, before undergoing this process, I felt like I theoretically understood most of the object recognition and machine learning processes. After beginning to connect the dots between all the machine learning content I had been consuming, I feel much clearer on which concepts I need to learn. For example, rather than just knowing that linear algebra is important for machine learning, I now understand how problems are broken into multidimensional arrays/matrices and processed in mass quantities to look for patterns that are only theoretically representable. Before, I knew there was some abstraction between features and how they were represented as numbers that could be compared across a range of evaluated items. Now I understand more clearly how dimensions, in the context of machine learning, arise from the sheer fact that there are many factors directly and indirectly correlated with one another. The matrix math behind the multidimensional aspects of feature detection and evaluation is still a mystery to me, but I am able to understand the high-level concepts.

The previously illegible network architecture graphs are now seemingly approachable.

Concretely, reading Adrian Rosebrock's book gave me the insight to decode the box-line diagrams of machine learning algorithms. The breakdown of a deep learning network architecture is now somewhat understandable. I am also familiar with the datasets (MNIST, CIFAR-10, and ImageNet) that are commonly used to benchmark various image recognition models, as well as the differences between image recognition models (such as VGG-16, Inception, etc.).
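For anyone who wants to poke at one of those benchmark models directly, here is a minimal sketch using the ImageNet-pretrained VGG-16 that ships with Keras. The image path is a placeholder, and the weights download on first use.

```python
# Sketch: classify a single image with an ImageNet-pretrained VGG-16 via Keras.
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, decode_predictions, preprocess_input
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")  # pretrained on ImageNet; downloads weights on first run

# Placeholder path; VGG-16 expects 224x224 RGB input.
img = image.load_img("some-photo.jpg", target_size=(224, 224))
batch = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

predictions = model.predict(batch)
for _, label, score in decode_predictions(predictions, top=3)[0]:
    print(f"{label}: {score:.2f}")
```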

Timing: Public Funding

One reason I decided machine learning and computer vision are important to learn now comes from a concept I learned from the book: areas with heavy government investment in research are on track for huge innovation. Currently, hundreds of millions of dollars are being spent on research programs in the form of grants and scholarships, in addition to funding allocated to specific machine learning related projects.

Example of the pix2pix algorithm applied to "cat-ness". https://distill.pub/2017/aia/

In addition to government spending, publicly accessible research from private institutions seems to be growing. The forms of research that currently exist, coming out of big tech companies and public foundations, are pushing forward the entire field of machine learning. I personally have never seen the same concentration of public projects funded by private institutions in the form of publications like distill.pub and collectives like the OpenAI foundation. The work they are putting out is unmatched.

Actionable tasks

Reviewing the materials I have been reading, I realize my memory is already failing me. I’m going to do more action-oriented reading from this point forward. I have a box with GPUs to work with now, so I don’t feel any limitations around training models and working on datasets.

Most recently, I attended a great conference on Spatial Data Science, hosted by Carto. There, I became very aware of how much I don't know in the field of spatial data science. Before the conference, I was just calling the entire field "map location data stuff".

I'll continue making efforts to meet up with different people I find online with similar interests. I've already been able to do this with folks who live in New York and have written Medium posts relevant to my current search. Most recently, when exploring how to build a GPU box, I was able to meet a fellow machine learning explorer for breakfast.

By the middle of January, I'd like to be familiar with technical frameworks for training a model around graffiti images. At the very least, I want to have a set of images to work with, labels to associate with the images, and a process for cross-checking an unindexed image against the trained labels.


Thanks to Jihii Jolly for correcting my grammar.

Filed Under: Uncategorized Tagged With: Computer Vision, Machine Learning, Programming, Spatial Analysis, Towards Data Science

How I Used Machine Learning to Inspire Physical Paintings

July 11, 2017 by rememberlenny

Since I was 15 years old, I have been painting graffiti under bridges and in abandoned buildings. I grew up in San Francisco when street art was booming, and inspired by the colors and aesthetic, I looked for ways to create art and taught myself to paint. As I got older, I discovered the graffiti communities on Flickr, and began making an effort to meet artists where I lived and share photos of my work online. As Tumblr grew in popularity, the community moved. Then Instagram emerged, and the community moved again.








"Gift", Photo collection from 2010–2012. All photos taken and painted by author.

In recent years, I haven’t had the same leeway to paint in public. There was a greater cultural acceptance of street art when I lived abroad. Painting on walls was seen as beautification in areas where there was much demolition. When I moved back to the US, I started painting on larger canvases, and eventually moved toward spray cans and paint brushes.

Kawan's "Sunset Running" project. Courtesy of Kawandeep Virdee.

Inspired by a project by Kawandeep Virdee, I photoshopped the paintings with motion blur filters, and modified the lighting effects. The result was a creative jumping-off point, enabling me to create a digitally inspired physical painting.

Last year, I started experimenting even more with digitally manipulated images, and their role in inspiring physical paintings. I began creating aesthetically beautiful images by taking classic paintings from the 18th and 19th century and running various photoshop filters over them. I found the color and contrast from these old paintings to be unmatched and beautiful.

Process for turning classic paintings into beautiful color muses.

I took the digital pieces I created and used them as the inspiration for new work: remixing the classical paintings on a computer and then physically painting the remixed image.

The Ninth Wave hanging on my wall. Photo by author.

I continued my interest in graffiti, again using the digital space as a canvas, and spent a few months building out various software tools that I thought would be useful for graffiti artists. After amassing a library of literally millions of images of paintings, I realized I wanted to do something more than just browse them, so I started exploring different machine learning techniques.

Painting based on Ray Collin’s Seascape series painting after digitally manipulating the photo. Photo by author via RememberLenny

I started teaching myself about the application of neural networks to do something called "style transfer," which refers to the process of analyzing two images for the qualities that make each picture recognizable, then applying those qualities from one picture to another. This meant I could replicate an image's color, shapes, contrast, and various other features onto another. The most commonly recognized style transfer application is from Van Gogh's "Starry Night" to any photograph.

Example from a GitHub repository that implements the Artistic Style Transfer algorithm using Torch. Credit: jcjohnson

Similar to my previous project of painting the digital sunset images, I processed pictures using the artistic style transfer algorithm and then painted them. Referring to the plethora of graffiti images I’d already collected, I used images of nature and processed them in the style of street art I thought looked interesting. The end result was an aesthetically interesting image I couldn’t imagine creating from scratch.

Process of creating the Artistic Style Transfer images.

It’s been a few months since I’ve done anything with this technique of mixing images and painting them. I hope the process depicted above can be a source of inspiration for other programmer-painters who enjoy mixing both practices.

Final version of the digitally inspired painting. Photo by author.

Below are a few examples of what an artist can create by combining street art images with photographs.











Photos by author.

Thanks to Edwin Morris for the grammatical review and Lam Thuy Vo for the ideas.

Filed Under: Uncategorized Tagged With: Artificial Intelligence, Graffiti, Machine Learning, Programming, Web Development

5 things I learned launching my first iOS app

December 5, 2014 by rememberlenny

Public Art: Find street art nearby.

Public Art iOS app.

1. Having control of the external services is important

The primary function of my app is to check your current location against a database of geotagged images. This is a straightforward call that leans on a Ruby gem called Geocoder. To resolve the user's current location, I grab the user's longitude and latitude, then use the Google Maps Geocoding API to translate the coordinates into usable data.

Not the Google Maps API Quota. Rails memory maxed by a bad Model.all call.

I was unprepared for the Google Maps Geocoding API quota. At midnight, I noticed that my server wasn't returning queries to users because the Google Maps quota had been exhausted. This could have been incredibly frustrating if the logic for the query lived in the iOS app. Fortunately, the query was made through the Rails application I had set up as an API endpoint. I was able to adjust the Geocoder source and spin up a Data Science Toolkit virtual machine on an AWS box. I switched the API endpoints to the DSTK and got geocoding back up and running.

2. Misusing app push notifications is very easy

This is what I saw after unintentionally sending a 12am push notification.

Last night, after noticing the Google Maps API quota was maxed out, I tried to find a way to communicate with any users who were receiving empty queries. I intended to send an in-app notification to anyone currently using the app. In the process, I accidentally sent a push notification to everyone who had downloaded the app. This was right at midnight on the east coast, so I felt very bad for anyone who may have been disturbed. If it was me, I'd be pissed.

3. Launching on the Apple store is time consuming.

I spent days of continuous work figuring out how to get the correct assets, prepare my build files, get the right tool versions installed, and so on. The week before Thanksgiving, I was on the subway home from Brooklyn editing the promo video for the app in Photoshop. Anything that can help accelerate the process is helpful.

http://sketchtoappstore.com/

A huge help were all the Sketch.app plugins that helped generate in-app and app store assets.

4. Setting up analytics is hard

The lean startup approach is all about doing only as much work as needed to deliver core functionality, without making assumptions about your users' tastes. You then use data to gain insights into what needs further development. To do this correctly, you need a good analytics framework set up.

Downloads reported in Parse. It's cool seeing so many time zones.

I spent time setting up event tracking in my app, but completely missed some obvious things. The main pitfall was that I didn't set up a user-identifying attribute. I accounted for the events that would help me track user frustration, but I'm not able to do any effective user segmenting. It's hard for me to isolate the data trails of specific users.

I also leaned on two analytics services because I couldn't get one to fulfill all my needs. I used Mixpanel for event tracking and Parse for push notification management. In the process, I didn't fully set up Parse correctly and never integrated user profiles into Mixpanel. The partial setup keeps me from knowing how users respond to push notifications or from segmenting data by user.

5. Understanding analytics is also hard

I set up enough tracking to generate a solid list of events. I can see when people query, where they query from, how they click, what they click, how many searches they do, and so on. Now that I have this data, I don't have any inherent insights. I need to go back to each event and understand exactly how it is generated. I have some events that were called hundreds of times and others that were called only 10 or 20 times. It's hard to differentiate between an event that was set up incorrectly and one that reflects huge demand for a certain action.

Mixpanel dashboard for tracked events.

One thing I wish I had was a basic list of events for user actions that are not in my app. For example, I want to know if people tried to swipe left or right, pinched, or rotated their device. I didn't try to capture these events, but they would be useful for understanding how to develop future features.


I'm still not sure what the actual total downloads or user actions were. iTunes, Mixpanel, and Parse are all out of sync. Ironically, I trust Parse more than iTunes because it reports unique device parameters that I can see.

Overall, the process was a success. Most importantly, I feel good about it. I was excited to see Laughing Squid post about the app and to get attention on Product Hunt.

If you haven’t already, check out Public Art. The app to discover graffiti and street art near you!


If this was interesting to you, follow me on Twitter.

Filed Under: Uncategorized Tagged With: Product Development, Programming, Startup
