• Skip to primary navigation
  • Skip to main content
  • Skip to primary sidebar

Remember Lenny

Writing online

  • Portfolio
  • Email
  • Twitter
  • LinkedIn
  • Github
You are here: Home / 2014 / Archives for March 2014

Archives for March 2014

Clear goals direct good development

March 28, 2014 by rememberlenny

Yesterday I discovered David Jackson’s blog A Founder’s Notebook on Fred Wilson’s blog. I skipped through a few posts and found Setting clear goals = empowerment. Jackson currates a wide array of content across other blogs and writes brief thoughts on the issue. I’m going to pull from his book.

From A Small World CEO Sabine Heller‘s Corner Office interview:

You have to manage people based on results and set clear goals. It sounds like a simple thing, but people don’t do that often. When I was 22 and working at UGO, it didn’t matter that I had no experience and it didn’t matter what my process was as long as I hit my goal. It taught me how empowering it is to be treated like that. I am a great manager for people who are strong thinkers and motivated. I empower people. I promote people. I give them a lot of leeway. At the end of the day, I look at results, and that’s it. I feel very strongly that organizations infantilize employees. You should treat them like adults.

At my current position as a software engineer at Conde Nast, I have been a part of two tech teams. I have worked on projects that affect platform level code. In my experience, I am most productive when project scope predetermined by the client. I can commit to a clear timeline, only if I understand the expected output. If I am given an unrefined project, there must be an predetermined time limit.

Have clear project definitions before passing on a project to engineers. If the project is not build, then run the ideas past a designer and start on a visual mock. If the visual mock passes the project developer’s taste, then build an prototype. Don’t waste time going into the production build. Figure out the issues and assumptions associated with the prototype. Once product owner vets a prototype, you can build. Someone who understands the potential conflicts of the prototype needs to provide a final review. If all is good, reevaluate a timeline and commit to it.

After the initial design/prototype is complete, an estimate should be accurate. Before design, an estimate can’t be accurate because the project scope will change. This is not the engineers fault. The person who determines the product must take full responsibility. This would be the product manager or the CEO of a company. Before the designer passes ideas onto development, refine it. Clear goals inspire effective work. Defined product expectations will also result in productive engineer time.

Filed Under: Uncategorized

Scraping web forums for image urls

March 7, 2014 by rememberlenny

Breakdown of posts by user

Goals

Im applying my graffiti interests toward my programming ability. Today I put together a brief python script to scrape a reputable graffiti writers forum. The scraper was designed to build a database of user names and images uploaded. My goal is two fold: learn about computer vision through analyzing graffiti pictures, develop insight into people’s usage of the internet for distributing graffiti images. Beyond analysis, I would love to create a graffiti recomendation engine for aspiring graffiti artists. This engine would look at a person’s ‘style’ and give suggestions of other artists they may benefit from studying.

Scraping process

I focused my first group of images scraped on New York city. I found the New York city forum thread in 12ozprophet.com. From here, I catered the scraper to tag posts and the images contained in the post. I created a unique ID for the images and the posts. Now that this has been completed, I may need to create a unique ID for the user names. As of now, this is not necessary.

The scraper pulled from over 440,000 posts within the New York graffiti thread. From those posts, over 275,000 images were gathered. In a brief overview, I can tell that not all images are graffiti. Similarly, not all images are of paintings. There are tags, throw ups, pieces, and sketches. The images vary in quality and size. The dataset pulled from forum posts between 2009 to 2014. I’m surprised to see how much camera quality and digital photography has changed in those few years.

Considering analysis

In the past few weeks, I have spent time understanding the application of statistical analysis on large data sets. Because I was working with small sets, I was able to use Microsoft Excel’s functions. Beyond 100 data points, excel becomes noticeably sluggish in processing simple PivotTable tasks. I will use R and python libraries to process future analysis.

I am reading various books I purchased in the past from Oreilly. The books topics are mainly around R, data analysis in Python, OpenCV, and machine learning. None of these topics are failiar to my background, or me so I have also picked up a basic book on linear algebra. I would like to finish the preliminary results from the first dataset by next week and move ahead with scraping the rest of 12ozprophet.

Breakdown of all posts by date

Code

The code used to scrape is pasted below. The dataset will be made available in the near future. Note, while scraping 12ozprophet, the images are hosted on a large variety of websites. Different users primarily uploaded the images, so the image analysis should not create a large load on the forums. I do not believe I need to download the images for analysis, but if I do, I will likely use some a virtual machine cloud instance.

Also, I am unfamiliar with the Python-way to write programs. I am primarily working with JavaScript and PHP. I have dove into Ruby on Rails, but am still dependent on the Gems and generators. As a result, much of my Python code is written with a JavaScript mentality.

#!/usr/bin/env python
import scraperwiki
import requests
import lxml.html    

# Constants
numberofposts = 7844
post_iteration = 0
user_iteration = 0
image_iteration = 0

# One-time use
global savedusername, date_published, post_image

savedusername = 'null'
dateremovetitle = """N   E   W      Y   O   R   K      C   I   T   Y - """
dateremovere = """Re:"""
ignoredimages = ['images/12oz/statusicon/post_old.gif','images/12oz/buttons/collapse_thead.gif','images/12oz/statusicon/post_new.gif','images/12oz/reputation/reputation_pos.gif','images/12oz/reputation/reputation_highpos.gif','images/icons/icon1.gif','images/12oz/buttons/quote.gif','clear.gif', 'images/12oz/attach/jpg.gif','images/12oz/reputation/reputation_neg.gif','images/12oz/reputation/reputation_highneg.gif','images/12oz/statusicon/post_new.gif']
for i in range(1, numberofposts):

html = requests.get("http://www.12ozprophet.com/forum/showthread.php?t=128783&page="+ str(i)).content
dom = lxml.html.fromstring(html)

print 'Page: ' + str(i)
for posts in dom.cssselect('#posts'):
    for table in posts.cssselect('table'):

        try:
            username = table.cssselect('a.bigusername')[0].text_content()

            if username != savedusername:
                if username != 'null':
                    savedusername = username
        except IndexError:
            username = 'null'

        try:
            post_iteration = post_iteration + 1 #my unique post id
            postdate = table.cssselect('td.alt1 div.smallfont')[0].text_content()
            postdate = postdate.replace(dateremovetitle, '')
            postdate = postdate.replace(dateremovere, '')
            postdate = postdate.strip()
            date_published = postdate
            # print '---'
            # print savedusername +' '+ postdate + ', ID: ' + str(iteration)
        except IndexError:
            postdate = 'null'

        for img in table.cssselect('img'):
            imagesrc = img.get('src')
            imagematch = 'false'

            for image in ignoredimages:
                if image == imagesrc:
                    imagematch = 'true'

            if imagematch != 'true':
                image_iteration = image_iteration + 1
                post_image = imagesrc            


                post = {
                    'image_id': image_iteration,
                    'image_url': post_image,
                    'post_id': post_iteration,
                    'user': savedusername,
                    'date_published': date_published,
                }
                print post

                scraperwiki.sql.save(['image_id'], post)

Filed Under: Uncategorized

Primary Sidebar

Recent Posts

  • Thoughts on my 33rd birthday
  • Second order effects of companies as content creators
  • Text rendering stuff most people might not know
  • Why is video editing so horrible today?
  • Making the variable fonts Figma plugin (part 1 – what is variable fonts [simple])

Archives

  • August 2022
  • February 2021
  • October 2020
  • September 2020
  • August 2020
  • December 2019
  • March 2019
  • February 2019
  • November 2018
  • October 2018
  • April 2018
  • January 2018
  • December 2017
  • October 2017
  • July 2017
  • February 2017
  • January 2017
  • November 2016
  • October 2016
  • August 2016
  • May 2016
  • March 2016
  • November 2015
  • October 2015
  • September 2015
  • July 2015
  • June 2015
  • May 2015
  • March 2015
  • February 2015
  • January 2015
  • December 2014
  • November 2014
  • October 2014
  • September 2014
  • August 2014
  • July 2014
  • June 2014
  • May 2014
  • April 2014
  • March 2014
  • February 2014
  • January 2014
  • December 2013
  • October 2013
  • June 2013
  • May 2013
  • April 2013
  • March 2013
  • February 2013
  • January 2013
  • December 2012

Tags

  • 10 year reflection (1)
  • 100 posts (2)
  • 2013 (1)
  • academia (2)
  • Advertising (3)
  • aging (1)
  • Agriculture (1)
  • analytics (3)
  • anarchy (1)
  • anonymous (1)
  • api (1)
  • arizona (1)
  • Art (2)
  • art history (1)
  • artfound (1)
  • Artificial Intelligence (2)
  • balance (1)
  • banksy (1)
  • beacon (1)
  • Beacons (1)
  • beast mode crew (2)
  • becausewilliamshatner (1)
  • Big Data (1)
  • Birthday (1)
  • browsers (1)
  • buddhism (1)
  • bundling and unbundling (1)
  • china (1)
  • coding (1)
  • coffeeshoptalk (1)
  • colonialism (1)
  • Communication (1)
  • community development (1)
  • Computer Science (1)
  • Computer Vision (6)
  • crowdsourcing (1)
  • cyber security (1)
  • data migration (1)
  • Deep Learning (1)
  • design (1)
  • designreflection (1)
  • Developer (1)
  • Digital Humanities (2)
  • disruption theory (1)
  • Distributed Teams (1)
  • drawingwhiletalking (16)
  • education (3)
  • Email Marketing (3)
  • email newsletter (1)
  • Employee Engagement (1)
  • employment (2)
  • Engineering (1)
  • Enterprise Technology (1)
  • essay (1)
  • Ethics (1)
  • experiement (1)
  • fidgetio (38)
  • figma (2)
  • film (1)
  • film industry (1)
  • fingerpainting (8)
  • first 1000 users (1)
  • fonts (1)
  • forms of communication (1)
  • frontend framework (1)
  • fundraising (1)
  • Future Of Journalism (3)
  • future of media (1)
  • Future Of Technology (2)
  • Future Technology (1)
  • game development (2)
  • Geospatial (1)
  • ghostio (1)
  • github (2)
  • global collaboration (1)
  • god damn (1)
  • google analytics (1)
  • google docs (1)
  • Graffiti (23)
  • graffitifound (1)
  • graffpass (1)
  • growth hacking (1)
  • h1b visa (1)
  • hackathon (1)
  • hacking (1)
  • hacking reddit (2)
  • Hardware (1)
  • hiroshima (1)
  • homework (1)
  • human api (1)
  • I hate the term growth hacking (1)
  • ie6 (1)
  • ifttt (4)
  • Image Recognition (1)
  • immigration (1)
  • instagram (1)
  • Instagram Marketing (1)
  • internet media (1)
  • internet of things (1)
  • intimacy (1)
  • IoT (1)
  • iteration (1)
  • jason shen (1)
  • jobs (2)
  • jrart (1)
  • kickstart (1)
  • king robbo (1)
  • labor market (1)
  • Leonard Bogdonoff (1)
  • Literacy (1)
  • location (1)
  • Longform (2)
  • looking back (1)
  • los angeles (1)
  • Machine Learning (13)
  • MadeWithPaper (106)
  • making games (1)
  • management (1)
  • maps (2)
  • marketing (4)
  • Marketing Strategies (1)
  • Media (3)
  • medium (1)
  • mentor (1)
  • message (1)
  • mindmeld games (1)
  • Mobile (1)
  • Music (2)
  • Music Discovery (1)
  • neuroscience (2)
  • new yorker (1)
  • Newspapers (3)
  • nomad (1)
  • notfootball (2)
  • npaf (1)
  • odesk (1)
  • orbital (14)
  • orbital 2014 (14)
  • orbital class 1 (9)
  • orbitalnyc (1)
  • paf (2)
  • paid retweets (1)
  • painting (1)
  • physical web (1)
  • pitching (2)
  • popular (1)
  • post production (1)
  • Privacy (1)
  • process (1)
  • product (1)
  • Product Development (2)
  • product market fit (2)
  • Programming (6)
  • project reflection (1)
  • promotion (1)
  • prototype (17)
  • prototyping (1)
  • Public Art (1)
  • Public Speaking (1)
  • PublicArtFound (15)
  • Publishing (3)
  • Python (1)
  • quora (1)
  • Rails (1)
  • React (1)
  • React Native (1)
  • real design (1)
  • recent projects (1)
  • reddit (3)
  • redesign (1)
  • reflection (2)
  • rememberlenny (1)
  • Remote work (1)
  • replatform (1)
  • Responsive Emails (1)
  • retweet (1)
  • revenue model (1)
  • rick webb (1)
  • robert putnam (1)
  • ror (1)
  • rubyonrails (1)
  • segmenting audience (1)
  • Semanticweb (2)
  • Senior meets junior (1)
  • SGI (1)
  • Side Project (1)
  • sketching (22)
  • social capital (1)
  • social media followers (2)
  • social media manipulation (1)
  • social media marketing (1)
  • social reach (5)
  • software (3)
  • Soka Education (1)
  • Spatial Analysis (2)
  • spotify (1)
  • stanford (2)
  • Startup (21)
  • startups (7)
  • stree (1)
  • Street Art (4)
  • streetart (5)
  • stylometrics (1)
  • Technology (1)
  • thoughts (1)
  • Time as an asset in mobile development (1)
  • Towards Data Science (4)
  • TrainIdeation (42)
  • travel (1)
  • traveling (1)
  • tumblr milestone (2)
  • twitter (1)
  • twitter account (2)
  • typography (2)
  • unreal engine (1)
  • user behavior (1)
  • user experience (3)
  • user research (1)
  • user testing (1)
  • variable fonts (1)
  • video editing (2)
  • visual effects (1)
  • warishell (1)
  • Web Development (8)
  • webdec (1)
  • webdev (13)
  • windowed launch (1)
  • wordpress (1)
  • Work Culture (1)
  • workinprogress (1)
  • zoom (1)