19 April 2010

Technical look at the “SEOmoz automated link building tool”

Today’s post on SEOmoz, by Will Critchlow, details a specific example of creating an automated link building tool. More specifically, it’s a tool that finds pages that mention your site but don’t link to it. It does this by combining a Google Alerts RSS feed with YQL and Python, using Google App Engine for deployment.

It’s great to see such a technical post on SEO! However, Will asked for a little more explanation of his code, so below is the code with full comments to explain what’s going on. There’s still a lot more that can be done with this, but I tried to separate out the reusable functions so that this module can be imported by something else that writes the results to a database or a flat text file.

After the code is an explanation of how to get this up and running.

# monitor.py - a script to monitor the web for references to your site
#              that don't link to it. It uses google alerts and yql to
#              find mentions and then check for links.

import yql

# the website you want to monitor, to see whether people link to it
# when they mention it
monitorURL = "http://www.seomoz.org"

# path to the google alerts RSS for seomoz
rssURL = "http://www.google.com/alerts/feeds/02091889458087148316/10137124638087203861"

# determines if the given url has a link or not
def findLink(y, url):

    # this query will be used to get a page for the given url.
    # the @xpath part is just a placeholder that will be filled in in the next line.
    # see here for more: http://python-yql.org/usage.html#using-placeholders-in-queries
    htmlQuery = "select * from html where url='%s' and xpath=@xpath" % url

    # htmlQuery is executed with the xpath placeholder filled in, so it only returns
    # links whose href starts with our monitorURL
    htmlResult = y.execute(htmlQuery, {"xpath": "//a[starts-with(@href,'%s')]" % monitorURL})

    # this will return True if we found at least one link to our monitorURL
    return htmlResult.count > 0

# returns the result of the yql query that gets the google alerts RSS feed
def getMentions(y):
    return y.execute("select * from feed where url='%s'" % rssURL)

# this if-statement just makes it so the code below it only runs
# if the script is called from the command line (or directly by Google App Engine)
if __name__ == '__main__':

    # this is an HTTP header (followed by a blank line) so the result of running
    # this file can be properly read by a web browser, see more here:
    # http://code.google.com/appengine/docs/python/gettingstarted/helloworld.html
    print 'Content-Type: text/plain'
    print ''

    # make sure that this is yql-0.3 and not 0.2
    assert yql.__version__ == '0.3', "Must be yql version 0.3, you're running version: %s" % yql.__version__

    y = yql.Public()           # get the yql object
    rssResult = getMentions(y) # query the google alerts RSS feed for mentions of our website
    maxLinks = 10              # limit the number of links we view to 10 (this is an arbitrary number)

    # loop through the first 10 results from getMentions() to check for links
    for row in rssResult.rows[:maxLinks]:

        # get the url of the page that mentioned us
        url = row.get('link').get('href')

        # call findLink to see if the page linked to us
        # (findLink returns True when a link to monitorURL was found)
        if findLink(y, url):
            print "Has links:\t%s" % url
        else:
            print "!No links!\t%s" % url
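The two YQL queries above do two simple things: pull the mention URLs out of the feed, and check a page for an anchor whose href starts with the monitored URL. As a rough illustration of that logic only, here is a standard-library sketch in Python 3. It is not what monitor.py does (the script hands both jobs to YQL), and the function names, the `LinkFinder` class, and the sample feed and page are all made up for this example. It also assumes the feed is Atom-style, with a `link` element carrying an `href` attribute, which is what `row.get('link').get('href')` suggests.

```python
# Sketch only: a standard-library stand-in for the two YQL queries above.
# All names and sample data here are hypothetical.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


class LinkFinder(HTMLParser):
    """Records whether any <a> tag has an href starting with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.found = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(self.prefix):
            self.found = True


def has_link(html_text, monitor_url):
    """Same test as the XPath //a[starts-with(@href, monitor_url)]."""
    finder = LinkFinder(monitor_url)
    finder.feed(html_text)
    return finder.found


def mention_urls(atom_text):
    """Return the href of the first <link> in each Atom <entry>."""
    root = ET.fromstring(atom_text)
    urls = []
    for entry in root.iter(ATOM_NS + "entry"):
        link = entry.find(ATOM_NS + "link")
        if link is not None and link.get("href"):
            urls.append(link.get("href"))
    return urls


sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>A mention</title><link href="http://example.com/post"/></entry>
  <entry><title>Another</title><link href="http://example.org/review"/></entry>
</feed>"""

sample_page = '<p>I like <a href="http://www.seomoz.org/blog">SEOmoz</a>.</p>'

print(mention_urls(sample_feed))
print(has_link(sample_page, "http://www.seomoz.org"))
print(has_link("<p>SEOmoz is great</p>", "http://www.seomoz.org"))
```

Like the XPath version, the prefix match is case-sensitive and only looks at the start of the href, so a link to a different form of the same address (for example one without the www) would count as "no link" in both.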

How to set it up

This assumes you have python installed and access to a terminal (command line).

  1. Install the Google App Engine SDK
  2. Open the GoogleAppEngineLauncher and add a new app called monitor
  3. Open a terminal and go to the directory of your app, where you will need to add some libraries…
  4. Download python-yql-0.3 (0.2 won’t work with this script) and pull out the source code into the root directory of your app:

    curl http://pypi.python.org/packages/source/y/yql/yql-0.3.tar.gz > yql-0.3.tar.gz
    tar xvzf yql-0.3.tar.gz
    mv yql-0.3/yql/ .
    mv yql-0.3/oauth2/ .
    rm -r yql-0.3
    
  5. Do the same with httplib2:

    curl http://httplib2.googlecode.com/files/httplib2-0.6.0.tar.gz > httplib2-0.6.0.tar.gz
    tar xvzf httplib2-0.6.0.tar.gz
    mv httplib2-0.6.0/httplib2/ .
    rm -r httplib2-0.6.0
    
  6. Add the monitor.py source code to the root path of your app

  7. Open app.yaml (also in the root path of your app) and change script: main.py to script: monitor.py
  8. Click run on your app, and then try it out on http://localhost:8080

After this is all done, you should be able to deploy the app to whatever domain you set up with Google. If you try this out, let me know if it worked for you too!
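I mentioned above that the reusable functions could feed a database instead of printing to the browser. Here is a minimal sketch of what that might look like with SQLite, written in Python 3 so it runs on its own. The table name, the columns, and the `fake_has_link` stand-in are all hypothetical; in a real version the checker would be something like `findLink(y, url)` from monitor.py.

```python
# Sketch only: table name, columns, and the stand-in checker are hypothetical.
import sqlite3


def save_results(conn, urls, has_link):
    """Check each url with has_link(url) and store the outcome in SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mentions ("
        "url TEXT PRIMARY KEY, linked INTEGER NOT NULL)"
    )
    for url in urls:
        # INSERT OR REPLACE keeps one row per url across repeated runs
        conn.execute(
            "INSERT OR REPLACE INTO mentions (url, linked) VALUES (?, ?)",
            (url, int(bool(has_link(url)))),
        )
    conn.commit()


def unlinked(conn):
    """Return the mentions that still need a link request, in url order."""
    rows = conn.execute(
        "SELECT url FROM mentions WHERE linked = 0 ORDER BY url"
    )
    return [row[0] for row in rows]


# Stand-in for findLink(y, url): pretend only example.com links back.
def fake_has_link(url):
    return url.startswith("http://example.com")


conn = sqlite3.connect(":memory:")
save_results(
    conn,
    ["http://example.com/post", "http://example.org/review"],
    fake_has_link,
)
print(unlinked(conn))
```

Keying the table on the url means a page that shows up in the feed twice is only stored once, and a page that adds a link later gets its row updated the next time it is checked.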

Comments (7)

1. LG wrote:

You're so smart, Peter Coles.

Posted on 20 April 2010 at 5:04 PM  |  permalink

2. randfish wrote:

Really cool stuff Pete. You've got to let us all know whether and how much value Hunch gets from implementing this system :-)

Posted on 21 April 2010 at 12:04 AM  |  permalink

3. Philippe wrote:

Hi Pete,

at point 4, it's : mv yql-0.3/yql/ . mv yql-0.3/oauth2/ .

instead of : mv yql-0.3/yql/ . mv yql-0.3/yql/ .

Thank you for this tutorial

Posted on 21 April 2010 at 8:04 AM  |  permalink

4. peter wrote:

Rand, I’ll let you know if we start using this at Hunch—we get a lot of non-linking mentions. If I went forward with it, I’d probably throw the results into the db and create a really simple workflow for someone to go through the urls. I guess the next logical step would be figuring out what to say when requesting the link.

Philippe, thanks for pointing out the typo! (fixed)

Posted on 21 April 2010 at 9:04 AM  |  permalink

5. i need a job wrote:

I too get a lot of references with no actual links so I am pretty interested in implementing this. Thanks for posting! ~Larry

Posted on 27 April 2010 at 8:04 AM  |  permalink

6. link building wrote:

it was a great article, has made a great understanding to link building work.. thanks. great job

Posted on 31 May 2010 at 3:05 AM  |  permalink

7. SEO Sheffield wrote:

Do you know of any good automated tools to use? I know most people suggest to do it manually,but if you know of any tools that save time

Posted on 5 February 2012 at 9:02 AM  |  permalink

Peter Coles

is a software engineer who lives in NYC, worked at Hunch/eBayNYC, and blogs here.