Altova Blog: November 2011

Monday, November 28, 2011

Processing the Groupon API – Epilogue

Rare edge cases can derail loosely coupled data mapping applications. This is especially true when you are consuming large datasets available over the Internet and have little or no influence over the source data.

In this article we describe a debugging technique that lets developers working on data mapping and transformation projects quickly identify and accommodate unexpected data in a stream from a remote source.

The Problem

Last summer we wrote a series of blog posts describing how to work with the Groupon API to retrieve a subset of offers in all Groupon cities and format the list for a web browser or mobile device.

We concluded with a command line to run a MapForce data mapping that calls the Groupon API more than 150 times (once for each Groupon city), filters the data to extract deals sold on the Internet rather than at a physical location, and formats the results in HTML using StyleVision. Every morning we run the command line in a batch file that saves the HTML output on a local server, so our colleagues can open it in any Web browser to find interesting offers from all over the country.

The mapping ran fine for more than two months until one day it failed with this error message: “Source-value “” of type dateTime could not be converted into target-type dateTime.”
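The failure mode is easy to reproduce outside MapForce. The following Python sketch is not the MapForce mapping itself; the function name `parse_datetime_or_none` and the sample values are hypothetical. It only illustrates how an empty string can be accommodated before a strict dateTime conversion rejects it.

```python
from datetime import datetime
from typing import Optional


def parse_datetime_or_none(value: str) -> Optional[datetime]:
    """Parse an ISO 8601 dateTime string, returning None for empty input."""
    text = value.strip()
    if not text:
        # Treat an empty element as "no value" instead of a conversion error.
        return None
    return datetime.fromisoformat(text)


# Hypothetical sample values; the second one mimics an empty dateTime element.
samples = ["2011-11-28T05:00:00", ""]
parsed = [parse_datetime_or_none(s) for s in samples]
```

Calling `datetime.fromisoformat("")` directly raises a `ValueError`, which is the same kind of failure the error message above reports.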

Analyze Football Statistics using the Altova MissionKit

In this article we use stats from NFL.com and ESPN.com to show how easy it can be to process and analyze online data in new ways – even when it uses different metrics and is only available in textual format.

We have seen in previous blog posts how easy it is to gather data from the Internet that is widely available in XML formats. But what about interesting data that is available online but not in an XML format, or data that is buried in legacy data processing systems and only available in textual report format?

One such example involves quarterback ratings. The NFL has used a Passer Rating that rates quarterbacks solely on a passer’s completions, attempts, passing yards, touchdowns, and interceptions. ESPN introduced a new rating system this year called the Total QBR (Quarterback Rating). The Total QBR incorporates more data, including an expected points average and a clutch play index, which ESPN claims yields a more accurate measure of a quarterback’s performance.
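For reference, the NFL Passer Rating can be computed from completions, attempts, passing yards, touchdowns, and interceptions. The Python sketch below follows the commonly published formula, in which each of four components is clamped to the range 0 to 2.375. The function name and the stat line in the usage example are made up for illustration and are not taken from the data used in the article.

```python
def passer_rating(completions: int, attempts: int, yards: int,
                  touchdowns: int, interceptions: int) -> float:
    """Compute the NFL passer rating for one stat line."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")

    def clamp(x: float) -> float:
        # Each component is limited to the range [0, 2.375].
        return max(0.0, min(x, 2.375))

    a = clamp((completions / attempts - 0.3) * 5)      # completion percentage
    b = clamp((yards / attempts - 3) * 0.25)           # yards per attempt
    c = clamp((touchdowns / attempts) * 20)            # touchdown rate
    d = clamp(2.375 - (interceptions / attempts) * 25) # interception rate
    return (a + b + c + d) / 6 * 100


# Hypothetical stat line: 300 of 450 for 3600 yards, 25 TDs, 10 INTs.
example = passer_rating(300, 450, 3600, 25, 10)
```

Because every component is capped at 2.375, the highest possible rating is 9.5 / 6 × 100, or about 158.3.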

Let’s compare the rankings that these systems produce to see if we can garner some useful information. For this example we’ll be using the data importing and analysis tools of the Altova MissionKit to compare the ratings. If you want to try this out yourself, the MissionKit is available to download for a 30-day free trial from the Altova web site. You can access the files used in this example here.

The first thing we need is the raw data to analyze. Let’s use the entire 2010 season as a data source. We can get the table with Passer Ratings from NFL.com and then copy and paste it as a new text file.

We can access a similar table of Total Quarterback Ratings from the ESPN web site and create a second text file.

We now have two text files with tables of data in different orders. The next step is to combine the tables into one file and generate charts.
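The article performs this step with the MissionKit tools. Purely as an illustration of the idea, the Python sketch below joins two tab-delimited tables on the player name. The column names, player names, and numbers are all placeholders and do not come from NFL.com or ESPN.com.

```python
import csv
import io

# Placeholder tables standing in for the two pasted text files.
nfl_text = "Player\tRating\nPlayer A\t101.2\nPlayer B\t88.4\n"
espn_text = "PLAYER\tTOTAL QBR\nPlayer B\t55.0\nPlayer A\t70.1\n"


def read_table(text: str, key_column: str) -> dict:
    """Read a tab-delimited table into a dict keyed by one column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row[key_column]: row for row in reader}


nfl = read_table(nfl_text, "Player")
espn = read_table(espn_text, "PLAYER")

# Keep only players present in both tables, in alphabetical order.
combined = [
    {
        "player": name,
        "passer_rating": float(nfl[name]["Rating"]),
        "total_qbr": float(espn[name]["TOTAL QBR"]),
    }
    for name in sorted(nfl.keys() & espn.keys())
]
```

Keying both tables on the player name makes the row order of the source files irrelevant, which matters here because the two sites sort their tables differently.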

Mastering Paid Keywords

Anyone who manages paid keyword search knows it is hard work! You can look at vast reports of raw statistics and quickly get lost in trivia. At Altova we designed a better way to analyze and manage the performance data for our Google Adwords campaigns. We can creatively query the numbers to:

· Quickly aggregate results for subcategories of campaigns, for instance by product, geographical region, or any other grouping

· Easily identify trends over time

The chart below illustrates these advantages by collecting data for a single Altova product – SemanticWorks – from multiple campaigns over six individual months.

Starting Out

Like many keyword advertisers, we were viewing statistics in Adwords, downloading CSV files, then spending hours massaging and manipulating the data in spreadsheets to identify and format the information we required.

We wanted more immediate and in-depth reporting of keyword performance while retaining full control of the process and managing everything internally. SQL queries of a database of keyword statistics offer a powerful and flexible alternative.

In the remainder of this post we explain how the database design, data mapping, and reporting features of the Altova MissionKit can be applied to create an architecture to efficiently track paid keyword performance.
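As a rough illustration of the kind of SQL query this approach makes possible, the Python sketch below uses an in-memory SQLite database. The table name `keyword_stats`, its columns, the campaign names, and all figures are hypothetical; they are not the schema or data used in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE keyword_stats ("
    "month TEXT, product TEXT, campaign TEXT, clicks INTEGER, cost REAL)"
)

# Hypothetical rows: one product advertised in several campaigns.
rows = [
    ("2011-09", "SemanticWorks", "Campaign 1", 40, 20.0),
    ("2011-09", "SemanticWorks", "Campaign 2", 10, 5.0),
    ("2011-10", "SemanticWorks", "Campaign 1", 30, 18.0),
    ("2011-10", "Other Product", "Campaign 3", 99, 50.0),
]
conn.executemany("INSERT INTO keyword_stats VALUES (?, ?, ?, ?, ?)", rows)

# Aggregate one product across all campaigns, month by month.
query = (
    "SELECT month, SUM(clicks), SUM(cost) FROM keyword_stats "
    "WHERE product = ? GROUP BY month ORDER BY month"
)
monthly = conn.execute(query, ("SemanticWorks",)).fetchall()
conn.close()
```

Changing the `WHERE` and `GROUP BY` clauses is enough to regroup the same statistics by region or campaign instead of by product and month.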

DiffDog Takes to the Cloud

Techy folks generally have a good diff tool they rely on to compare and sync files and directories. But what happens when, as more and more info is bound for the cloud, your data lives on servers accessed via URL?

There are myriad applications today that live on servers accessed via HTTP, but let’s take a look at a common example: SVN. Subversion (SVN) repositories include WebDAV as a commonly used server option. WebDAV is a natural protocol for SVN because its concern is hierarchy, structured metadata, and versions. Since WebDAV is an extension of HTTP, it gives easy access to basic information about files and folders to any HTTP-aware client, including DiffDog, Altova’s diff/merge tool for files, directories, and databases. However, DiffDog knows a few tricks that set it apart from the other breeds.

Digging deeper with the Twitter API: iPhone 4S vs. Galaxy Nexus

We found some interesting data when we dug below the surface of the iPhone 4S vs. Galaxy Nexus debate using the Twitter Search API.

In today’s world there is a vast quantity of data available online that can be used for research, market analysis, and competitive intelligence. While “Big Data” can be a problem for those who produce it, store it, and compile it, it is highly beneficial for those of us who are looking for answers.

Fortunately, some of that data can be queried online; in particular, a vast quantity of data on social media interactions is available.

In this article we will explore how to use the Twitter Search API from MapForce, Altova’s data mapping/conversion/integration tool, to aggregate data on recent user submissions (“tweets”) on two highly popular topics – the Apple “iPhone 4S” vs. the “Galaxy Nexus” as the latest hot Android phone – and extract some statistical data about the users engaged in those discussions.

Case Study: Altova Customer Succeeds with XBRL

XBRL is mandated for most public companies. So why are private organizations and non-profits jumping on the bandwagon? This case study examines a real-world success story.

We were really excited when the folks at MACPA told us about their success working with XBRL. They set out to discover whether XBRL could be used successfully (without a huge upfront investment) by small businesses and NPOs, and ended up not only confirming that it could but also realizing benefits to their internal financial processes.

Monday, November 28, 2011

Processing the Groupon API – Epilogue

Tuesday, November 22, 2011

Analyze Football Statistics using the Altova MissionKit

Thursday, November 17, 2011

Mastering Paid Keywords

Tuesday, November 15, 2011

DiffDog Takes to the Cloud

Tuesday, November 8, 2011

Digging deeper with the Twitter API: iPhone 4S vs. Galaxy Nexus

Thursday, November 3, 2011

Case Study: Altova Customer Succeeds with XBRL
