Tuesday, August 16, 2011

Processing the Groupon API with MapForce – Part 2

In Part 1 of this series we described how to connect Altova MapForce to the Groupon API. We queried the API for a list of Groupon divisions, then used the list to create API queries for all the current deals from every division.

In this part, we will execute the /deals queries and filter the response for the most interesting data.

The list of /deals queries we built previously looks like this:

List of Groupon /deals queries generated by Altova MapForce

To process all the queries, we can connect the list as a dynamic file input to a new mapping component.

When we needed a new component last time, we dropped an API /divisions query into the mapping, and let MapForce create an XML Schema automatically. We could do the same thing here by dropping in an API /deals query as an XML input file. There’s just one small issue -- although the Groupon API online documentation clearly describes the queries we can make, it is vague about the information that will be returned. Before we send dozens of queries to the API for all the current deals, we probably want to know a little more about the data that will come back.

Let’s Make a Deal

Like Yogi Berra said, you can observe a lot just by looking. Let’s start by running a /deals query in XMLSpy. That will let us examine the response to a query for one division before we pull in a potentially unwieldy volume of data.

The XMLSpy File / Open menu includes the same Switch to URL option we used in MapForce in the earlier post. If we enter the /deals API query for a division that covers a large metro area – say Dallas – we are likely to get enough deal instances to extrapolate the characteristics of the entire data set.
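For readers who want to try the same request outside XMLSpy, here is a minimal Python sketch of how a single-division /deals query might be assembled. The endpoint URL, the parameter names client_id and division_id, and the division id "dallas" are assumptions for illustration, not values taken from this post; check them against the Groupon API documentation before use.

```python
from urllib.parse import urlencode

# Assumed endpoint; the real base URL may differ.
DEALS_ENDPOINT = "https://api.groupon.com/v2/deals.xml"


def build_deals_query(division_id, client_id):
    """Return a /deals query URL for one division (parameter names are assumed)."""
    params = {"client_id": client_id, "division_id": division_id}
    return DEALS_ENDPOINT + "?" + urlencode(params)


# "YOUR_CLIENT_ID" is a placeholder, not a working key.
dallas_query = build_deals_query("dallas", "YOUR_CLIENT_ID")
print(dallas_query)
```

The resulting string is the kind of URL that could be pasted into the Switch to URL dialog.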

XMLSpy opens the response to the /deals API query in Text view just as if we opened a local file:

Example from the response to a Groupon /deals query, shown in XMLSpy

As expected, we got quite a bit of data when we requested all the deals for a single division!

A fast way to analyze the structure of this data is to use the XMLSpy DTD / Schema menu option to generate an .xsd file from the XML. Shown below is a reduced view of the entire generated .xsd file based on the response to the /deals query for Dallas:

An xsd file generated by XMLSpy from the Groupon query

We can dig even deeper, following Yogi’s advice like déjà vu all over again. Expanding all the elements to review the XML Schema reveals some curious anomalies. For instance, there are two elements named redemptionLocation with different definitions. The first contains a sequence of child elements:

First use of the redemptionLocation element

And the second is defined as a simple string:

Second use of the redemptionLocation element
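The same kind of anomaly can be spotted without a schema editor. The sketch below is not how XMLSpy generates a schema; it is a rough Python stand-in that walks an XML document and records, for each element path, whether that element was seen with child elements, with plain text, or empty. The sample document and its element names (other than response, deal, and redemptionLocation) are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample; a real /deals response is much larger.
SAMPLE = """<response>
  <deals>
    <deal>
      <title>E-book bundle</title>
      <redemptionLocation>online</redemptionLocation>
    </deal>
    <deal>
      <title>Spa day</title>
      <redemptionLocation><name>Downtown</name></redemptionLocation>
    </deal>
  </deals>
</response>"""


def summarize_structure(xml_text):
    """Map each element path to the kinds of content seen at that path."""
    kinds = defaultdict(set)

    def walk(elem, parent_path):
        path = parent_path + "/" + elem.tag
        if len(elem):                        # element has child elements
            kinds[path].add("children")
        elif (elem.text or "").strip():      # element holds plain text
            kinds[path].add("text")
        else:
            kinds[path].add("empty")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return dict(kinds)


summary = summarize_structure(SAMPLE)
# Paths defined both ways are the ones worth a closer look.
mixed = sorted(p for p, seen in summary.items() if {"children", "text"} <= seen)
print(mixed)
```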

Going back into the XML data for Dallas and searching for redemptionLocation displays these examples:

One example of redemptionLocation in the body of the response

And:

One example of redemptionLocation in the body of the response

And:

One example of redemptionLocation in the body of the response

Now this is really interesting, because redemptionLocation = “online” identifies deals that can be redeemed from anywhere, instead of by a visit to a brick-and-mortar location in the division where they are advertised. What if we ran the /deals API queries for all divisions and extracted a list of all the online deals? That would be one extreme Groupon!
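To make the idea concrete, here is a small Python sketch of that filter applied to one response. It assumes each deal is a deal element with a redemptionLocation somewhere beneath it; the sample XML and the title element are invented, and the exact nesting in a real response may differ.

```python
import xml.etree.ElementTree as ET

# Invented sample showing three shapes of redemptionLocation.
SAMPLE = """<response>
  <deals>
    <deal>
      <title>E-book bundle</title>
      <redemptionLocation>online</redemptionLocation>
    </deal>
    <deal>
      <title>Spa day</title>
      <redemptionLocation><name>Downtown</name></redemptionLocation>
    </deal>
    <deal>
      <title>Car wash</title>
      <redemptionLocation/>
    </deal>
  </deals>
</response>"""


def online_deals(xml_text):
    """Return the deal elements whose redemptionLocation text is 'online'."""
    root = ET.fromstring(xml_text)
    selected = []
    for deal in root.iter("deal"):
        # iter() looks at every descendant, so the nesting depth does not matter.
        for location in deal.iter("redemptionLocation"):
            if (location.text or "").strip() == "online":
                selected.append(deal)
                break
    return selected


titles = [deal.findtext("title") for deal in online_deals(SAMPLE)]
print(titles)
```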

Only Ask for What You Need

The Groupon /deals API query supports an optional parameter called &show= that allows users to limit the data returned. Applying this parameter can save bandwidth and reduce processing time for the data transformation by removing unwanted data from the API response.

We can also simplify our final result by keeping only the most interesting information, including the link to the Groupon web page for each deal. After we remove unwanted elements from the generated Dallas schema, our final version for the summary of online deals looks like this:

XMLSpy Schema diagram of the simplified Groupon xsd file

When we add the &show= parameter to our MapForce mapping to request only the elements included in the simplified XML Schema, the queries look like this:

Modified list of queries with the &show= parameter
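In the mapping, the parameter is simply more text appended to each query string. A Python sketch of the same string operation follows. The comma-separated value format and the field names title, dealUrl, and redemptionLocation are assumptions for illustration; the Groupon API documentation defines what &show= actually accepts.

```python
def add_show_parameter(query_url, fields):
    """Append a show= parameter listing the wanted fields (format is assumed)."""
    separator = "&" if "?" in query_url else "?"
    return query_url + separator + "show=" + ",".join(fields)


# Hypothetical query URL and field names.
base_query = "https://api.groupon.com/v2/deals.xml?division_id=dallas"
wanted_fields = ["title", "dealUrl", "redemptionLocation"]
print(add_show_parameter(base_query, wanted_fields))
```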

Now we can drop the revised .xsd file into the mapping and connect the list of API /deals queries as dynamic input. We don’t need to delete the text file we used to collect the list of queries -- that might continue to be helpful for future debugging.

MapForce dynamic input file mapping

These changes complete the input side of the data mapping.

Defining the Data Transformation Output

Back in XMLSpy we can make a couple more revisions to the input XML Schema to design a new version for output:

XMLSpy schema diagram of the output file xsd

We discarded the response element since it doesn’t add any value, and eliminated the redemptionLocation element that we don’t intend to include in the output. We also added a date element for a timestamp, because our output file will be a snapshot of data that is constantly changing. After saving this version of the .xsd file in XMLSpy, we can drop it into the MapForce mapping.

Shown below is the output side of the mapping with the output component partially connected. The filter at the top reads the redemptionLocation element to select only online deals and the now function inserts the date:

Partial view of the MapForce output file mapping

The last revision we made in the output XML Schema was to change several element types from dateTime, Boolean, and integer to the string data type to allow more descriptive text.

Here is the complete definition of the mapping with the final connections to the output component:

MapForce data mapping for the Groupon API

Now for the Payoff

When we click the Output button, MapForce processes the entire mapping from beginning to end using the MapForce Built-in execution engine. Here’s a breakdown of the steps:

  • Run the /divisions query to get the current list of divisions
  • Concatenate strings to build the list of /deals queries for all divisions
  • Run the /deals queries to create dynamic data for the input component
  • Filter for online deals to generate the output component, execute the remaining mapping functions, and add the timestamp after all the deals are processed
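For comparison, the four steps above can be sketched in a few lines of Python. This is not what MapForce runs internally; it is only an illustration of the data flow. The element names division and id, the output root element deals, the query strings, and the canned responses are all assumptions, and the fetch function is passed in so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def is_online(deal):
    """True if any redemptionLocation under this deal has the text 'online'."""
    return any((loc.text or "").strip() == "online"
               for loc in deal.iter("redemptionLocation"))


def run_pipeline(fetch, divisions_query, deals_query_for):
    # Step 1: run the /divisions query (element names are assumed).
    divisions = ET.fromstring(fetch(divisions_query))
    division_ids = [d.findtext("id") for d in divisions.iter("division")]
    # Step 2: build one /deals query per division.
    queries = [deals_query_for(division_id) for division_id in division_ids]
    # Steps 3 and 4: run each query and keep only the online deals.
    output = ET.Element("deals")
    for query in queries:
        response = ET.fromstring(fetch(query))
        for deal in response.iter("deal"):
            if is_online(deal):
                kept = ET.SubElement(output, "deal")
                for child in deal:
                    if child.tag != "redemptionLocation":  # dropped from output
                        kept.append(child)
    # The timestamp is added once, after all deals are processed.
    ET.SubElement(output, "date").text = datetime.now(timezone.utc).isoformat()
    return output


# Canned responses stand in for the live API.
FAKE_RESPONSES = {
    "divisions": "<response><divisions><division><id>dallas</id></division>"
                 "<division><id>austin</id></division></divisions></response>",
    "deals?division_id=dallas": "<response><deals><deal><title>E-book bundle</title>"
                                "<redemptionLocation>online</redemptionLocation>"
                                "</deal></deals></response>",
    "deals?division_id=austin": "<response><deals><deal><title>Spa day</title>"
                                "<redemptionLocation><name>Downtown</name>"
                                "</redemptionLocation></deal></deals></response>",
}

result = run_pipeline(FAKE_RESPONSES.__getitem__, "divisions",
                      lambda division_id: "deals?division_id=" + division_id)
print(ET.tostring(result, encoding="unicode"))
```

Swapping the dictionary lookup for a real HTTP fetch would turn the sketch into a live query run, at the cost of one request per division.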

MapForce takes only a few seconds to complete all those steps and generate an output file with a series of deals that look like this:

Output data from the MapForce mapping for the Groupon API

In Part 3 of this series we’ll design a stylesheet to automatically transform the XML output of our mapping into HTML for attractive presentation in a web browser and on mobile devices. See ya at the ballpark, Yogi!

XMLSpy and MapForce are available together in the specially priced Altova MissionKit. See for yourself how easy it is to use the MissionKit to convert data from a Web API -- download a free 30-day trial!

Editor’s Note: Our original series on mapping data from the Groupon API ran in three parts you can see by clicking the links here:

Part 1 of Processing the Groupon API with Altova MapForce describes how to create dynamic input by collecting data from multiple URLs.

Processing the Groupon API with MapForce – Part 2 describes how we filtered data from the API and defined the output to extract only the most interesting details.

Processing the Groupon API – Part 3 describes formatting the output as a single HTML document optimized for desktop and mobile devices, and reviews ways to automate repeat execution.

6 comments:

jailbreaker said...

Hello! Thanks a lot for the review. But I have a question. Where do you add the &show= parameter? I added it to the constant (you wrote about it in Part 1). So when I press the Output button my result is the same as yours in the picture. But I got confused later on. I don't actually understand how you execute all the API queries. Could you please explain how to execute all these Groupon API links? (even without filtering for online deals, etc.)

DaveMcG said...

Hi-

When we connected the list of /deals queries to the XML Schema called GrouponDealsStripped, that defined the input side of the data mapping. (Shown above in the image of the data mapping just above the heading Only Ask for What You Need.)

We also need to define an output component before we can actually execute all the /deals queries as part of our data mapping.

The easiest way to test the list of queries is to insert another copy of the GrouponDealsStripped XML Schema. You can even copy and paste it right inside the mapping window. Then you can connect all the elements of the two copies of GrouponDealsStripped to simply pass through the data from the input component.

Make sure the eye button on the second copy of the XML Schema is selected, then click Output at the bottom of the mapping window. You should see a very large file containing all the data returned by your /deals queries!

DaveMcG said...

(The &show= parameter should be added to the constant connected to the concat function discussed in Part 1.)

jailbreaker said...

Thanks for the response.
But I still cannot figure out a couple of things.
Correct me if I'm wrong...
1) To generate the .xsd file using the XMLSpy DTD / Schema menu option, I clicked "generate DTD/Schema" and assigned the generated Schema to the XML document.
2) Then, after removing unwanted elements from the generated schema, I saved it as GrouponDealsStripped.xsd
3) To drop the revised .xsd file into the mapping, I inserted the "GrouponDealsStripped.xsd" file using "insert XML Schema/File" in MapForce. (When MapForce asked me if I wanted to supply a sample XML file, a global resource, or not supply any at all, I skipped it and chose response as the root element)
4) I connected the list of API /deals queries with this file and then tried to test the list of queries. So I copied the GrouponDealsStripped schema and connected all the elements of the two copies (starting from deal)
5) Then I pressed the eye button on the second copy of the GrouponDealsStripped schema and clicked Output.

The result is:





I was trying everything but no results. Could you please correct me.
Looking forward

DaveMcG said...

Hi-

Sorry you're still having trouble. Everything you're doing sounds right. I didn't assign the XML Schema to the working file, but that would not make any difference.

One thing to double check is that the concatenation is connected to the File link on the input schema, not the response element. The mapping should say File: "dynamic" at the top of the input component.

We can't see your contact information from the comment, but you can contact us by email to marketing@altova.com or through Altova Support at http://www.altova.com/support_center.html

--davemcg

DaveMcG said...

Part 3 of the series is available here:

http://blog.altova.com/2011/08/processing-groupon-api-part-3.html