Every time you receive data from an outside source, there is a chance it won’t arrive in the form you expect. A robust and reliable real-world data mapping and transformation solution may need special accommodations for these rare and unlikely cases.
We processed literally dozens of .gpx files, containing hundreds of coordinates each, through the MapForce mapping we wrote about in the blog post Web Service as a Look-Up Table to Refine GPS Data. Then one day we ran a new file and encountered the error below, which caused the mapping to fail:
Reaching into the Altova MissionKit to combine features of MapForce and XMLSpy, we quickly diagnosed the issue and developed a solution we can also reuse in future mapping projects.
We first suspected a problem with the input data, so we opened the file in XMLSpy, where it passed well-formedness and XML validation tests. Fortunately, each data point has a unique time stamp, so we searched for 23:06:22, marking the last set of GPS coordinates that processed successfully. That time stamp occurred once at line 1772 of the input file.
Nothing looked obviously wrong in the source data that immediately followed. We simply commented out the next data point, and saved the file to reprocess the mapping:
This time the mapping ran successfully:
Now we were suspicious of the data returned by the Web service. Even though the U.S. Geological Survey National Geospatial Program runs the Web service, maybe the underlying database contained invalid data.
We inserted a simple .csv file into the mapping as an alternate output and mapped the elevation results for each set of source coordinates to examine the Web service output.
One line in the output file diagnostic.csv contained the same value quoted in the previous error message:
It’s scientific notation! The Web service returned a number formatted in scientific notation! The round-precision function in our data mapping that processes the Web service result requires decimal input.
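The same check can be sketched outside MapForce. The Python below is only an illustration, not part of the mapping: it scans CSV rows for elevation values that do not fit the plain decimal form. The column layout and the coordinate values are hypothetical; only the failing elevation string comes from the error message quoted above.

```python
import csv
import io
import re

# Lexical form of an XML Schema decimal: optional sign, digits, and an
# optional fractional part. No exponent is allowed.
XS_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def find_non_decimal(rows, column):
    """Return (row_number, value) pairs whose value is not a plain decimal."""
    return [
        (number, row[column])
        for number, row in enumerate(rows, start=1)
        if XS_DECIMAL.fullmatch(row[column].strip()) is None
    ]


# Hypothetical sample rows: latitude, longitude, elevation.
sample = io.StringIO(
    "40.0,-74.0,3.52\n"
    "40.1,-74.1,-1.24202767892712E-06\n"
)
suspects = find_non_decimal(csv.reader(sample), column=2)
print(suspects)
```

A scan like this flags the offending row directly, without stepping through the output by eye.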
Datatype Conversion
One strategy might be to write a function that recognizes the Web service result as scientific notation and explicitly calculates the numeric value. The MapForce error message “Conversion to decimal failed for ‘-1.24202767892712E-06’” suggests a simpler solution.
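To show what the explicit-calculation strategy would involve, here is a minimal Python sketch, assuming the input always has the shape mantissa, optional "E", optional exponent. The function name expand_scientific is hypothetical, and this is not how the mapping was ultimately fixed.

```python
from decimal import Decimal


def expand_scientific(text):
    """Split 'mantissaEexponent' and shift the decimal point by the exponent."""
    mantissa, _, exponent = text.strip().upper().partition("E")
    # scaleb multiplies by 10**n exactly, without binary rounding error.
    return Decimal(mantissa).scaleb(int(exponent or "0"))


value = expand_scientific("-1.24202767892712E-06")
# Format with "f" to force plain decimal notation with no exponent.
print(format(value, "f"))
```

Even in this small form, the function has to handle the sign, the exponent marker, and inputs with no exponent at all, which is why a built-in type conversion is the more attractive route.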
This is a good time to think about datatypes. The Web service component in our mapping clearly indicates it returns a text string. MapForce automatically performs type conversions from string to decimal number when a mapping connects a string as input to a mathematical formula. In most cases, this frees developers from thinking about explicit type conversions as data moves between formats. In our mapping, MapForce successfully performed type conversion from a string to decimal 178 times before encountering the entry in scientific notation.
Scientific notation is normally used for numbers that are too large or too small to be conveniently recorded in decimal form. In MapForce, the decimal datatype does not specify the size, or value, of the number. Instead, it identifies the XML decimal type, a character sequence of digits with a period as the decimal separator.
In XML -- and in MapForce -- the double datatype supports scientific notation. We can explicitly cast data in scientific notation as the double datatype, then round the result.
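The cast-then-round idea can be illustrated in Python, where float() stands in for the explicit cast to double. This is a sketch of the concept, not the MapForce implementation; the function name round_elevation and the first sample value are assumptions made for the example.

```python
def round_elevation(text, places=2):
    """Parse text as a double (exponent allowed), then round the result."""
    # float() accepts both plain decimal strings and scientific notation.
    return round(float(text), places)


# The first value is a hypothetical ordinary reading; the second is the
# string from the error message.
samples = ["12.3456", "-1.24202767892712E-06"]
rounded = [round_elevation(s) for s in samples]
print(rounded)
```

Both strings pass through the same path, so no special case is needed for the value in scientific notation, which rounds to zero.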
This solution is easy to test in a simple mapping, using text files for both the input and output. We inserted a simple variable before the round-precision function and assigned its datatype as double. For our first test, we used the data captured from the USGS Web service as input, to run the same data without redundantly performing the Web service calls. This mapping also lets us easily build more test cases with new input data.
The mapping processed successfully, generating this output:
Build a User Function
User functions in MapForce are defined in one mapping file, and can be added to the Function Library for use in other mapping files, even by multiple users. User functions also encapsulate complicated operations and help make the overall data flow of a large mapping design much more traceable.
We had already modified the simple Web service call by choosing the database for eastern or western continental US based on the longitude. Now, adding explicit datatyping to the result makes calling getElevation even more complex. We chose to define everything in a user function.
Applying the User Function
In the mapping below, we inserted the new getElevationUS function.
At this point it’s good to recall why we rounded the elevation returned by the Web service in the first place. The Web service returns a value in meters, so two decimal places give centimeter resolution, and one centimeter is less than half an inch.
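The centimeter-to-inch comparison is easy to verify. A quick Python check, using the definition of the international inch as exactly 0.0254 meters (the helper name meters_to_inches is ours):

```python
METERS_PER_INCH = 0.0254  # exact by definition of the international inch


def meters_to_inches(meters):
    """Convert a length in meters to inches."""
    return meters / METERS_PER_INCH


# Two decimal places in meters means a step of 0.01 m (one centimeter).
step_in_inches = meters_to_inches(0.01)
print(f"{step_in_inches:.3f}")
```

The result is roughly 0.39 inch, comfortably under half an inch.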
We could have included the round operation as part of the getElevationUS function, but the function will be more useful in future data mappings if it does not round the raw elevation data.
Output of the revised mapping is shown below, using the same input .gpx file that caused the initial problem. We searched the output file for 23:06:22, the time stamp we used to find the last good coordinates before the error. The following point starting on line 902 is the one that failed.
At first we were disappointed all this effort boiled down to an elevation that rounds to 0. Then we mapped the suspicious coordinates on a Google map:
Part of the route followed a bridge over a tidal inlet. Even if we never reuse the getElevationUS function in a future data mapping, it’s very likely other .gpx files for other trips will lead across other tidal inlets, where they might generate more very small elevation values.
If you would like to use the tools in the Altova MissionKit to create user-defined functions for your own data mappings, click here to download a free trial.