When your analysis requires information that is most easily obtained from the internet, Analyzer can read web pages directly to retrieve that data. Examples include exchange or interest rates, but in principle almost any web-based data can be used.
Note: Analyzer can read any web page that can be saved as a favorite or bookmark.
Analyzer accesses the defined web-based data every time you open the table layout that defines the specified web page. The data displayed is the most current data on the web page as of the moment the table layout was opened. If the source web page changes frequently (such as a stock quote page), the displayed information is not updated while the table layout remains open. To refresh the information, simply re-open the table layout.
To define data from a web page, select the Web URL radio button on the Select Server Data Source panel, specify a valid URL, and click [Next].
The Select Web URL Access Method dialog appears. Select the Access method and click [Next].
When you use the Wizard to define data on a web page, you need to consider what form the results should take. The options range from the raw HTML (including all tags) to formatted data extracted from a single table on the page. Which format you use depends largely on the structure of the information and your intended use.
If you are implementing a website “crawler” that visits every page on a site, you need the raw HTML including tags, because the links to other pages exist only inside the tags. On the other hand, if the information is presented on a single page as a table of rows and columns, selecting that table is likely the appropriate choice.
Your choices are:
• All Text - all the text that is visible on screen from the web page (including headings, etc.)
• Formatted Text Only - only the text from the web page that is formatted into paragraphs or lists
• Raw HTML Content - all of the underlying HTML code, including all tags. (This is essentially the same as selecting “view source” in your browser.)
• Selected Table - a specific table from the web page. Because a web page can contain multiple tables, you choose one table from those available on the page. Unlike the other options, the data it contains is returned in a tabular format.
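To make the differences between the text-based access methods concrete, the following Python sketch shows roughly what All Text, Formatted Text Only and Raw HTML Content would return for a small sample page. It is an illustration built on Python's standard `html.parser` module, not Analyzer's own implementation. The sample page, the `extract_text` function, the treatment of `<head>` content as not visible, and the choice of `<p>` and `<li>` as "formatted" elements are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Hypothetical sample page used only for this illustration.
PAGE = """<html><head><title>Rates</title><style>td {color: red}</style></head>
<body><h1>Exchange rates</h1>
<p>Updated daily.</p>
<ul><li>Source: example bank</li></ul>
<table><tr><th>Currency</th><th>Rate</th></tr>
<tr><td>EUR</td><td>0.92</td></tr></table>
</body></html>"""


class TextExtractor(HTMLParser):
    """Collects text a browser would display, optionally only from paragraphs and lists."""

    HIDDEN = {"head", "script", "style"}   # assumed: content not shown on screen
    FORMATTED = {"p", "li"}                # assumed meaning of "formatted text"
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag

    def __init__(self, formatted_only=False):
        super().__init__()
        self.formatted_only = formatted_only
        self.open_tags = []   # currently open elements, innermost last
        self.chunks = []      # extracted text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed inner elements.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text or self.HIDDEN.intersection(self.open_tags):
            return
        if self.formatted_only and not self.FORMATTED.intersection(self.open_tags):
            return
        self.chunks.append(text)


def extract_text(html, formatted_only=False):
    parser = TextExtractor(formatted_only)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


raw_html = PAGE                                      # Raw HTML Content: tags included
all_text = extract_text(PAGE)                        # All Text
formatted = extract_text(PAGE, formatted_only=True)  # Formatted Text Only
print(formatted)
```

In this sketch `all_text` also contains the table cells ("Currency", "Rate", "EUR", "0.92") as separate lines with no row or column structure, which is why the Selected Table option is the better fit for tabular data.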
If the selected web-based data is not in a standard tabular format, Analyzer will advance directly to the File Format dialog. Depending on the format of the web-based data, select the appropriate option (usually Manual Definition, Delimited File or Print Image (Report) File). See File Format - Local and Arbutus Windows Server.
When you select an individual table from the web page, Analyzer recognizes the tabular data, automatically treats it as a delimited file, and advances directly to the Delimited File Properties dialog.
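The idea of turning a selected HTML table into delimited records can be sketched in Python as follows. This is an illustration using the standard `html.parser` and `csv` modules, not Analyzer's internal logic. The `TableExtractor` class and `table_to_delimited` function are hypothetical names, the `index` parameter stands in for choosing one table in the dialog, and the sketch assumes simple, non-nested tables with explicit closing `</td>`, `</th>` and `</tr>` tags; `colspan` and `rowspan` are ignored.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects every <table> on a page as a list of rows of cell strings.

    Assumes non-nested tables with explicit closing cell and row tags.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table, in document order
        self.row = None    # cells of the row currently being read
        self.cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append(" ".join(self.cell))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None and data.strip():
            self.cell.append(data.strip())


def table_to_delimited(html, index=0, delimiter=","):
    """Return table number `index` (0-based) as delimited text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(parser.tables[index])
    return out.getvalue()


# Hypothetical page with a layout table followed by a data table.
PAGE = """<p>Intro</p>
<table><tr><td>Layout only</td></tr></table>
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>EUR</td><td>0.92</td></tr>
  <tr><td>GBP, sterling</td><td>0.79</td></tr>
</table>"""

print(table_to_delimited(PAGE, index=1))
```

Because the `csv` writer quotes any field that contains the delimiter, the cell "GBP, sterling" is written as `"GBP, sterling"` and still forms a single field when the output is read back as a delimited file.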
In the Delimited File Properties dialog, click [Next] to define the delimited fields. See Edit Field Properties.
Note: HTML is a specification that controls the visual presentation of information. Unfortunately, there is no requirement that the underlying HTML code be structured in a way that resembles the presentation, and in most cases the HTML coding appears very unstructured. The one exception is tabular data, which Analyzer automatically formats in a manner consistent with the visual presentation.
Note: Web content can change at any time. When integrating web-based data into automated procedures, be sure to carefully validate the web data being accessed before relying on it for analysis. If the web-based data is not validated, the resulting analysis may be invalid.
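As one possible way to act on this advice in an automated procedure, the Python sketch below checks the rows of an extracted exchange-rate table before they are used. The `validate_rate_rows` function, the expected header, the two-column layout and the positive-rate rule are hypothetical assumptions for this example; real checks should reflect what you know about your own source page.

```python
import math

# Hypothetical layout expected from the source page.
EXPECTED_HEADER = ("Currency", "Rate")


def validate_rate_rows(rows):
    """Return a list of problems; an empty list means every check passed."""
    if not rows:
        return ["no rows were extracted"]
    problems = []
    # A changed header usually means the page layout has changed.
    if tuple(rows[0]) != EXPECTED_HEADER:
        problems.append(f"unexpected header {tuple(rows[0])!r}")
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"row {line_no}: expected {len(EXPECTED_HEADER)} fields, got {len(row)}")
            continue
        try:
            rate = float(row[1])
        except ValueError:
            problems.append(f"row {line_no}: rate {row[1]!r} is not numeric")
            continue
        # Reject NaN, infinity, zero and negative values.
        if not math.isfinite(rate) or rate <= 0:
            problems.append(f"row {line_no}: rate {row[1]!r} is not a positive number")
    return problems


good = [["Currency", "Rate"], ["EUR", "0.92"]]
changed = [["Currency", "Price"], ["EUR", "n/a"], ["GBP"]]
print(validate_rate_rows(good))     # []
print(validate_rate_rows(changed))
```

If the returned list is not empty, the procedure can stop and report the problems before the data reaches the analysis.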