相較於其他稽核資料分析軟體系統，喜歡Arbutus 的100個理由：71～80 (Part 8)

David Chuang 2015-04-12 Arbutus資料分析技術, 顧問與技術支援服務

［請點按左列各項理由顯示內容］

71.快速 XML 轉換72. 運用條件設定來定義多重性質的欄位73. 可選擇替換不同日誌檔(Log)74. 長的記錄及長的文字型檔案(CRCL File)75. 匯出至Excel檔案(XLSX)76. At() / Substring() 函數可適用於整筆記錄77. 驗證結果可以下探記錄內容78. 負的小數位運用79. 程序(Procedure)編輯器自動縮排80. Ftype() 函數可以限制要搜尋的型態

Arbutus 不仰賴微軟的XML解析器，而是改用自己發展的ＸＭＬ轉換器，它的處理速度更快，經常在數秒之間完成結果，除此之外，XML的介面允許我們單選包含與不包含任何項目或群組。

Table definitions may include fields that are conditional in nature, meaning they only take on a value when the specified condition is true. Where a table definition requires conditional fields, Arbutus offers an easy means to set the condition. You may select any set of fields, and in one operation, set a condition on all of them.

Where the table layout is coming from COBOL, additional sort options allow you to place related field definitions together, to simplify the selection of fields.

Arbutus logs all processing activity in a log file as supporting documentation. When a project involves several separate documentation threads, you can switch between logs with a simple click in the Overview.

The maximum record length supported by Arbutus is 2GB. While you are unlikely to ever encounter a record length this long, the capability often allows an entire file to be read as a single record, for specialized processing.

In addition, it is not uncommon for normal text (CRLF) files to occasionally have unexpectedly long lines that will likely exceed the table definition. This may be due to data corruption or an unexpected or unusual situation. When this occurs, Arbutus automatically compensates, reading the entire line despite the fact that it violates the table layout.

Arbutus supports exports to all versions of Excel spreadsheets, including XLS and XLSX. Of course, when exporting to either of these formats, you can specify the sheet name to be created, and even create multiple sheets, if necessary.

Certain processing situations require that you inspect the entire record as a single unit. For example, you may be looking for something or extracting something. When looking for something, the At() function allows you to specify RECORD as the last parameter. When extracting something, the Substring()function allows the first parameter to be RECORD.

The results of a Verify command are typically presented in a tabular format. This table includes the record number, field name, expected field type, as well as the contents of the field both in text and hex. In addition to displaying the error, this table includes drill-down links, so that you can immediately move to the selected record and evaluate the error in the context of other fields and records around the error.

Certain field types store values as multiples of a larger value. For example, "Sales" might be the sales figure in thousands, so 125 stored in the field actually means 125,000.

One solution to this issue is to create a computed field to multiply by 1,000, but Arbutus offers another approach. In the same way that you can specify that a number includes decimals (effectively moving the decimal point to the left), you can specify negative decimals (effectively moving the decimal point to the right).

In this example, specifying decimals of -3 will ensure that the value is correct.

The procedure editor automatically indents commands in a Group.

The Ftype() function is used to either validate user input (ensuring that the supplied information is valid) or to ensure that the prerequisites for a particular test or calculation are present. Usually this validation involves testing that an object has a specific type, or perhaps a limited range of values.

To address this situation, the Ftype() function includes an optional parameter that allows you to specify which types are valid. If the object does not exist, or is not one of the valid types, then the function returns "U" for "Unknown".

第81個以後喜歡Arbutus的理由，陸續公佈中！

Performance Comparison: Arbutus vs. The Competition

Importing
XML: 74 times faster
Delimited/CSV: Instantaneous
XLS/XLSX: 2 times fasterExporting
XML: 4 times faster
XLS/XLSX: 3 times fasterFuzzy matching: 11 to 500 times fasterUse of time fields: 2 to 3 times faster

Notes:
1) In Arbutus, Import CSV takes zero time, regardless of file size, as delimited data is read directly.
2) Our Duplicates command is compared to their FUZZYDUP.
3) As their FUZZYDUP command does not support 'same' fields, we have concatenated the three keys.About the Tests:

Test computer specs: Dell I7-920, 2.67GHz, Windows 64 bit, 9GB RAM, 1TB disk
Most reads use a 125,000,000 record, 80 bytes long transaction file (10GB data size)
Fuzzy duplicates tests use a 50,000 record address file
Exports are 5,000,000 records, except Excel, which is 650,000
Imports are all 650,000 records
We chose 125 million records because most people interested in performance have big data. For comparison, Arbutus also ran the "big data" tests (Join through Summarize) on a 5 million record file as well. Analyzer took 50 seconds in total, while the competitor's version 10 took 89 seconds (the graph lines were too small to show individually).

David Chuang の觀網兆益數位莊盛祺的部落格

相較於其他稽核資料分析軟體系統，喜歡Arbutus 的100個理由：71～80 (Part 8)

關於 David Chuang

相關文章

發佈留言取消回覆

相較於其他稽核資料分析軟體系統，喜歡Arbutus 的100個理由：71～80 (Part 8)

關於 David Chuang

相關文章

數據分析如何使傳統稽核走向淘汰［ＡＩ翻譯文章］

Webinar: 瞭解及擷取資料品質 (Part I :11/8/2023 11:00 AM PST, Part II: 11/15/2023 11:00 AM PST)

我們十力展現

Arbutus Windows Server 使用者權限設定

發佈留言 取消回覆

發佈留言取消回覆