RedTitan nDP - Enterprise Data Processing (Version 1)

What is nDP?

nDP is RedTitan's dataflow processor for performing complex transformations on data and related data-processing tasks. Although based around an XML model, nDP can be used to process any data format. Data flows through a network of functional blocks to achieve the required transformation. Generally, data flows in and out of each block, with all the functions processing in parallel.

nDP will run any number of tasks in parallel. Each task is represented by an icon on the nDP desktop which depicts the processing status of that task. A task is effectively a single nDP program and can be edited to reveal the functional network that describes its actions.

Data Format

All data processing is done using a data stream structurally based on the XML data format. Data consists of character data (including up to 30-bit Unicode support) with embedded "elements", "processing instructions" and "comments". Raw binary data is treated as characters in the range 0-FF. Each element has "attributes", a list of name/value pairs, and heads a data stream which may itself contain character data and further embedded elements, and so on. The character data includes a distinct "line-end" concept, which is not any particular character sequence. Internally, attributes can be numeric or Boolean values as well as string values. Users of this product should be familiar with the XML data format (see XML).
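The structure described above can be pictured with a small data model. This is only an illustrative sketch: the class names Element and LineEnd and the field names are assumptions made for the example, not nDP's internal representation.

```python
from dataclasses import dataclass, field
from typing import Union

# Attribute values may be strings, numbers or Booleans.
AttrValue = Union[str, int, float, bool]

class LineEnd:
    """Hypothetical marker for a line end, distinct from any character sequence."""

@dataclass
class Element:
    """An element header (name plus attributes) heading its own data stream."""
    name: str
    attributes: dict = field(default_factory=dict)  # name -> AttrValue
    content: list = field(default_factory=list)     # str, LineEnd or Element items

# A record containing one line of character data and a nested element.
doc = Element("RECORD", {"ID": 42, "URGENT": True},
              ["first line", LineEnd(), Element("NOTE", {}, ["nested data"])])
print(doc.name, doc.attributes["ID"], len(doc.content))
```

Keeping the line end as its own item, rather than as "\n" or "\r\n" inside a string, mirrors the point that it is not tied to any particular character sequence.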

Data is processed as a stream. Data streams flow through functional blocks, which are responsible for all the data processing activities. All data processing occurs in parallel (as far as is possible), and because the data flows are driven by demand, any unwanted results are simply not computed. Where functions must ask the user for input, this occurs when the demand exists for the result, so the resulting effect is somewhat like a wizard. Some functions will wait for external events such as the arrival of a file, operator input or even the completion of another program.
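The demand-driven behaviour can be illustrated with Python generators. This is a minimal sketch of the idea only: the block names source, upper and take are hypothetical, and the sketch does not model nDP's parallel execution.

```python
from itertools import islice

computed = []  # records which items were actually processed

def source(lines):
    """Hypothetical input block: yields records one at a time."""
    for line in lines:
        yield line

def upper(stream):
    """Hypothetical transform block: work happens only when a consumer asks."""
    for item in stream:
        computed.append(item)
        yield item.upper()

def take(stream, n):
    """Hypothetical output block: demands only the first n results."""
    return list(islice(stream, n))

result = take(upper(source(["a", "b", "c", "d"])), 2)
print(result)    # ['A', 'B']
print(computed)  # ['a', 'b']
```

Because the final block asks for only two results, the remaining two records are never transformed.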

The element headers (with their attributes) are the main target for data processing operations, the data component of elements being carried by the header. To operate on the data itself, fields must be extracted and added to an element's attributes. Most nDP processing functions make extensive use of attribute values.
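As a rough illustration of extracting a field into an attribute, the following sketch uses Python's standard XML library rather than nDP itself. The element name RECORD, the attribute name ACCOUNT and the fixed-column layout are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical record whose data carries an account code in columns 0-6.
record = ET.fromstring('<RECORD>ACME001 Invoice for March</RECORD>')

def extract_field(element, name, start, end):
    """Copy characters [start:end) of the element's data into attribute `name`."""
    element.set(name, (element.text or "")[start:end].strip())
    return element

extract_field(record, "ACCOUNT", 0, 7)
print(ET.tostring(record, encoding="unicode"))
# <RECORD ACCOUNT="ACME001">ACME001 Invoice for March</RECORD>
```

Once the value sits in the header, later steps can select or route records by attribute without re-reading the data.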

Interface

There is a front panel of tasks and services that the user can use for both one-off and unattended activities. A simplified version of this interface is also available via the nQ remote-access mechanism, i.e. presented in a browser window. The nQ icon on the nDP main window shows the status of remote access.

The dialogue called the "Director" is used to organise the activity. Edit an existing task by right-clicking the task icon and choosing "Edit", or use "New" from the file menu of the Director.
Add functional blocks from the bar at the top of the Director. Link them by dragging between the input and output arrows on the blocks. Set up constant data using the right-click popup menu, either to change the properties or just to feed literal data into an input. Shift-click directly opens either the properties or the constant input, as appropriate.

The technique for constructing or editing the flow networks is to make use of nDP's ability to stop and restart a task very easily. The display function called "View" is the most useful block when developing a network. Starting with example data, you can build up the network one step at a time, each time running the task and viewing the flow leaving the final block. By this means you can quickly identify problems and ensure each step is correct before adding the next. Because nDP is a functional processor the only way that one function can impact on another is where they are both interacting with the external environment (e.g. accessing files).
When constructing a network to process files, you can simplify directory specifications by using the special function "Dir", which associates a name with a directory path. Other functions can then reference that directory by writing its name followed by "::" wherever a filename would be written. This allows you to quickly change the directories used from a testing to a production environment without having to look through the whole processing network.
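The effect of a named directory can be sketched as a simple lookup. This is an illustration of the idea, not nDP's implementation: the DIRS table, the resolve function and the example paths are all hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical name-to-directory table, standing in for what "Dir" blocks define.
DIRS = {"INPUT": r"C:\TEST\IN"}

def resolve(spec, dirs):
    """Expand 'NAME::file' using dirs; leave plain filenames unchanged."""
    name, sep, rest = spec.partition("::")
    if sep and name in dirs:
        return str(PureWindowsPath(dirs[name]) / rest)
    return spec

print(resolve("INPUT::orders.xml", DIRS))  # C:\TEST\IN\orders.xml
```

Moving from testing to production then means changing the single entry in the table, while every reference of the form INPUT::filename stays as it is.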

Functional blocks are grouped into 7 groups by their nature, and this is shown by their colour coding. You can click the coloured buttons at the left of the toolbar to move to/through the groups. The functional blocks have been given different shapes to aid recognition. The functions are documented in the nDP Reference manual, which you will probably need open while editing a flow network. This manual is the primary guide to the specifications of the functions and is always issued with each new release of nDP.

Warning: you can edit while the task is actually running! The status icon is repeated from the desktop in the corner of the Director so you can see its state. A running task will only be reset if your changes have compromised the data flow. You can change the state of a task by clicking its status icon: use right-click for a menu of commands, or just click for nDP to change it to a likely next state.

Typical applications


Task Specification

Internally, tasks are installed by the "task installer" functional block. This means that task details can conveniently be stored in XML format.
On startup, a file called *.XMJ (where * is the name and location of the EXE file) is XML-decoded and fed to the task installer. This sets up the initial state. A convenient action of this task would be to install all XMJ files from a specified directory, thus allowing each file to contain just one task (note that the "Director" does not support editing multi-task files).
The current nDP installation comes with an nDP.XMJ which installs tasks from a TASKS subdirectory of C:\REDTITAN\NDP and installs and runs tasks from the AUTORUN subdirectory.
The user can optionally install other task lists from the primary menu.


<TASK NAME="Task-Identifier" [ INSTANCES="count" ] >
...
</TASK>
(next task)

Each task is specified by a TASK element. When the task installer finishes reading the element the task will be installed and set into the required initial state.

The INSTANCES count specifies how the task behaves when activated. A zero count means the task deletes itself on completion. Positive counts indicate how many copies of the task may run in parallel.

Inside the TASK element is the complete specification of the functional processing that the task is to perform. Each functional block is represented by an element, and these link together in various possible ways.
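To make the TASK attributes concrete, the following sketch reads a small task list with Python's standard XML parser and reports what each INSTANCES value means according to the description above. The task names are invented, and the enclosing TASKS wrapper is an assumption added only so that the parser sees a single root element; it is not part of the format described here. The sketch makes no claim about what nDP does when INSTANCES is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical task list; the TASKS wrapper exists only for the parser's benefit.
text = """<TASKS>
<TASK NAME="Nightly-Import" INSTANCES="0"></TASK>
<TASK NAME="Print-Queue" INSTANCES="3"></TASK>
<TASK NAME="Archive"></TASK>
</TASKS>"""

def describe(task):
    """Summarise a TASK element's NAME and INSTANCES attributes."""
    name = task.get("NAME")
    count = task.get("INSTANCES")
    if count is None:
        return f"{name}: INSTANCES not given"
    if int(count) == 0:
        return f"{name}: deletes itself on completion"
    return f"{name}: up to {int(count)} copies in parallel"

summaries = [describe(t) for t in ET.fromstring(text).iter("TASK")]
for line in summaries:
    print(line)
```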

Technical notes

nDP will attempt to limit its use of the PC's main memory to avoid paging by the operating system, as this would impact badly on its performance. Data that cannot be processed will be spooled to disc in a single database constructed in the MS-Windows temporary folder. The "About" display shows both MS-Windows and nDP memory usage information.

There are very few limits on the size of data. The main one is that the maximum length of names for elements and attributes is currently set to 64 characters (unless Unicode characters >FF are encountered). There is no particular limit to the size of an attribute string value, the size of data content or the number of levels of nesting, but some functions may perform with sub-optimal speed if unusually structured data of massive scale is encountered. Most processing limits will be set by the PC platform being used.

Note that nDP makes use of itself to load and save task descriptions.