The Distributed System Manager (DSM) is a cooperative, multi-process application that manages a distributed INtime system. The DSM tracks the state of the system, monitors the health of its components, and cleans up in the event of component termination or failure.
The DSM performs these tasks:
You can set up the DSM processes to communicate with one another via these avenues:
INtime applications use NTX and the DSM to communicate with RT clients as shown in this illustration:
The DSM consists of the Windows DSM, a process that executes on the Windows host, and the RT DSM, a process that executes on each RT client. The bulk of the DSM work is performed on the Windows host. The processes cooperate to ensure the integrity of a distributed INtime system.
The RT DSM includes:
The DSM process monitoring mechanism monitors processes with declared dependencies. That is, the DSM continuously checks for the existence of those processes that have dependencies logged in the database. If it finds that a process has been deleted, it notifies all of the terminated process's dependents as well as its sponsors. The mechanisms used to monitor processes differ between Windows and RT systems.
Windows Side | Monitoring on the Windows host is done by one thread, which monitors all processes in a single call to WaitForMultipleObjects. When one of the processes is deleted, the call returns, the DSM responds appropriately, and the thread then returns to waiting on the remaining processes. |
RT Side | The RT DSM sets up a kernel task deletion handler which is notified whenever a task (thread) is deleted in the system. If the owner-job (the thread's process) is scheduled to be deleted, the kernel task deletion handler forwards the job handle to the DSM for processing. |
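
The Windows-side loop described above (wait on all watched processes, respond when one ends, resume waiting on the rest) can be sketched as a small portable model. This is only an illustration under stated assumptions, not the DSM's code: `sim_process`, `sim_wait_any`, and `monitor_step` are hypothetical names, and the simulated wait stands in for a real `WaitForMultipleObjects` call, which with `bWaitAll` set to `FALSE` reports the signaled handle as `WAIT_OBJECT_0` plus its array index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated process (hypothetical); a real monitor would hold a
 * Windows process HANDLE instead. */
typedef struct {
    int  pid;
    bool ended;
} sim_process;

/* Stand-in for the blocking wait: returns the index of the first ended
 * process, or count if none has ended yet. */
static size_t sim_wait_any(sim_process *const *watched, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        if (watched[i]->ended)
            return i;
    return count;
}

/* One pass of the monitor loop. Stores the pid of the ended process in
 * *ended_pid (the "respond" step) and returns the new watch-list length. */
size_t monitor_step(sim_process **watched, size_t count, int *ended_pid)
{
    assert(watched != NULL && ended_pid != NULL);

    size_t i = sim_wait_any(watched, count);
    if (i == count)
        return count;                 /* nothing ended; keep waiting */

    *ended_pid = watched[i]->pid;     /* a DSM would notify dependents and sponsors here */

    for (size_t k = i + 1; k < count; ++k)   /* drop the entry, keep the order */
        watched[k - 1] = watched[k];
    return count - 1;
}
```

Calling `monitor_step` repeatedly with the returned count models the thread going back to wait on the remaining processes.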
System monitoring is done by NTX; the DSM provides only an API for handling shutdown and failure scenarios.
The "active" and "passive" logical entities in the Windows DSM can be best understood when discussed together according to the responsibilities which they perform. The Windows DSM provides the following functionality:
The Dependency Database stores all process-related dependencies, system information, and sponsor registrations. The active agents on both Windows and RT nodes monitor processes with dependencies, along with other responsibilities.
If those processes are deleted, the dependent processes are notified. The passive agent, implemented as an API, allows user applications to communicate shutdown information to their dependents. It also lets applications declare or revoke dependencies, and lets services register for use by RT applications. The RT passive agent has the same purposes as the Windows passive agent, and will be implemented in the library rtman.lib. Each of these components will be detailed in the following sections.
The Windows passive agent, implemented as part of the Windows Extension Library (NTX), allows user applications to control dependency relationships. The RT passive agent has the same purposes as the Windows passive agent, and is implemented in the RT Application library, RT.LIB. Each of these components will be detailed in the following sections.
The DSM Dependency Database stores all of the dependency information in the distributed system. It also stores information about available sponsors and the current system topology. User applications implicitly access the database via the NTX and RT APIs. The NTX API accesses the database via the IDsmApi Interface. In addition, DSM active agents also access the database during all shutdown scenarios using the IDSMInternal Interface. INtime tools may also access the database using the COM interfaces. The database is implemented using the registry and the Win32 registry API.
It is accessed by user applications implicitly or explicitly via NTX, which has full access to this database. In addition, DSM active agents also access the database during shutdown.
It is important to notice that dependency and sponsor data stored in the database is volatile. In other words, these parts of the database are completely rebuilt at Windows system shutdown and restart. Only system information is retained.
Access to database information is done internally via the Win32 Registry API. Access rights to the database are handled using Windows security descriptors. Changes to database information are made by removing the information and then re-adding the updated information. The nature of the registry ensures that accesses to the database are serialized and atomic. It is up to the DSM to make sure that the database is updated to maintain accurate system and process state information.
Process related information is found under the "Distributed System Manager\Process" key. Each key under the process subkey represents a single process (either a Windows Process or an RT Process) and stores all of the relevant information about that process. The names of these keys represent the registered Sponsor name for sponsor processes, or generic, assigned names for clients.
The data elements stored under these keys contain the process' sponsor/client status, its state, its object relationships, etc. The following table describes the individual data fields and their use.
RT objects created by Windows applications are recorded for cleanup purposes in the database. Each object under this key corresponds to one RT object or to one RT Cataloging. The name of this key is a generic name assigned by the DSM. Each key may contain the following data elements.
Locating remote clients from the Windows portion of an INtime application includes:
System tracking on the Windows side involves adding new systems to the database, monitoring all systems in the database, and removing systems from the database.
You can add systems to the database only with the Configuration utility. Once systems are added, the DSM tracks their state, moving clients from the INACTIVE state to the ACTIVE state when it discovers clients from the database requesting assignment. If the requesting client's state is UNKNOWN or ACTIVE when the request is received, it is assumed that the system has shut down and is now restarting. The DSM initiates the termination process for the system and then initiates system startup for the system to bring the system back to the ACTIVE state.
The following diagram illustrates the states and state transitions of a client node as seen by the Windows Host.
State Transitions of an RT client node as seen by the Windows DSM:
RT Client Node States
NULL | This state is included only for completeness. It simply shows that the RT Client has not yet been registered within the Distributed System by the Configuration Utility. |
INACTIVE | Once a Client has been registered, it is placed in the INACTIVE state. The INACTIVE state signifies that the Client can join the system at any time, but has not yet booted or started communicating. When a Client is shut down, it returns to this state. |
CLEANUP | When the DSM recognizes that a Client Node has rebooted (or that some catastrophic error has occurred which will require a reboot), the system enters the CLEANUP state. In this state, all references to processes on the down node are removed from the database and the proper notifications are made. |
ACTIVE | When the DSM has responded to the request for assignment, the DSM begins system monitoring for the node to make sure that the communication remains open. The system operates in this mode until a serious system event occurs. |
UNKNOWN | The UNKNOWN state is entered whenever the DSM cannot be sure about the status of the Client node. This could be caused by a Network failure, by the RT system being unable to respond to Pings because of work load (or debugging, etc.), or because of system hang, crash, or restart. Once the DSM recognizes what happened, the DSM will mark the system in the appropriate state and respond accordingly. |
RECOVERY | The RECOVERY state is entered when the Client has been marked as UNKNOWN and now the DSM has detected that the ACTIVE status can be restored without serious integrity problems. The RECOVERY state is an intermediary state in which the Windows DSM synchronizes with the RT DSM on the Client node. This is a fairly complicated procedure and is dependent upon the nature of the communication media in order to work at all. It includes things like message recovery, process dependency checks, etc. |
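
The client node states above can be read as a small state machine. The following is a minimal sketch reconstructed from the prose only; the state diagram remains the authoritative source. The event names and the `node_next` function are hypothetical, and the transition out of CLEANUP (back to INACTIVE, from which a pending assignment request leads to ACTIVE) is an assumption of this sketch.

```c
#include <assert.h>

typedef enum {
    NODE_NULL, NODE_INACTIVE, NODE_CLEANUP,
    NODE_ACTIVE, NODE_UNKNOWN, NODE_RECOVERY
} node_state;

/* Hypothetical event names, chosen for this sketch. */
typedef enum {
    EV_REGISTERED,      /* Configuration utility added the client */
    EV_ASSIGN_REQUEST,  /* client requested assignment */
    EV_CONTACT_LOST,    /* DSM can no longer confirm the client's status */
    EV_RECOVERABLE,     /* DSM found that ACTIVE status can be restored */
    EV_SYNC_DONE,       /* Windows DSM and RT DSM have resynchronized */
    EV_CLEANUP_DONE     /* references purged and notifications made */
} node_event;

/* Returns the next state; events that do not apply leave the state unchanged. */
node_state node_next(node_state s, node_event e)
{
    assert((int)s >= NODE_NULL && (int)s <= NODE_RECOVERY);

    switch (s) {
    case NODE_NULL:
        return e == EV_REGISTERED ? NODE_INACTIVE : s;
    case NODE_INACTIVE:
        return e == EV_ASSIGN_REQUEST ? NODE_ACTIVE : s;
    case NODE_ACTIVE:
        if (e == EV_CONTACT_LOST)   return NODE_UNKNOWN;
        if (e == EV_ASSIGN_REQUEST) return NODE_CLEANUP;  /* client restarted */
        return s;
    case NODE_UNKNOWN:
        if (e == EV_RECOVERABLE)    return NODE_RECOVERY;
        if (e == EV_ASSIGN_REQUEST) return NODE_CLEANUP;  /* client restarted */
        return s;
    case NODE_RECOVERY:
        return e == EV_SYNC_DONE ? NODE_ACTIVE : s;
    case NODE_CLEANUP:
        return e == EV_CLEANUP_DONE ? NODE_INACTIVE : s;  /* assumed target */
    }
    return s;
}
```

For example, an assignment request that arrives while the node is ACTIVE or UNKNOWN is treated as a restart and moves the node to CLEANUP, matching the description of a client that shut down and is now restarting.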
The following diagram illustrates the states and state transitions of an RT DSM Component.
State Transitions of the RT DSM:
RT DSM States
INIT | The RT DSM runs on RT Client nodes as either a first level job or as a loadable job. At the time of system boot, the RT DSM loads and performs the necessary initialization. |
REQUEST | In this state, the RT DSM makes a request to the Windows host for any configured assignments. Once the request has been responded to by the Windows DSM, the RT DSM moves into the NORMAL state. |
NORMAL | In the NORMAL state, the RT DSM monitors processes in dependency relationships. It remains in this state until some major system event takes place. |
NT_GONE | The NT_GONE state is entered when the RT DSM receives the Stay Active message from the Windows DSM. This message is sent when Windows is shutting down and the Configuration settings for the Client indicate that the Client should remain active after Windows shutdown. The Windows DSM sends the appropriate message to the RT DSM. The RT DSM then performs a "partial cleanup". A partial cleanup consists of notifying all RT Sponsor processes of the termination of their Windows dependents and notifying all RT Dependent processes that their Windows Sponsors have terminated. |
WAIT | The RT DSM enters the WAIT state after it has completed partial cleanup. It remains in the WAIT state until it receives notification from the Windows Host that Windows is back up and running. |
RECOVERY | During the RECOVERY state, the RT DSM attempts to synchronize with the Windows DSM. The focus of the effort is on RT sponsors because all other stored information becomes invalid after Windows shutdown. In the case where Windows has not shut down, the RT DSM attempts to verify that process dependencies are still valid. |
UNKNOWN | The UNKNOWN state is entered whenever some failure is detected. The RT DSM attempts to reestablish contact with the Windows Host in order to determine the state of the system. |
CLEANUP | In the CLEANUP state, the RT DSM makes all necessary notifications to processes in dependency relationships. It returns to the REQUEST state unless the RT sub-system has received a SHUTDOWN notification from the Windows DSM. |
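
The RT DSM states can likewise be written down as a transition table. This is a sketch based on the state descriptions above, not on the diagram: the event names are hypothetical, and the rows marked "assumed" (in particular how RECOVERY and UNKNOWN are left) are guesses that the text does not confirm. The SHUTDOWN case, in which CLEANUP does not return to REQUEST, is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    RT_INIT, RT_REQUEST, RT_NORMAL, RT_NT_GONE,
    RT_WAIT, RT_RECOVERY, RT_UNKNOWN, RT_CLEANUP
} rt_state;

/* Hypothetical event names, chosen for this sketch. */
typedef enum {
    RT_EV_INIT_DONE,            /* initialization finished */
    RT_EV_ASSIGNED,             /* Windows DSM answered the request */
    RT_EV_STAY_ACTIVE,          /* Stay Active message received */
    RT_EV_PARTIAL_CLEANUP_DONE, /* partial cleanup completed */
    RT_EV_WINDOWS_BACK,         /* Windows host is running again */
    RT_EV_SYNC_DONE,            /* synchronized with the Windows DSM */
    RT_EV_SYNC_FAILED,          /* synchronization not possible */
    RT_EV_FAILURE,              /* some failure detected */
    RT_EV_CONTACT_RESTORED,     /* contact with the Windows host reestablished */
    RT_EV_CLEANUP_DONE          /* all notifications made */
} rt_event;

typedef struct {
    rt_state from;
    rt_event on;
    rt_state to;
} rt_transition;

static const rt_transition rt_table[] = {
    { RT_INIT,     RT_EV_INIT_DONE,            RT_REQUEST  },
    { RT_REQUEST,  RT_EV_ASSIGNED,             RT_NORMAL   },
    { RT_NORMAL,   RT_EV_STAY_ACTIVE,          RT_NT_GONE  },
    { RT_NORMAL,   RT_EV_FAILURE,              RT_UNKNOWN  },
    { RT_NT_GONE,  RT_EV_PARTIAL_CLEANUP_DONE, RT_WAIT     },
    { RT_WAIT,     RT_EV_WINDOWS_BACK,         RT_RECOVERY },  /* assumed */
    { RT_UNKNOWN,  RT_EV_CONTACT_RESTORED,     RT_RECOVERY },  /* assumed */
    { RT_RECOVERY, RT_EV_SYNC_DONE,            RT_NORMAL   },  /* assumed */
    { RT_RECOVERY, RT_EV_SYNC_FAILED,          RT_CLEANUP  },  /* assumed */
    { RT_CLEANUP,  RT_EV_CLEANUP_DONE,         RT_REQUEST  },
};

/* Looks up the transition for (s, e); unmatched events leave s unchanged. */
rt_state rt_next(rt_state s, rt_event e)
{
    assert((int)s >= RT_INIT && (int)s <= RT_CLEANUP);

    for (size_t i = 0; i < sizeof rt_table / sizeof rt_table[0]; ++i)
        if (rt_table[i].from == s && rt_table[i].on == e)
            return rt_table[i].to;
    return s;
}
```

Keeping the transitions in a table makes it easy to compare each row against the state diagram and correct any assumed row.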
Dependency tracking on the Windows side consists of recording dependencies in the dependency database, monitoring the appropriate processes, handling controlled and uncontrolled process shutdown, and removing dependency information from the database.
Once a process is added to the database as having a dependency, the DSM begins monitoring the process (either on the Windows side or the RT side). If shutdown/termination is detected, the appropriate processes are notified. If the terminated process was a sponsor, all associated dependents are notified of the termination via the deletion mailbox for RT processes and via the callback function for Windows processes. If the terminated process was a dependent, then the associated sponsors are notified of the termination. Once notified that a process is terminated, there is no longer a need for the dependency to be in the database and it is removed.
Processes also have the ability to explicitly remove a dependency without termination.
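
The dependency lifecycle described above (record, monitor, notify on termination, remove) can be illustrated with a small in-memory model. This is a sketch under stated assumptions and does not use the DSM or NTX APIs: `dep_db`, `dep_register`, `dep_unregister`, and `dep_on_terminate` are hypothetical names, processes are identified by plain integers, and "notification" is reduced to returning the list of processes that would be notified (in the real system, via the deletion mailbox for RT processes and the callback function for Windows processes).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPS 16

typedef struct { int sponsor; int dependent; } dep_record;
typedef struct { dep_record rec[MAX_DEPS]; size_t count; } dep_db;

/* Records that `dependent` depends on `sponsor`. Returns 1 on success, 0 if full. */
int dep_register(dep_db *db, int sponsor, int dependent)
{
    assert(db != NULL);
    if (db->count == MAX_DEPS)
        return 0;
    db->rec[db->count].sponsor = sponsor;
    db->rec[db->count].dependent = dependent;
    db->count++;
    return 1;
}

/* Explicitly removes one dependency without any termination.
 * Returns 1 if a record was removed, 0 if none matched. */
int dep_unregister(dep_db *db, int sponsor, int dependent)
{
    assert(db != NULL);
    for (size_t i = 0; i < db->count; ++i) {
        if (db->rec[i].sponsor == sponsor && db->rec[i].dependent == dependent) {
            for (size_t k = i + 1; k < db->count; ++k)
                db->rec[k - 1] = db->rec[k];
            db->count--;
            return 1;
        }
    }
    return 0;
}

/* Handles termination of `pid`: writes the processes to notify into
 * notify[] (at most max entries), removes every record that involves pid,
 * and returns the number of entries written. */
size_t dep_on_terminate(dep_db *db, int pid, int *notify, size_t max)
{
    assert(db != NULL && notify != NULL);
    size_t n = 0, kept = 0;
    for (size_t i = 0; i < db->count; ++i) {
        dep_record r = db->rec[i];
        if (r.sponsor == pid || r.dependent == pid) {
            /* Sponsor ended: notify the dependent. Dependent ended: notify the sponsor. */
            int peer = (r.sponsor == pid) ? r.dependent : r.sponsor;
            if (n < max)
                notify[n++] = peer;
        } else {
            db->rec[kept++] = r;   /* unrelated record stays in the database */
        }
    }
    db->count = kept;
    return n;
}
```

After `dep_on_terminate` returns, no record that mentions the terminated process remains, which mirrors the rule that a dependency is removed once its notification has been delivered.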
The ability to create RT objects from a Windows process is a new feature in INtime 2.0 and brings several advantages as well as added complexity. Managing RT objects for Windows processes consists of three tasks: storing ownership information, which NTX registers when the RT object is created and removes when it is deleted; monitoring the life of the creating process (done the same way as for dependent processes); and deleting the objects, either via the NTX call or after process termination is detected. The RT object case is not significantly different from the process dependency case.
When a Windows process creates an RT object, the dependency relationship is added to the database. Active agent monitoring of the Windows process is the same as in the case of process dependency. Standard deletion using the NTX library API causes the dependency to be removed from the database. If the active agent detects that the Windows process has terminated, the associated RT objects are also deleted.
When a process terminates, the DSM active agent recognizes process termination and notifies all processes marked as dependent that the process has terminated. In addition, the DSM notifies all sponsor processes of the terminated process.
Windows system shutdown is handled as multiple invocations of the process shutdown model. During controlled Windows system shutdown, every Windows application receives a message telling it to shut down. The DSM also receives the message and communicates the shutdown to the RT DSM Components in the system. The RT Client may choose to continue running based on its configuration. Uncontrolled Windows system shutdown is handled by RT nodes. Likewise, if an RT node is identified as shut down, the Windows node handles the situation. All processes with RT dependents or RT sponsors are notified and the database is purged of the references to objects on the down node.
The RT process and system shutdown model is a mirror image of the Windows side. The only exception is that system shutdown is done as a message to the Process Deletion Mailbox of each process. The ability to shut down an RT node is not given to the user in INtime 2.0. It is, however, a valuable internal feature.
This table lists common operations related to distributed system management and the system calls that perform them:
To . . . | Use this system call . . . |
---|---|
Register a dependency | ntxRegisterDependency RegisterRtDependency |
Unregister a dependency | ntxUnregisterDependency UnregisterRtDependency |
Register a sponsor | ntxRegisterSponsor RegisterRtSponsor |
Unregister a sponsor | ntxUnregisterSponsor UnregisterRtSponsor |
Find the location of a sponsor | ntxFindSponsor FindRtSponsor |
Notify of an event | ntxNotifyEvent RtNotifyEvent |
Register a DSM event handler | RegisterRtEventHandler |
Unregister a DSM event handler | UnregisterRtEventHandler |
Change event handler priority | SetRtEventHandlerPriority |
Manage Windows shut down | RtContinueWindowsShutdown RtShutdownBlockReasonCreate RtShutdownBlockReasonDestroy |
This shows the order to make DSM calls: