Mathematical support and software for data processing in robotic neurocomputer systems

The paper addresses the classification and formal definition of neurocomputer systems for robotic complexes, based on the types of associations among their elements. We propose analytical expressions for performance evaluation in neurocomputer information processing, aimed at developing methods, algorithms and software that optimize such systems.


Introduction
It is now obvious that the performance of computing devices and systems is insufficient for a number of tasks, such as processing of video and audio data, real-time control in robotics, image recognition, forecasting, optimization, and artificial intelligence tasks for robotics. The problem has several causes: it is impossible to keep increasing the clock frequencies of computing devices because of the "technological restrictions" we currently face; we lack effective methods, algorithms and software solutions for the parallelization of operations on multiprocessor and multinuclear systems; and we have only a limited number of generic methods, algorithms and software means of parallelization for specialized computing devices (neuroprocessors, DSP processors, etc.).
One of the ways to deal with insufficient performance in solving a number of tasks is to employ high-performance computing systems with massive parallelism. In numerous applications that perform complex processing of video, images, communication, security, and other types of signals, such parallelism is mainly present at the data level (Data Level Parallelism, DLP). In the majority of modern processors, DLP is implemented through a single command stream and a multiple data stream, i.e. «single instruction, multiple data» (SIMD). The above applications also exhibit considerable Instruction Level Parallelism (ILP). In such cases, parallel assignment of multiple scalar and/or SIMD operations provides a basis for their parallel execution at the instruction level. However, a SIMD architecture may not be the best solution for applications with variable DLP, as it will decrease performance and increase power consumption. Modern computing systems use «very long instruction word» (VLIW) architectures that implement both ILP and DLP. Nevertheless, popular von Neumann-type processors cannot implement parallelization effectively, because their algorithms contain constructs that are complex to parallelize, such as cycles, conditional branching, and data dependences.
This paper proposes ways, other than frequency increase, to improve high-speed response: using highly parallel specialized computing hardware (e.g., neurocomputers), and developing innovative, easily parallelized algorithms with elements of intellectual compilation. The idea is to simplify the software and extract as much "implicit parallelism" as possible at the command level, using a wide issue width (WIW) in command outputs and long pipelines with deep pipeline latency (DPL).
Neurocomputers are next-generation computing devices consisting of many concurrently running simple computing elements (neurons). These elements are connected in a neural network. They perform uniform computing operations and do not require external control. The great number of concurrently running computing elements ensures high-speed response and low energy consumption. Besides, neurocomputing applications do not contain any elements that are hard to parallelize, and all the computing they do is multiple, parallel, independent, and neural-based.

Mathematical formalization of data processing in robotic neurocomputer systems
Thus, our intention is to develop the first universal programming model for neurocomputing devices and systems.
The goal of the current research is to develop mathematical tools for optimizing parallel, distributed and cloud computing systems based on neuroprocessors.
Our objectives are as follows:
- classification of computing systems built on neuroprocessors, according to the associations among their elements;
- formalization of parallel, distributed and cloud computing systems as neuroprocessor computing systems (NPCSs);
- definition of analytical expressions for performance evaluation in neurocomputer data processing systems;
- development of optimization methods for parallel, distributed and cloud systems of neurocomputer data processing;
- development of software tools for optimization in parallel, distributed and cloud systems of neurocomputer data processing.
To solve these tasks, we employ the set-theoretic approach, methods of system analysis, methods of optimization and of computing process planning.
We introduce a classification of computing systems based on neuroprocessors, according to types of associations among their elements [4][5][6][7].
1. A parallel neuroprocessor computing system is one whose elements are connected at the bus level. An element of such a computing system is its computing nucleus, the neuroprocessor. Computing nuclei may belong to individual processors (one nucleus each) or to several multinuclear processors. To simplify the model, they are regarded as an integrated set of neuroprocessor computing modules (NPCMs) P = {P1, P2, ..., Pq}. Processes executed on the computing nuclei can directly read and write information in the RAM and in the disk storage of the node, through the interface of a system bus or commutator.
A parallel system can be defined by the following properties. As the processors are interconnected through a bus, the data transfer delay time Ttr and the latency time Tl are negligible: Ttr ≈ 0, Tl ≈ 0. We can also describe the system's dynamic parameters at any moment of time; among these is the available disk storage capacity of the i-th NPCM.
An important feature of neuroprocessors is their continuous performance and training (in each command), which consists in data exchange between the RAM and the computing nucleus. This makes it important to choose the external bus configuration correctly for interaction with RAM in a multiprocessor mode. Neuroprocessors, like most DSP processors, support both single-processor and multiprocessor work modes over a single or multiple external buses. Let us analyze the most common option, with two buses. If two processors are connected to a shared memory, access arbitration occurs without an external controller. It should be remembered that only buses of opposite types can be united (a local bus of one processor with the global bus of another), since, following a system reset, only one processor will be authorized to access the shared memory.
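As a minimal sketch of the parallel system defined above (all field names and numeric values are our own illustration, not from the paper), the set of NPCMs P = {P1, ..., Pq} and the negligible-delay property of a bus-connected system can be modeled as:

```python
from dataclasses import dataclass, field

@dataclass
class NPCM:
    """One neuroprocessor computing module (a computing nucleus)."""
    speed_ops: float      # processor speed, operations per second
    cores: int            # number of nuclei
    ram_bytes: int        # RAM capacity
    disk_free_bytes: int  # available disk storage (a dynamic parameter)

@dataclass
class ParallelNPCS:
    """Bus-connected parallel system: transfer delay and latency are negligible."""
    modules: list = field(default_factory=list)  # the set P = {P_1, ..., P_q}
    Ttr: float = 0.0  # data transfer delay between NPCMs (~0 on a shared bus)
    Tl: float = 0.0   # latency time (~0 on a shared bus)

    @property
    def q(self) -> int:
        return len(self.modules)

system = ParallelNPCS([NPCM(1e9, 4, 2**30, 2**33) for _ in range(3)])
print(system.q, system.Ttr, system.Tl)  # 3 modules, negligible delays
```

This only fixes the vocabulary used later: q is the number of NPCMs, and Ttr and Tl are the delay terms that become nonzero in the distributed and cloud cases.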
2. Distributed neuroprocessor computing systems (DNPCSs) are complexes of NPCMs or autonomous computer systems (CSs) that are situated at a distance from one another and interact via programmable commutators and system devices. It should be noted that the rules of parallel and distributed processing are not the same. A GRID system is an example of a distributed system.
One of the features of a distributed system is absence of shared memory, except for a case when a unified address space [5] is provided (we do not discuss it here).Thus, processes that are executed on computing nuclei can directly read and write information in the RAM and disk storage of the node, through the interface of a system bus or commutator.
The notion of a cluster is essential in a distributed system. We speak of a neuroprocessor cluster when we mean a group of computers whose computing nodes are neuroprocessors; the computers have a broadband connection and are perceived by the user as a single hardware unit. In other words, a cluster is a loosely connected combination of multiple computing systems that work jointly in executing shared applications and are viewed by users as a unified system. A cluster based on neuroprocessors has a number of advantages arising from the high-level parallelism of the neurocomputer paradigm.
Each cluster CLi can be described in this way: E is a set of directional associations between the clusters of the DNPCS, where some associations can be high-speed (strong) and others slow (loose).
Δ is the controller (governing node) of the NPCS; the latency defines the time required to initialize messages, send and receive data, etc., between each pair of computing clusters, while the throughput of each association is measured in bytes/sec.
3. Cloud neuroprocessor computing systems (CNPCSs) presuppose ubiquitous network access to an infrastructure of configurable computing resources, such as a neuroprocessor system (we only analyze the IaaS service model, in which users are given access to the neuroprocessor infrastructure). A significant feature of cloud systems is the time delays caused by the relatively slow communication channels between a user and a cloud.
In cloud computing, two options are possible: a) the cloud's computing system is not distributed; b) the cloud's computing system is distributed.
A cloud system can be described as follows: Ttrc is the time delay in data transfer to the CNPCS governing device and back to the user; Tlc is the latency time in data transfer from the CNPCS governing device and back. The execution time of a subprogram MK is determined by the neuroprocessor's technical specifications, where Hk ∈ H is a technical specification defining the quantity of neurons modeled within a tact, and Hm ∈ H is a technical specification defining the execution time of one tact of the neuroprocessor.
Let Ttr be the total data transfer delay time between the NPCMs in the system, and Tl the total latency time in data transfer. The data transfer time between two nodes can be expressed through the governing node as Ttr(ij) = Ttr(i0) + Ttr(0j), where Ttr(i0) is the data transfer time from node i to the governing node Δ.
The transfer time over a single channel is Ttr = N / PS, where PS is the data throughput of the channel (bit/sec) and N is the bit count for transfer.
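The channel transfer time relation above is a one-line computation; a sketch with hypothetical values (1 Mbit over a 100 Mbit/s channel):

```python
def transfer_time(n_bits: int, ps_bits_per_sec: float) -> float:
    """Data transfer time over a channel: Ttr = N / PS."""
    return n_bits / ps_bits_per_sec

# 1 Mbit over a 100 Mbit/s channel takes 0.01 s
print(transfer_time(1_000_000, 100_000_000))  # 0.01
```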
The total latency is the sum of the pairwise terms Tl(ij), where Tl(ij) is the latency time in data transfer between the i-th and j-th NPCMs.
The latency time between two nodes can be expressed as Tl(ij) = Tl(i0) + Tl(0j), where Tl(i0) is the latency time in data transfer from node i to the governing node Δ, and Tl(0j) is the latency time in data transfer from the governing node Δ to the j-th node.
Thus, in distributed systems we obtain a transfer time delay in the NPCS, and the total data transfer times are computed as the sums of the corresponding per-node components. That is, in this case Ttrc ≠ 0 and Ttr ≠ 0. For further consideration, we take the total time loss in data transfer to be the sum of the transfer delay and latency components. Using the above ratios, we can define analytical expressions for evaluating the chief criterion of effectiveness, the performance of neurocomputer systems, for various structures: pipeline, vector, pipeline-vector, and vector-pipeline.
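The totals above can be sketched under our assumption that every transfer is routed through the governing node Δ, so each i → j exchange costs Ttr(i0) + Ttr(0j) (and analogously for latency); all numeric values are hypothetical:

```python
def total_transfer_time(ttr_to_delta, ttr_from_delta, pairs):
    """Total Ttr: sum of Ttr(i0) + Ttr(0j) over all communicating pairs (i, j)."""
    return sum(ttr_to_delta[i] + ttr_from_delta[j] for i, j in pairs)

def total_latency(tl_to_delta, tl_from_delta, pairs):
    """Total Tl: sum of Tl(i0) + Tl(0j) over all communicating pairs (i, j)."""
    return sum(tl_to_delta[i] + tl_from_delta[j] for i, j in pairs)

# hypothetical per-node times in seconds: node -> Δ and Δ -> node
ttr_i0 = {0: 0.002, 1: 0.003}
ttr_0j = {0: 0.002, 1: 0.004}
pairs = [(0, 1), (1, 0)]  # directional associations between nodes
Ttr = total_transfer_time(ttr_i0, ttr_0j, pairs)  # sum over both associations
print(Ttr)
```

The total time loss in data transfer is then Ttr + Tl (plus Ttrc + Tlc in the cloud case).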

The cycle Tc(j) of the pipeline structure is defined as the duration of the maximum subprogram processing time, i.e. Tc(j) = max i ti(j).
The first result of information processing after implementation of algorithm A(j) will be obtained at the pipeline output once the pipeline of q stages has been filled, i.e., after q·Tc(j); this value is the execution time To(j). Each subsequent processing output will be obtained after Tc(j), and the times of obtaining outputs can be calculated as To(j) + (N − 1)·Tc(j), where N is the ordinal number of the required processing output.
The time loss will be the difference between the execution time To(j) and the total execution time of all subprograms. On the other hand, processing each subsequent word yields a time gain compared with a single-processor implementation of the system. The gain time therefore equals the difference between the total operating time of all NPCMs and the total execution time of all subprograms. The downtime can be computed as the sum of the corresponding differences, and the total processing time is defined as the execution time of all q subprograms on all NPCMs. Let us now look at the time characteristics of distributed and cloud systems that have a pipeline structure.
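The pipeline timing relations above can be collected in a small sketch (the per-stage subprogram times are hypothetical): the cycle is the maximum stage time, the first output appears after the q-stage pipeline fills, and the N-th output follows (N − 1) cycles later.

```python
def pipeline_times(stage_times, n_outputs):
    """Pipeline timing for q subprograms with the given per-stage times."""
    q = len(stage_times)
    t_cycle = max(stage_times)          # cycle Tc(j): the slowest stage dominates
    t_first = q * t_cycle               # first output To(j), after the pipeline fills
    outputs = [t_first + (n - 1) * t_cycle for n in range(1, n_outputs + 1)]
    # time loss: first-output time minus the sum of all subprogram times
    loss = t_first - sum(stage_times)
    return t_cycle, outputs, loss

t_c, outs, loss = pipeline_times([2.0, 3.0, 1.0], 3)
print(t_c, outs, loss)  # 3.0 [9.0, 12.0, 15.0] 3.0
```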
For distributed or cloud systems, data transfer time delays are added at each pipeline run, totaling the delay times of all q − 1 associations. The DNPCS's pipeline cycle then equals the parallel cycle plus these delays. The first result of data processing during implementation of algorithm A(j) is obtained at the pipeline output after the pipeline has been filled; considering the delay times, the PR(j) execution time follows accordingly. Each subsequent processing result appears at the output every Tc(j), so the time of obtaining the next processed output can be computed in the same way as in the parallel case. The time lag is defined as the difference between the execution time To(j), with consideration for the time delays in data transfer, and the total execution time of all subprograms. The time gain in a distributed or cloud system does not differ from the time gain in a parallel system.
The downtime in a distributed or cloud system is computed analogously. The processing time T in a distributed or cloud system does not differ from the processing time Tp(j) in a parallel system.
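For the distributed and cloud cases, each pipeline run additionally pays the delay of all q − 1 associations; a sketch under our simplifying assumption of a uniform per-association delay:

```python
def dnpcs_cycle(stage_times, assoc_delay):
    """DNPCS pipeline cycle: the parallel cycle plus delays of all q-1 associations."""
    q = len(stage_times)
    return max(stage_times) + (q - 1) * assoc_delay

print(dnpcs_cycle([2.0, 3.0, 1.0], 0.5))  # 3.0 + 2*0.5 = 4.0
```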

Software for data processing in robotic neurocomputer systems
Based on the obtained analytical expressions, we have developed optimization methods for CSs of neurocomputer data processing, aimed at achieving similar processing times for all subprograms and all segments of the program code [7].
For practical study, we chose the NP Studio software platform to analyze and optimize the CSs for neurocomputer data processing.
In keeping with the developed classification of structures, we have implemented a choice of five options: vector, pipeline, vector-pipeline, pipeline-vector, and arbitrary architecture.
We have developed a module for modeling, analysis and optimization. For convenient work with various source data, a task model was implemented for the automated control of electric and mechanical systems in the developed Visual Programming subsystem of the NP Studio software platform. The chief item in the subsystem is the notion of a functional unit, which can be connected to other functional units to perform certain functions.
The workspace of the software application is divided into functional parts:
1. The area of functional units, instances of which may be transferred to the modeling workspace with the drag-and-drop function. The same area contains control items for the visualization and deletion of connections between items. When added to the workspace, each item gets a unique ID, which is a concatenation of the item category, the item number within the category, and the index number of the item; e.g., "O2.9" defines an item from the category Output Signals, Number, with "9" as its unique ID.
2. The workspace, which, in its turn, can be represented as visual items and the connections between them, or as a matrix of connections between items.
The theoretical and practical results of the study have been used for the development (in cooperation with the Institute of Machinery Studies, Russian Academy of Sciences) of a specialized neurocomputing robotic device based on neuroprocessors for the automated control of modules in electric-mechanical systems (specifically, hexapods) in a near real-time mode [10][11][12][13].
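The unique-ID convention for workspace items described above can be sketched as follows (the function name is ours, and the split of "O2.9" into category code "O", item number 2 and index 9 is our reading of the example):

```python
def make_item_id(category_code: str, item_number: int, index: int) -> str:
    """Concatenate category code, item number and item index, e.g. 'O2.9'."""
    return f"{category_code}{item_number}.{index}"

print(make_item_id("O", 2, 9))  # O2.9
```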

Conclusion
The paper proposes theoretical and practical results that comprise the mathematical tools, algorithms and software necessary for the optimization of computing systems based on conceptually new computing hardware: neurocomputers. The results obtained can be employed in the analysis and optimization of tasks that involve multiple neural-based computations, e.g., implementations of neurocomputer systems with automated administration of SEMS (smart electromechanical systems) modules.
The research was carried out within the framework of the assignment for the performance of public works in the sphere of scientific activity, within the initiative scientific project of the state task of the Ministry of Education and Science of the Russian Federation No. 2.9519.2017/BC on the topic "Technologies for parallel processing of data in neurocomputer devices and systems".

Sw is the structure of the DNPCS; a is the hardware architecture of each DNPCS node; h is the software architecture of each DNPCS node. Owing to the principles and specific features of neurocomputer functioning, the execution time of auxiliary commands tends to zero.

1. Let us look into the sets that define the delay and latency times in data transfer, Ttr and Tl, for the DNPCS; Ttr(ij) is the transfer time from node i to node j of the NPCS.

2. In such cases, the initial data transfer to the governing device (commutator) does not cause delays. Let us look into the Ttr and Tl sets for the CNPCSs that do not have a distributed structure. In this case Ttr ≈ 0, and the delay time Ttrc in the system is determined by the exchanges between the user and the system: Ttr(po) is the transfer time from the user to the Δ node; Ttr(op) is the transfer time from the Δ node to the user; Tl(po) is the latency time in data transfer between the user and Δ; Tl(op) is the latency time in data transfer between Δ and the user. We then obtain the corresponding totals.
3. Let us look into the Ttr and Tl sets for CNPCSs with distributed structures.
E is a set of directional associations between the nodes of the NPCS, whose quantity is defined by the structure type; the latency defines the time required to initialize messages, send and receive data, etc.
These parameters define the processor speed, the number of its nuclei, and the RAM and disk storage capacity (if applicable) of the i-th NPCM, accordingly; Sw is the NPCS structure.