The main goal of this work has been to obtain the Renner-Teller (RT) coupled-channel (CC) dynamics of the NH(a1Δ) + D(2S) reaction, considering the following four channels:
NH(a1Δ) + D(2S) --> N(2D) + HD(1Σg+) (depletion) (1)
NH(a1Δ) + D(2S) --> ND(a1Δ) + H(2S) (exchange) (2)
NH(a1Δ) + D(2S) --> NH(X3Σ-) + D(2S) (quenching) (3)
NH(a1Δ) + D(2S) --> ND(X3Σ-) + H(2S) (exchange + quenching) (4)
We have used the best available potential energy surfaces and have obtained initial-state-resolved probabilities, cross sections, rate constants and branching ratios via the time-dependent real wave-packet (TDRWP) method(1) and flux analysis.
The two electronic states of NHD (X2B1 and A2A1) are the degenerate components of a linear 2Π state, giving rise to Renner-Teller(2) (RT) rovibronic nonadiabatic interactions that allow a change of electronic state. The RT effect is responsible for the exchange and exchange + quenching reactions, which are therefore forbidden under the Born-Oppenheimer approximation. Thus, the propagation of the RWP starts on the A2A1 excited potential energy surface(3) (PES), and when the packet reaches H-N-D collinear arrangements it can change electronic state (jump to the X2B1 PES(3)) through the RT effect. Moreover, the X2B1 state has a deep minimum that traps the RWP for a long time, thereby increasing the probability of a nonadiabatic electronic transition (change of electronic state).
Finally, the part of the RWP corresponding to reaction channels (1)-(4) is determined, and the probability of each channel is calculated as a function of the initial conditions.
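As a sketch of this post-processing step, the Python fragment below (with purely illustrative numbers, not results of the NH + D calculations) shows how J-resolved reaction probabilities could be combined into a cross section through the standard partial-wave sum σ(E) = (π/k²) Σ_J (2J+1) P_J(E), and how channel cross sections then give branching ratios.

```python
import numpy as np

# Hypothetical post-processing sketch: combine J-resolved reaction
# probabilities into a cross section via the standard partial-wave sum
# sigma(E) = (pi / k^2) * sum_J (2J+1) P_J(E), then form branching ratios.
# All numbers below are illustrative, not results of the NH + D calculations.
def cross_section(prob_J, k):
    J = np.arange(len(prob_J))
    return np.pi / k**2 * np.sum((2 * J + 1) * prob_J)

def branching_ratios(sigmas):
    # Fraction of the total reactive cross section going into each channel.
    total = sum(sigmas.values())
    return {channel: s / total for channel, s in sigmas.items()}

prob_J = np.array([0.30, 0.28, 0.25, 0.18, 0.05])  # P_J(E) for J = 0..4
sigma = cross_section(prob_J, k=5.0)               # k: illustrative wavenumber

ratios = branching_ratios({"depletion": 1.2, "exchange": 0.8,
                           "quenching": 0.6, "exchange+quenching": 0.4})
```

In the actual calculations the sum also runs over the K0 components and the probabilities come from the flux analysis; the sketch only shows the bookkeeping.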
All the work detailed here is the continuation of our previous projects concerning the Renner-Teller dynamics and kinetics of atom + diatom reactions(5),(6).
(1) S. K. Gray and G. G. Balint-Kurti, J. Chem. Phys. 108, 950 (1998)
(2) C. Petrongolo, J. Chem. Phys. 89, 1297 (1988)
(3) Z.-W. Qu, H. Zhu, R. Schinke, L. Adam, and W. Hack, J. Chem. Phys. 122, 204313 (2005)
(4) S. Akpinar, P. Defazio, P. Gamallo and C. Petrongolo, J. Chem. Phys. 129, 174307 (2008)
(5) P. Gamallo and P. Defazio, J. Chem. Phys. 131, 044320 (2009)
(6) P. Defazio, P. Gamallo, M. González, S. Akpinar, B. Bussery-Honvault, P. Honvault and C. Petrongolo, J. Chem. Phys. 132, 104306 (2010)
The first task was to test our parallel code on the CINECA machines. Some problems arose at this stage owing to the limited flexibility of the IBM compiler on the CINECA sp6 machine. Several tests were performed and a considerable amount of CPU time was spent in solving the problem. We then validated the code by running some jobs and comparing the results with those obtained on other computers. The success of this step allowed us to begin the dynamics study of the reactions indicated in the previous section. We thank the CINECA staff for allowing us to increase our CPU time allocation so that all the work planned in the project could be completed.

Later on, we checked the convergence of the RT-CC-RWP (NH(a1Δ) + D(2S)) calculations, verifying a large number of numerical parameters (e.g., rotational basis, mesh, number of iterations, etc.). Once the results were well converged, we performed the RWP propagations for the title reaction. The initial conditions investigated were: NH (v0=0, j0=2,3,4), J=0,1,2,…,40, and K0=0,1,…,min(j0,J). Because the CC method has been used, the RWP has to be propagated as many times as there are possible final K values (J+1), using a single processor for each propagation. A total of 80000 iteration steps are required to reach convergence, so the RWP propagations have been very time-demanding.

At present, we are carrying out the analysis of this large amount of data using the flux and asymptotic methods to obtain the probabilities of the four reaction channels. These probabilities will be the basis for obtaining both the cross sections and the rate constants of all processes, and the rate constants will be compared with the experimental data available in the literature.
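The propagations mentioned above use the real Chebyshev-like recursion of Gray and Balint-Kurti(1), q_{k+1} = 2 Hs q_k − q_{k−1}, where Hs is the Hamiltonian scaled so that its spectrum lies in [−1, 1]. The following one-dimensional Python sketch illustrates the recursion only; the grid, model potential and initial packet are illustrative and have nothing to do with the actual NHD calculations.

```python
import numpy as np

# One-dimensional sketch of the real wave-packet recursion of Gray and
# Balint-Kurti (ref. 1): q_{k+1} = 2 Hs q_k - q_{k-1}, with the Hamiltonian
# scaled so that its spectrum lies in [-1, 1].  Grid, potential and packet
# are illustrative only, not those of the NHD calculations.
n, dx = 256, 0.1
x = np.arange(n) * dx
V = 0.5 * (x - x.mean())**2                       # model potential
H = np.diag(V + 1.0 / dx**2)                      # finite-difference Hamiltonian
off = -0.5 / dx**2 * np.ones(n - 1)               # (hbar = m = 1)
H += np.diag(off, 1) + np.diag(off, -1)
emin, emax = np.linalg.eigvalsh(H)[[0, -1]]
Hs = (2.0 * H - (emax + emin) * np.eye(n)) / (emax - emin)  # spectrum in [-1, 1]

q_prev = np.exp(-(x - x.mean())**2)               # real initial packet
q_prev /= np.linalg.norm(q_prev)
q = Hs @ q_prev                                   # first Chebyshev step
for _ in range(200):                              # real, storage-cheap iteration
    q_prev, q = q, 2.0 * Hs @ q - q_prev
```

Because each step needs only one application of Hs to a real vector, 80000 such iterations per (J, K0) propagation dominate the cost, which is why each propagation was assigned to its own processor.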
The Dual-Primal Finite Element Tearing and Interconnecting (FETI-DP) method is one of the non-overlapping domain decomposition methods; it was published by Farhat and his co-workers in the article [FarhatEtAll-01] in 2001. Domain decomposition methods divide the original domain into several smaller subdomains. The FETI-DP method was developed because of problems with singular subdomain matrices in the original FETI method; it is a combination of the FETI method and the Schur complement method.
The FETI-DP method divides the unknowns into two categories: corner unknowns and remaining unknowns. The remaining unknowns are further split into remaining interface unknowns and internal unknowns. The continuity conditions are enforced by Lagrange multipliers, which are defined on the remaining interface unknowns, and also by the corner nodes. The corner unknowns ensure the non-singularity of the subdomain matrices. The remaining unknowns are eliminated, and the coarse problem is then obtained. The matrix of the coarse problem is symmetric and positive definite; therefore the coarse problem can be solved by the conjugate gradient method. More information about the FETI-DP method can be found in the article [FarhatEtAll-01] or in the book [Kruis-06] by Kruis.
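A minimal numerical sketch of this elimination step (Python with NumPy; the matrix is a random SPD surrogate, not a real finite element assembly) may clarify how the coarse problem arises and why the conjugate gradient method applies:

```python
import numpy as np

# Sketch: eliminate the remaining unknowns r from an SPD matrix partitioned
# into remaining (r) and corner (c) blocks, form the coarse Schur complement
# S = K_cc - K_cr K_rr^{-1} K_rc, and solve S x_c = g by conjugate gradients.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
K = A @ A.T + 8 * np.eye(8)          # SPD surrogate for a stiffness matrix
r, c = slice(0, 5), slice(5, 8)      # 5 "remaining" dofs, 3 "corner" dofs
Krr, Krc = K[r, r], K[r, c]
Kcr, Kcc = K[c, r], K[c, c]

S = Kcc - Kcr @ np.linalg.solve(Krr, Krc)    # coarse problem matrix (SPD)
f_r, f_c = np.ones(5), np.ones(3)
g = f_c - Kcr @ np.linalg.solve(Krr, f_r)    # condensed right-hand side

def cg(A, b, tol=1e-12, maxit=100):
    """Plain conjugate gradients for an SPD matrix."""
    x = np.zeros_like(b)
    res = b - A @ x
    p = res.copy()
    rs = res @ res
    for _ in range(maxit):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        res -= alpha * Ap
        rs_new = res @ res
        if np.sqrt(rs_new) < tol:
            break
        p = res + (rs_new / rs) * p
        rs = rs_new
    return x

x_c = cg(S, g)                                # corner solution
x_r = np.linalg.solve(Krr, f_r - Krc @ x_c)   # back-substitution
```

The key point is that S inherits symmetry and positive definiteness from K, which is what justifies the conjugate gradient solver for the coarse problem.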
The selection of the corner nodes, where the corner unknowns are defined, deserves special attention. A definition of the corner nodes was published in the original article [FarhatEtAll-01] by Farhat. The corner nodes are defined there as
D1: Cross-points, i.e. the nodes that belong to more than two subdomains.
D2: The nodes located at the beginning and end of each edge of each subdomain.
However, this definition is not suitable for all possible meshes. If the original domain is divided into two subdomains and the first subdomain is completely surrounded by the second one, then there are no cross-points and no edge has a beginning or an end. Recently, a strong influence of the choice of corner unknowns on the condition number of the subdomain matrices was observed by Kabelíková et al. in [KabelikovaEtAll-09]. Large condition numbers of the subdomain matrices significantly deteriorate the convergence of the iterative methods used for the solution of the coarse problem. There is also a minimum required number of corner nodes. For two-dimensional meshes, plane strain and plane stress problems require two distinct nodes, while three nodes are better for plate problems; the minimum number of required nodes is therefore three. For three-dimensional meshes, three non-collinear nodes must be selected. Theoretically, all nodes on the subdomain boundaries could be selected, but the FETI-DP method would then turn into the Schur complement method.
No software known to the author can be used for the selection of the corner nodes. This was the motivation for the development of an algorithm for selecting such nodes. The algorithm will be developed with the help of graph theory and several heuristic rules.
[FarhatEtAll-01] Farhat, C., Lesoinne, M., LeTallec, P., Pierson, K. and Rixen, D.: “FETI-DP: A Dual-Primal Unified FETI Method - Part I: A Faster Alternative to the Two-Level FETI Method”, International Journal for Numerical Methods in Engineering, vol. 50, pp. 1523-1544, 2001.
[KabelikovaEtAll-09] Kabelíková, P., Dostál, Z., Kozubek, T. and Markopoulos, A.: “Generalized inverse matrix evaluation using graph theory”, in Proceedings of Modelling 2009, Blaheta, R. and Starý, J. (eds.), Institute of Geonics AS CR, Ostrava, Czech Republic, 2009.
[Kruis-06] Kruis, J.: “Domain Decomposition Methods for Distributed Computing”, Saxe-Coburg Publications, edition 1st, 2006.
Two independent algorithms were developed for the selection of the corner nodes: the first is used for two-dimensional meshes and the second for three-dimensional meshes. Both algorithms are based on graph theory, so several necessary terms from graph theory are defined first. A graph G(V,E) consists of a finite set V of elements called vertices and a finite set E of elements called edges. The nodes of a finite element mesh can be mapped to the set V of graph vertices. Vertices vi and vj are connected by an edge if the corresponding nodes belong to an edge of a finite element; otherwise, there is no edge between vi and vj. This graph is called the nodal graph. The degree dG(v) of a vertex v in the graph G is the number of edges of G incident with v. A walk in a graph G = (V,E) is a finite sequence of vertices v0, v1, …, vk such that (vi−1, vi), 1 ≤ i ≤ k, is an edge in the graph G. The vertices v0 and vk are called the end vertices of the walk. A walk is a trail if all its edges are distinct.
In the case of the two-dimensional algorithm, the graph B(V,E) is built from the boundary nodes between subdomains. The vertex degree is determined for each vertex of the graph. The corner nodes are then defined as
vertices whose vertex degree is equal to one
vertices whose vertex degree is greater than two
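This degree-based selection can be sketched in a few lines of plain Python (the interface edge list below is hypothetical, not taken from a real mesh):

```python
from collections import defaultdict

# Sketch of the minimal number algorithm: build the boundary graph B(V,E)
# from interface edges and select vertices of degree 1 or degree > 2.
# The edge list below is a hypothetical interface, not a real mesh.
def select_corners(edges):
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {v for v, nb in adj.items() if len(nb) == 1 or len(nb) > 2}

# A T-shaped interface: chain 0-1-2-3 with a branch 2-4; vertex 2 is a
# cross-point, vertices 0, 3 and 4 are trail ends.
corners = select_corners([(0, 1), (1, 2), (2, 3), (2, 4)])
```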
The algorithm that selects the corner nodes based on the vertex degree is called the minimal number algorithm. The chosen nodes must be checked after the selection procedure. First, the number of corner nodes is checked: the minimum number of required nodes is three. If the minimal number algorithm does not provide enough nodes, additional nodes must be selected.
The next check concerns the geometrical relation between the corner nodes. If the selected nodes are too close to each other, the subdomain matrix usually has a very large condition number; the distances between corner nodes are therefore checked. The last condition is a non-collinearity condition, which is necessary to avoid several nodes lying on one line, as this also leads to a large condition number of the subdomain matrix.
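These geometric controls can be sketched as follows (Python with NumPy; the tolerance and coordinates are illustrative, and collinearity is tested through the rank of the spanning vectors, which works for both 2D and 3D nodes):

```python
import numpy as np

# Sketch of the geometric controls on selected corner nodes: at least three
# nodes, no two closer than a tolerance, and not all on one line.
# The tolerance and the coordinates below are illustrative.
def corners_acceptable(coords, min_dist=1e-3):
    pts = np.asarray(coords, dtype=float)
    if len(pts) < 3:
        return False                              # too few corner nodes
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            if np.linalg.norm(pts[i] - pts[j]) < min_dist:
                return False                      # nodes too close together
    v = pts[1:] - pts[0]                          # vectors spanning the nodes
    if np.linalg.matrix_rank(v) < 2:
        return False                              # all nodes collinear
    return True

collinear_ok = corners_acceptable([(0, 0), (1, 0), (2, 0)])   # one line
triangle_ok = corners_acceptable([(0, 0), (1, 0), (0, 1)])    # well spread
```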
Further corner nodes can be added by an extended algorithm. The extended algorithm is based on the restriction of the graph B(V,E) to several subgraphs. The subgraphs Sj(V,E) are defined as the open trails between two corner nodes in the graph B(V,E). A further corner node can be added at the centre of such a subgraph, or the trail can be split into k parts and a corner node added at the end of each part.
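A sketch of this splitting rule for one subgraph (plain Python; the trail and the value of k are hypothetical):

```python
# Sketch of the extended algorithm on one subgraph: an open trail between two
# already-selected corner nodes is split into k parts, and a further corner
# node is taken at the end of each interior part.  The trail is hypothetical.
def extra_corners(trail, k):
    """Pick k-1 interior nodes that split the trail into k roughly equal parts."""
    step = len(trail) // k
    return [trail[i * step] for i in range(1, k)]

trail = [10, 11, 12, 13, 14, 15, 16, 17, 18]   # node ids along one trail
added = extra_corners(trail, 3)                # split the trail into 3 parts
```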
Several numerical tests were performed with the help of the SMP computer Ness and the supercomputer HECToR.
The following behaviour of the algorithm was observed. A higher number of corner nodes reduces the number of iterations of the conjugate gradient method and therefore also the computational time. Beyond an optimal number, however, the number of iterations still decreases but the total computational time starts to grow, because the time spent condensing the matrix entries related to the corner nodes outweighs the time saved by the reduced number of iterations. The tests also showed the ability of the proposed algorithm to select the minimum number of corner nodes for very general domains. The FETI-DP method can therefore be used without manual selection of the corner nodes.
This talk presents the work done during a three-month stay at the HLRS facilities. The main work is the implementation of a speculative compression module for communications in Open MPI. In summary, it is a modification of the point-to-point communication module in Open MPI for rendezvous messages. Whenever a rendezvous message is requested, the implementation fragments the message and queues the fragments so that background threads can compress them. When the acknowledgement from the receiver arrives, the sender starts to send the fragments whether they are compressed or not (first the compressed ones, then the others). This module makes it possible to send partially compressed rendezvous messages, where the level of compression depends on the requirements of the application and the hardware. It is also a way to compensate in those environments where the available CPU power outmatches the network bandwidth. The implementation has recently been finished, and the project is now beginning the benchmarking phase.
The current achievements of this project are the following.

First, a new compression module has been included in Open MPI. This module allows data buffers to be compressed using several compression algorithms. Each compression algorithm can be included as a dynamic module; the algorithm can be selected at run time, and different algorithms can be used for different buffers.

Second, a modification has been included in the default P2P Management Layer (PML), OB1. The modification affects the rendezvous protocol: the sender fragments the message and queues the pieces, and a number of background threads compress the fragments. When the receiver requests the data, the sender starts by transmitting the fragments that are already compressed and then the rest. The receiver decompresses the compressed fragments and joins all the fragments to recover the original message.

Third, several HPC applications have been selected in order to perform a benchmark that measures the performance of this proposal. The benchmarking process is already ongoing.
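The rendezvous-side logic can be illustrated by the following plain-Python sketch (using zlib and a thread pool as stand-ins; this is not the actual Open MPI PML/OB1 code, and the fragment size, compression algorithm and threading model are all illustrative):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of speculative compression for rendezvous messages:
# the sender fragments the payload and hands the fragments to background
# threads; when the receiver's acknowledgement arrives, whatever is already
# compressed goes out compressed and the rest goes out raw.
FRAG = 1 << 14  # 16 KiB fragments (illustrative size)

def send_speculative(payload, pool):
    frags = [payload[i:i + FRAG] for i in range(0, len(payload), FRAG)]
    jobs = [pool.submit(zlib.compress, f) for f in frags]
    wire = []                               # (compressed?, data) pairs
    for f, job in zip(frags, jobs):
        if job.done():                      # compression finished in time
            wire.append((True, job.result()))
        else:                               # ack arrived first: send raw
            job.cancel()
            wire.append((False, f))
    return wire

def receive(wire):
    # Decompress the compressed fragments and rebuild the original message.
    return b"".join(zlib.decompress(d) if comp else d for comp, d in wire)

with ThreadPoolExecutor(max_workers=2) as pool:
    msg = b"HPC-Europa2 " * 4096
    out = receive(send_speculative(msg, pool))   # round-trips to the original
```

The message is reconstructed correctly regardless of how many fragments were compressed in time, which is the property that lets the real module trade CPU time against bandwidth adaptively.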