Applying Reinforcement Learning to Modelica models
Physics-based modelling of systems is at the heart of industrial engineering R&D. Digital modelling enables an iterative approach to product development and rapid experimentation with design methodologies.
Machine learning, a subset of technologies in the domain of “AI”, is a mathematical modelling tool that has gained the attention of researchers owing to its ability to generate a generalized solution to a prediction problem, given appropriate data.
Reinforcement learning (RL) is a technique at the forefront of machine learning research. It differs from traditional supervised learning methods (e.g. deep learning) in that it does not require explicitly labelled data. Instead, an “agent” (the learning algorithm) is defined that interacts within the rules of a defined “environment”.
The agent is allowed to choose from a set of predefined “actions” (decisions). The environment is programmed to give positive or negative feedback depending on the action, such that after going through multiple “training steps” (iterations in the environment), the agent “learns” (adjusts the parameters of its function) to consistently achieve the favourable result by performing appropriate actions.
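The agent–environment loop described above can be sketched with a toy example. The following is a minimal illustration, not the assignment's actual code: a hypothetical one-dimensional "world" and a tabular Q-learning agent that learns, purely from reward feedback over repeated training steps, that stepping right is the favourable action.

```python
import random

class LineWorld:
    """Toy environment: the agent starts at position 0 and must reach
    position +4.  Actions: 0 = step left, 1 = step right.  Reward is +1
    on reaching the goal, 0 otherwise; an episode ends at the goal or
    after 20 steps."""
    GOAL = 4

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.pos = max(-self.GOAL, min(self.GOAL, self.pos))
        self.steps += 1
        done = self.pos == self.GOAL or self.steps >= 20
        reward = 1.0 if self.pos == self.GOAL else 0.0
        return self.pos, reward, done


def greedy(q, s):
    """Pick the higher-valued action, breaking ties at random."""
    q0, q1 = q.get((s, 0), 0.0), q.get((s, 1), 0.0)
    return random.choice((0, 1)) if q0 == q1 else (0 if q0 > q1 else 1)


def train(episodes=300, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning: the agent's 'parameters' are the Q-values,
    adjusted after every step from the environment's reward feedback."""
    random.seed(seed)
    env, q = LineWorld(), {}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Explore with probability eps, otherwise act greedily.
            a = random.choice((0, 1)) if random.random() < eps else greedy(q, s)
            s2, r, done = env.step(a)
            best_next = max(q.get((s2, 0), 0.0), q.get((s2, 1), 0.0))
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s2
    return q
```

After training, the learned Q-values favour "step right" in every state along the path to the goal, which is exactly the "always achieve the favourable result" behaviour described above.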
Balancing an inverted pendulum is one such controlled scenario where RL algorithms have performed exceptionally well. The traditional analytical approach would be to design a feedback-based control system (e.g. PID) that regulates the angle of the pendulum towards a defined angle (setpoint). The control system would require tuning of its parameters to meet the required phase and amplitude response. For such cases, the analytical method is effective and fast. However, the complexity of certain systems may make this method cumbersome (e.g. a pendulum with multiple links and joints). In addition, if the system parameters change even slightly, the calculations, or even the entire set of equations, may have to be redone. RL can be applied to these situations, provided the agent and environment are programmed to be sufficiently close to the real world.
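As a point of comparison for the analytical route, here is a minimal sketch of such a feedback controller: a textbook discrete PID loop regulating a linearized inverted pendulum (theta_ddot = (g/L)*theta + u) to a zero-angle setpoint. The plant equation, the hand-tuned gains, and the forward-Euler integration are illustrative assumptions, not the assignment's actual model.

```python
class PID:
    """Textbook discrete PID controller (parallel form)."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, None

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        # No derivative kick on the very first sample.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(theta0=0.2, dt=0.001, t_end=5.0):
    """Regulate a linearized inverted pendulum, theta_ddot = (g/L)*theta + u,
    to the 0 rad setpoint, integrating with forward Euler."""
    g_over_l = 9.81                                 # g/L for L = 1 m
    ctrl = PID(kp=40.0, ki=30.0, kd=10.0, dt=dt)    # hand-tuned example gains
    theta, omega = theta0, 0.0                      # angle (rad), angular velocity
    for _ in range(int(t_end / dt)):
        u = ctrl.update(0.0, theta)                 # normalized control torque
        omega += (g_over_l * theta + u) * dt
        theta += omega * dt
    return theta
```

Note that the gains must satisfy the stability conditions of the closed loop (here, kp must exceed g/L just to counteract the unstable open-loop pole); this tuning step is precisely what becomes cumbersome for multi-link systems.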
In this assignment, we attempted to integrate the Modelica physical behaviour modelling utility on the 3DEXPERIENCE platform with Gym from OpenAI, an off-the-shelf RL programming utility in Python.
The second part of the assignment was to attempt to apply this behaviour to TurtleBot, an open-source robotic modelling platform. The physical model of the TurtleBot was conceptualized as an inverted pendulum.
The nature of the assignment was interdisciplinary from the outset. Both the physical and behavioural models were formulated collaboratively. A Functional Mock-up Unit (FMU) exported from Dymola behaviour modelling served as the interface between the platform and Python scripting tools. An FMU is a representation of the Modelica model (a set of differential equations) packaged together with the solvers for those equations. This black-box representation of the Modelica model can be interfaced with Python or other industrial tools such as MATLAB. Python has specialized libraries for handling FMUs. Once the inputs and outputs of the Modelica model are handled in Python, a range of experiments with ML libraries can be performed. In this use case, I
Steps 1-4 can be reused for any other type of experiment to be performed in Python. The methods for handling the FMU are well explained in the official documentation of the PyFMI library. Reference scripts in the published material also provide a concise explanation of the data flow.
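The data flow just described, with Python driving the black-box FMU's inputs and reading back its outputs behind a Gym-style interface, can be sketched as follows. Because running a real FMU requires a compiled .fmu file and the PyFMI package, `StubFMU` below is a hypothetical stand-in whose method names mirror PyFMI's co-simulation calls (`set`, `get`, `do_step`, `reset`); the variable names 'u' and 'theta', the reward shaping, and the episode length are likewise illustrative assumptions, not the assignment's actual model.

```python
class StubFMU:
    """Hypothetical stand-in for a co-simulation FMU loaded with PyFMI
    (real code would use: from pyfmi import load_fmu).  It merely
    integrates theta' = u so the example is runnable without an .fmu file."""
    def __init__(self):
        self.reset()

    def reset(self):                     # mirrors pyfmi: model.reset()
        self.values = {'u': 0.0, 'theta': 0.0}

    def set(self, name, value):          # mirrors pyfmi: model.set(name, value)
        self.values[name] = value

    def get(self, name):                 # mirrors pyfmi: model.get(name)
        return self.values[name]

    def do_step(self, t, dt):            # mirrors pyfmi: model.do_step(t, dt)
        self.values['theta'] += self.values['u'] * dt


class FMUEnv:
    """Minimal Gym-style interface (reset/step) wrapped around the FMU,
    following the pattern used by ModelicaGym."""
    def __init__(self, fmu, dt=0.01):
        self.fmu, self.dt, self.t = fmu, dt, 0.0

    def reset(self):
        self.fmu.reset()
        self.t = 0.0
        return self.fmu.get('theta')

    def step(self, action):
        # Map the discrete action to the FMU input, advance one step,
        # then read the observation back out of the FMU.
        self.fmu.set('u', 1.0 if action == 1 else -1.0)
        self.fmu.do_step(self.t, self.dt)
        self.t += self.dt
        obs = self.fmu.get('theta')
        reward = -abs(obs)               # reward shaping is model-specific
        done = self.t >= 1.0             # fixed episode length (illustrative)
        return obs, reward, done
```

Swapping `StubFMU` for a real PyFMI model object is the essence of steps 1-4: everything the RL library sees is the `reset`/`step` interface, regardless of what solves the equations underneath.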
In order to gain the required expertise, I went through the following courses:
1) Dymola Behavior Modeling Essentials (PLEXP)
2) Feedback Control for Mechatronics Fundamentals (PLEXP)
3) Official resource for Modelica language:
https://mbe.modelica.university/
My mentor, Ajinkya Naik, guided me throughout the learning phase. The most challenging task was to modify and implement the scripts designed by researchers for the assigned problem statement. This included searching for the appropriate libraries. I worked approximately 4-5 hours each week. I successfully tested the training of the Modelica model FMU in Python; the results can easily be replicated using the published material.
In spite of the remote working conditions, my mentor, my colleague Omkar, and I were easily able to collaborate, using the 3DEXPERIENCE cloud to exchange files and data. Well-organized learning material, which helped right from installing the platform to executing examples that illustrate the RFLP approach to design, streamlined the execution. I am thankful to Dassault Systemes for offering me this amazing opportunity. This was a truly wonderful 3D experience!
References to the research material:
1) ModelicaGym: Applying Reinforcement Learning to Modelica Models (paper)
2) ModelicaGym (GitHub): https://github.com/ucuapps/modelicagym
3) Modelica Linear Systems Library research paper: https://core.ac.uk/download/pdf/11128465.pdf
4) Modelica_LinearSystems2 (GitHub): https://github.com/modelica/Modelica_LinearSystems2
5) Dymola Reinforcement Learning: OpenAI Gym for Modelica models
6) PyFMI: https://pypi.org/project/PyFMI/
