Workstate Insights Blog

Workstate Codes: OpenAI Gym on Windows Ubuntu Bash Shell? Yes, It Works!

February 08, 2017
    

[Image: One little-understood killbot scenario. Have a cheeseburger, human.]

Artificial Intelligence is a funny thing. We are on the cusp of the Singularity where we will merge with our machines and advance our intelligence exponentially. We are also on the verge of unleashing murderous killbots ready to end our civilization as the most direct means to achieve their inscrutable ends. Meanwhile, Google DeepMind’s AlphaGo – the AI that beat the best human players in the world at Go – wouldn’t stand a chance against me at Mario Kart (and I’m just fair at best).

You may recall hearing some expressions of concern from Tesla and SpaceX CEO Elon Musk over the last few years about the potential for AI to take us down the killbot route if it’s not carefully designed. With that in mind, Musk and the president of the YCombinator tech incubator, Sam Altman, came up with the concept of OpenAI. OpenAI is a company whose mission is to make the cutting edge in AI development open for all to see and participate in. The reasoning goes that AI developed in the light of day is more likely to be responsible and accountable than AI developed in locked labs at Google, Facebook and elsewhere.

OpenAI’s first forays into providing platforms for public contributions to AI research have been through their Gym and Universe environments. Briefly, Gym is a toolkit for developing and testing reinforcement learning algorithms while Universe is a way of using any computer program capable of taking inputs and displaying output on a screen as a reinforcement learning environment. The goal is to eventually create reinforcement learning agents that can learn a wider and wider array of environments, moving us closer to general AI.

I won’t go into any more detail here and now, but may do so in the future. My purpose today is to share the steps I took to get a Gym environment working on Windows 10 using the Linux subsystem, AKA Bash on Ubuntu on Windows (on Rye).

Before we get started with the commands for you to copy and paste, let’s go over the high level components we’re using.

  • First we are using the Anaconda platform for Python to help us with some of the software and libraries we’ll need.
  • Then we’ll need several additional packages installed via apt. These include cmake, zlib1g-dev, xorg-dev, libgtk2.0-0, python-matplotlib, swig and python-opengl.
  • And if you’re wondering why we want X Window libraries on Microsoft Windows, that’s needed for you to observe your agents in action. So, for that we’ll need a bona fide X Server for Windows, Xming.


Finally, a note for the wary. I am running Windows 10 Pro Insider Preview build 14986. I note this not to damage my credibility with Linux enthusiasts further, but to draw a distinction between the Ubuntu version supported by this build (16.04 xenial) compared to that supported by the current release build of Windows 10 (14.04 trusty). In other words, your mileage may vary.

First, let’s go out and get Anaconda3, then install. It’s pretty big so it might take a few minutes.

$ wget https://repo.continuum.io/archive/Anaconda3-4.2.0-Linux-x86_64.sh

$ bash Anaconda3-4.2.0-Linux-x86_64.sh

Next, we’ll install all the supporting packages not provided by Anaconda.

$ sudo apt-get install cmake zlib1g-dev xorg-dev libgtk2.0-0 python-matplotlib swig python-opengl

Anaconda made some changes to ~/.bashrc that we’ll need to apply before continuing, so either start a new Bash session or run the source command.

$ source ~/.bashrc

Now we are ready to create and activate a new Anaconda environment which we’ll designate creatively as “gym”.

$ conda create --name gym python=3.5

$ source activate gym

In the output from the create command above you’ll have seen a whole bunch of data-sciencey python libraries get installed. The source activate command will have activated that environment, which you will know by the text “(gym)” prepended to your command prompt.

Next, we’re going to clone the gym github repo and install it using pip.

(gym) $ git clone https://github.com/openai/gym

(gym) $ cd gym

(gym) $ pip install -e .[all]

In addition to the python-matplotlib package we installed using apt earlier (which serves the system Python), we’ll need to install matplotlib into this Anaconda environment using pip.

(gym) $ pip install matplotlib

At this point you can actually run and train your agents, as long as you don’t execute the render method, which is what actually lets you see what’s going on. But what’s the fun in that?
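To make “running an agent” concrete, here is a sketch of the Gym-style loop using a tiny stand-in environment – TinyEnv is made up for illustration; only the reset()/step() interface mirrors Gym’s classic API – with the render call commented out, just as it has to be at this stage:

```python
import random

class TinyEnv:
    """Stand-in with Gym's classic interface: reset() -> observation,
    step(action) -> (observation, reward, done, info)."""
    def reset(self):
        self.steps_left = 10
        return 0.0  # initial observation

    def step(self, action):
        self.steps_left -= 1
        observation = random.uniform(-1.0, 1.0)
        reward = 1.0                  # one point per surviving step
        done = self.steps_left == 0   # episode ends after 10 steps
        return observation, reward, done, {}

env = TinyEnv()
observation = env.reset()
total_reward = 0.0
done = False
while not done:
    # env.render()  # needs a display, hence the X server steps below
    action = random.choice([0, 1])  # random policy, no learning yet
    observation, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```

With a real Gym environment you would swap TinyEnv for gym.make("CartPole-v0") and the rest of the loop stays the same.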

This is where X Windows comes in. These next steps take place outside of the Bash console window, in Windows proper.

First, download the Xming installer from here: https://sourceforge.net/projects/xming/files/latest/download. Install it using the usual double-click methodology.

Once installed and running, we’ll head back to the Bash console window and let our subsystem know about our shiny new display.

(gym) $ export DISPLAY=:0

Here I am going to plug Kevin Frans, who has a great tutorial on reinforcement learning algorithms for conquering the “hello world” of the domain, Cart Pole.

The object is simple. You move the cart left and right in order to keep the pole from falling down. You also need to avoid falling off the edge of the screen.
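The hill-climbing idea behind a script like cartpole-hill.py is simple too: score observations with a linear policy, perturb the policy’s weights with random noise, and keep the perturbation only when it earns more reward. Here’s a sketch with a stand-in scoring function – evaluate() and all the names here are my illustration, not Kevin’s code; in the real thing evaluate() would run one CartPole episode and return its total reward:

```python
import random

def evaluate(weights):
    # Stand-in for an episode rollout: a toy score that peaks
    # when every weight equals 1.0.
    return -sum((w - 1.0) ** 2 for w in weights)

def decide(weights, observation):
    # Linear policy: push right (1) if the weighted sum of the
    # 4-number observation is positive, otherwise push left (0).
    total = sum(w * o for w, o in zip(weights, observation))
    return 1 if total > 0 else 0

random.seed(0)
best = [random.uniform(-1.0, 1.0) for _ in range(4)]
start_score = best_score = evaluate(best)
for _ in range(200):
    # Add noise to the best weights so far; keep the change
    # only if it scores better (random hill climbing).
    candidate = [w + random.uniform(-0.1, 0.1) for w in best]
    score = evaluate(candidate)
    if score > best_score:
        best, best_score = candidate, score
print(best_score >= start_score)  # hill climbing never moves downhill
```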

I used his cartpole-hill.py algorithm to quickly verify that everything was working. But before you try it you should know that it was written for Python 2.7, which has some incompatibilities with 3.5. You will recall that we naively selected 3.5 above when we created our Anaconda environment. So we’ll need to make a couple of changes to Kevin’s code. In a happy coincidence – since we now have a handy X Server – we can actually use gedit on Windows.

(gym) $ sudo apt-get install gedit

(gym) $ git clone https://github.com/kvfrans/openai-cartpole.git

(gym) $ cd openai-cartpole

(gym) $ gedit cartpole-hill.py

Now a gedit window will open up. There are three changes we’ll need to make:

  • First, replace every call to xrange with range. Python 2.7’s xrange is a lazily evaluated range that was removed in Python 3, where range itself is lazy.
  • Next, find # env.render() on line 10 and uncomment it.
  • Finally, on line 55 where it reads print r, change it to print(r), since print is a function in Python 3.
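That xrange change deserves a quick illustration: in Python 3, range already behaves the way 2.7’s xrange did, producing values lazily instead of building a list up front:

```python
r = range(5)
print(type(r))   # <class 'range'> -- a lazy sequence, not a list
print(list(r))   # [0, 1, 2, 3, 4]

# Even an enormous range is cheap, because nothing is materialized:
big = range(10**12)
print(big[10])   # indexing works without building the sequence
```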

All done! Now we’ll just save our file and head back to the Bash console and execute it.

(gym) $ python cartpole-hill.py

Another window will open up where you’ll see your agent doing its thing.


Congratulations! Now that your algorithm has mastered the art of balancing a pole in the air we are one step closer to the general AI of our dreams and/or nightmares. Enjoy!

Interested in working on interesting problems in exciting, cutting-edge technologies? Workstate is a company started by developers to be a great place for people that love software development (and things related) to thrive. Check out our careers page for current openings.