
One Day on Julia

Meeting Julia at Coffee Break

It was a sunny day when I finally got tired of a certain programming language, which is slow to some degree. Above all, I had made no real progress on the efficiency of my code over the past few years, for which my own laziness is of course to blame. Then, during my boss's coffee break, I caught a glimpse of Julia.

After a bit of reading, I thought it might help me out.

Here’s what I read, and what impressed me most is definitely the performance of Julia compared to other languages:

  • Julia’s Role in Data Science
  • Official Website of Julia
    • You may find a lot of useful introductions and tutorials here. There’s a collection of them on the website which helped me a lot.
    • Currently some of the documents on the website need updating, so for newer discussions you may want to go to their GitHub site.
  • Source of Julia (GitHub) (Just for more reference, of course I was not reading the source code…)
    • This is where the most recent issues are discussed; it is worth visiting once you dig deeper into the language.
Julia Performance Compared to Other Languages (From the Official Website)

I saw comments pointing out flaws in the benchmark setup, but my curiosity urged me to find out the truth for myself, so here we go.

Note that I am using Julia 0.4.0 in this article. It is important to check your own local version before reading the code.

The Version of Julia We’re Using

Hello World~

I am using Ubuntu, and I found the Julia repository in a very deep corner of the official website. You may check the Platform Specific Instructions, or simply run the following commands on Ubuntu:

# adding the Julia repository on Ubuntu
sudo add-apt-repository ppa:staticfloat/juliareleases
sudo add-apt-repository ppa:staticfloat/julia-deps
sudo apt-get update
sudo apt-get install julia

Then a REPL will show up when you type “julia” in your terminal. Following the basic instructions on the official website, I typed:

# saying hello world
println("Hello World~");

And you may guess what happened: The journey starts now!

Wait: A Little Different

I checked out some of Julia's basic syntax. It has some interesting differences compared to Python (yeah, I was using Python). I think the first thing most people notice is that indexing in Julia is 1-based, which means that to access the first element of an array you start from 1.

# the indexing
a = [2,3,5,7,11]; # define a simple array
a[0]; # BoundsError
a[1]; # 2
a[2]; # 3
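
A few more ways to reach the ends of an array under 1-based indexing, as a minimal sketch. The variable names first_elem, last_elem and tail are just for illustration, and I am assuming `end` inside the brackets behaves the same on 0.4 as on current releases.

```julia
# 1-based indexing helpers; `a` is the same array as above
a = [2, 3, 5, 7, 11]

first_elem = a[1]      # the first element, the same value as first(a)
last_elem  = a[end]    # `end` stands for the last valid index
tail       = a[2:end]  # everything except the first element

println(first_elem)    # 2
println(last_elem)     # 11
println(length(tail))  # 4
```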

It felt rather procedural to me when I saw that some built-in functions can be applied to several different data types, which is more similar to C and unlike most of the younger languages.

# several tests
a = [2,3,5,7,11]; # define a simple array
length(a); # 5
size(a); # (5,)
eltype(a); # Int64
typeof(a); # Array{Int64,1}
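
To make the "one function, several data types" point concrete, here is a small sketch. Julia calls such functions generic functions and picks the method from the types of the arguments (multiple dispatch). The variables s, v and d are only examples I made up.

```julia
# the same built-in function applied to values of different types
s = "Julia"                    # a string
v = [2, 3, 5, 7, 11]           # a 1-dimensional array
d = Dict("a" => 1, "b" => 2)   # a dictionary

println(length(s))   # 5 characters
println(length(v))   # 5 elements
println(length(d))   # 2 key-value pairs
```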

I also noticed a difference between the functions length and size. Probably Julia has a matrix-like type, or even a plain list is implemented as a matrix in order to speed up calculations, because size returns (5,) rather than 5, which is quite similar to numpy.shape.

But I don't have enough time to dig into the language; I just want to start using it quickly. Language details will be discussed when needed. I want to deal with a lot of data, so I need something faster.
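
The difference becomes easier to see with a 2-dimensional array next to a 1-dimensional one. This is only a sketch with made-up variables v and m: Julia's Array{T,N} is N-dimensional, and a plain list is the case N = 1, so size always returns a tuple with one entry per dimension.

```julia
v = [2, 3, 5, 7, 11]   # 1-dimensional
m = zeros(3, 4)        # 2-dimensional, 3 rows and 4 columns

println(length(v), " ", size(v))      # length 5, size (5,)
println(length(m), " ", size(m))      # length 12, size with 3 rows and 4 columns
println(ndims(v), " ", ndims(m))      # 1 and 2 dimensions
println(size(m, 1), " ", size(m, 2))  # 3 rows, 4 columns
```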

Working with Python

Although there are far fewer libraries for Julia than for Python, we can still combine Julia code with other languages such as Python and R. My data was serialized by Python, and I need to read it back before I can do any calculations.

PyCall is a library for combining Julia with Python, which means you can run Python code in the Julia environment. Julia offers an easy way to install such libraries, just as Python does:

# installing new packages
Pkg.add("PyCall"); # add the PyCall package
Pkg.update(); # bring installed packages up to date

That’s it. Usually an update should follow the add to ensure the packages are the latest, especially for a language under such active development. There may be a problem fetching the metadata from Julia's official GitHub site, because the REPL uses the git protocol, which may be blocked in some network environments. Simply run the following command in the Julia REPL before you call the Pkg operations:

# fixing the github fetching problem
run(`git config --global url."https://".insteadOf git://`)

Or you may try executing the command between the backticks in a terminal if running it in the REPL doesn’t help. Now we can start reading files serialized by Python.

# read data serialized by Python (slow)
using PyCall; # import the PyCall package
@pyimport marshal
d = marshal.load(pybuiltin("open")("./data.msl","rb"));

Only in Julia 0.4+ can you directly call a method of a module (like marshal.load). This was the first scheme I tried, and I found it unbelievably slow (nearly stuck) even though my data is not that big (about 500MB). Compared to running the same code in Python, this operation is inefficient in Julia. I guess this is because the object returned by the Python environment is implicitly converted to a Julia object, which costs too much. There should be a way of loading the file while keeping the Python object type (PyAny or PyObject, as mentioned in the PyCall documentation). In other words, is there a way to stop the implicit type conversion from Python objects to Julia objects?

Finally I figured out the following way:

# read data serialized by Python (fast)
using PyCall; # import the PyCall package
@pyimport marshal
d = pycall(marshal.load,PyDict,pybuiltin("open")("./data.msl","rb"));

I can now specify the return type of the function (PyDict) to prevent the type conversion. Since PyDict shares some methods with Julia's Dict, I can keep using it directly for speed. Type conversion from Python objects to Julia objects is slow under some circumstances, so if you need to work with Python from Julia, try to finish some data pre-processing in the Python environment using pycall or pyeval (see the PyCall documentation for more details) before you convert the results into Julia objects.

Some examples of PyCall can be found here. One more thing: the PyCall documentation does not seem to mention that you can control variable substitution in the function pyeval, like this:

# read data serialized by Python (fast)
using PyCall; # import the PyCall package
@pyimport marshal
d = pycall(marshal.load,PyDict,pybuiltin("open")("./data.msl","rb"));
keys = pyeval("dt.keys()",dt=d);

Here you bind a variable inside the pyeval expression to a variable in the Julia environment. Interestingly, I can even write “pyeval(str,dt=dt)” and it executes correctly. This is convenient and makes things more flexible.

There are some more useful packages, such as NPZ, which allows you to read and write numpy files in Julia, and StatsBase, which provides lots of statistics operations.

Tricks & Traps

OK, I admit it looks like I was just writing Python in Julia, which seems stupid. But it is just the first day, and I still need more time to really get working in a new setting. So far this article has not said much about Julia itself, so now I will write down some traps I ran into on my first day with the language.

Multi-Dimensional Arrays

Let’s see the following example:

# array example
a = reshape(1:12,3,4)
# you get
# 1 4 7 10
# 2 5 8 11
# 3 6 9 12
a[1] # you get 1
a[2] # you get 2
a[3] # you get 3
a[1,2] # you get 4
a[2,2] # you get 5
a[1,:] # you get 1 4 7 10, Array{Int64,2}
# you get Array{Int64,1}
# 1
# 4
# 7
# 10

I first create an array. It is 2-dimensional, so its type is Array{Int64,2}, and we see that a 2-dimensional array can also be indexed in a 1-dimensional way. Values at specific positions can be accessed using two indexes as well. Finally, Julia supports slicing, and the last two commands show a difference: with the first you still get a 2-dimensional matrix [[1,4,7,10]], and with the second you get a 1-dimensional array [1,4,7,10].
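
One way to get a row as a 1-dimensional array regardless of what the slice itself returns is to pass it through vec. This is a sketch of my own choosing, not necessarily the command used above; row and col are made-up names, and I use collect so that a is an ordinary array.

```julia
a = reshape(collect(1:12), 3, 4)   # 3x4 matrix, filled column by column

row = vec(a[1, :])   # first row, flattened to 1 dimension
col = a[:, 2]        # second column, 1-dimensional

println(row == [1, 4, 7, 10])   # true
println(col == [4, 5, 6])       # true
println(ndims(row))             # 1
```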

This gets even messier when a is of type PyAny, where you can still use Python syntax to access elements of the matrix, so that a[1][2] returns 8.

So how can I iterate over every element of a multi-dimensional array? The for…in… loop provides convenient iteration, but we may try the following instead, which I think gives more control:

# iterating a matrix
a = rand(3,4);
for i=1:length(a[:,1]) # number of rows
    for j=1:length(a[1,:]) # number of columns
        # do something
    end
end

If a has no elements this may fail, but in exchange we can iterate over the matrix in whatever order we prefer.
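
A variant that does not slice the matrix to find its dimensions, and therefore also copes with an empty matrix, could look like the sketch below. The function name sum_elements is hypothetical. It puts the column loop outside because Julia stores arrays column by column.

```julia
# visit every element of a matrix; also safe when the matrix is empty
function sum_elements(a)
    total = 0.0
    for j = 1:size(a, 2)       # columns in the outer loop
        for i = 1:size(a, 1)   # rows in the inner loop (column-major order)
            total += a[i, j]
        end
    end
    return total
end

m = reshape(collect(1.0:12.0), 3, 4)
println(sum_elements(m))             # 78.0
println(sum_elements(zeros(0, 4)))   # 0.0, the loops simply do not run
```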

Arrays in Arrays

So here comes something weird: what if I want to define arrays in an array? In Julia, if you define an array like [[1,2,3],[4,5,6],[7,8,9]] it will be flattened to [1,2,3,4,5,6,7,8,9], which is definitely not what I want. The following is a kind of workaround:

# define arrays in an array
a = Array{Any,1}() # an empty array that can hold anything
push!(a, Any[]) # append an empty inner array first
push!(a[1],"I am in an array's array!")

It seems a little clumsy, but it did work. I don’t know whether there are more elegant methods. I don’t even want to write such code…

So an array of arrays is different from a multi-dimensional array. The former behaves like nested lists in Python, which means you access an element with a[i][j] instead of a[i,j], though I guess Julia doesn’t like such structures.
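
The two structures side by side, as a sketch written with current Julia in mind. Writing the element type in front of the literal asks for an array of arrays explicitly; I believe this also avoids the flattening on 0.4, but I have not verified that. From Julia 0.5 onward the plain literal [[1,2,3],[4,5,6],[7,8,9]] keeps its inner arrays as well. The names nested and matrix are made up.

```julia
# an array whose elements are arrays: the element type is written in front
nested = Vector{Int}[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
println(length(nested))   # 3 inner arrays
println(nested[2][3])     # 6, indexed as a[i][j]

# a real 2-dimensional array holding the same numbers
matrix = [1 2 3; 4 5 6; 7 8 9]
println(ndims(matrix))    # 2
println(matrix[2, 3])     # 6, indexed as a[i,j]
```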

Time for Dinner

Well, I think I should call it a day. In general, I think Julia is quite ambitious, trying to do her best at scientific computing, but it seems there is still quite a long way to go. At least for now I can see something distinctive in this language, and I am starting to enjoy coding with Julia, although there are quite a few traps.

The next day I was playing with Julia’s Parallel Computing, which is quite a dinner for me. We will discuss it later.

You may read more about the Julia programming language from the following links:

