
Category: Teatime

Join me in trying out some interesting techniques and applications over a cup of tea~

Bricks on Julia

Notes on things I found while coding, such as bits of syntactic sugar… though with no guarantee of correctness…

Initializing An Array

There are several ways to initialize an array. Note the difference between the “{}” and “()” notations.

# Array Initialization
a = Array{Int64,1}() # 0-element Array{Int64,1}
b = Array(Int64,0) # 0-element Array{Int64,1}
c = Array(Int64,1)
# 1-element Array{Int64,1}:
# 6874584755139077376
d = Array(Int64,2)
# 2-element Array{Int64,1}:
# 140298281045760
# 140298275170800
e = Array(Int64,2,2)
# 2x2 Array{Int64,2}:
# 140298302013152 140298289255248
# 140298289255184 140298302013232
f = Array{Int64,2}(2,2)
# 2x2 Array{Int64,2}:
# 140298246197808 140298268973680
# 140298268973616 140298268947024

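All of them work. The braces carry the type parameters (element type and number of dimensions), while the arguments in parentheses give the size of each dimension. None of these forms fills the array, so the values shown are whatever happened to be in memory.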
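For readers on Julia 1.x, here is a minimal sketch of the equivalent initializations. As far as I know, the `Array(Int64, 2)` form no longer exists there and uninitialized arrays need the `undef` argument. The `zeros` and `fill` lines are added only for comparison and are not part of the list above.

```julia
# Sketch for Julia 1.x; the snippets above use the older pre-1.0 syntax.
a = Int64[]                       # empty vector, same type as Vector{Int64}()
c = Vector{Int64}(undef, 2)       # 2 elements of uninitialized memory (arbitrary values)
f = Matrix{Int64}(undef, 2, 2)    # 2x2 matrix, also uninitialized
z = zeros(Int64, 2, 2)            # 2x2 matrix filled with 0
s = fill(7, 3)                    # 3 elements, all equal to 7

println(typeof(a), " ", size(c), " ", size(f))
```

If the contents matter, `zeros` or `fill` is the safer choice, since `undef` arrays hold arbitrary values until they are written.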

How Does Sparse Matrix Work?

The official documentation doesn’t say much, so I went to the source code here. It contains a type definition, SparseMatrixCSC, which is the underlying type for many of the sparse matrix methods. The definition is as follows:

# Type Definition
type SparseMatrixCSC{Tv,Ti<:Integer} <: AbstractSparseMatrix{Tv,Ti}
    m::Int              # Number of rows
    n::Int              # Number of columns
    colptr::Vector{Ti}  # Column i is in colptr[i]:(colptr[i+1]-1)
    rowval::Vector{Ti}  # Row values of nonzeros
    nzval::Vector{Tv}   # Nonzero values
end

After some fiddling with the code, I figured out what each field of SparseMatrixCSC means.

  • m: the number of rows in the matrix
  • n: the number of columns in the matrix
  • colptr: column pointers, which give the starting and ending indexes of every column within rowval and nzval; usually an Array of shape (n+1,)
  • rowval: the row index of every nonzero element, i.e. which row each element is in; an Array whose length equals the number of nonzero elements
  • nzval: the value of every nonzero element; an Array whose length equals the number of nonzero elements

For example, take a sparse matrix like the following:

# a sparse matrix
# 2 x 3 5 x x x 7
# x x x 1 4 x x x
# 7 8 4 5 3 x x x

Here “x” marks an element that is not stored. The parameters are:

# parameters of the matrix
# m = 3
# n = 8
# colptr = [1, 3, 4, 6, 9, 11, 11, 11, 12]
# rowval = [1, 3, 3, 1, 3, 1, 2, 3, 2, 3, 1]
# nzval = [2, 7, 8, 3, 4, 5, 1, 5, 4, 3, 7]

The colptr may seem a little hard to understand. If you walk through the matrix column by column, you get the sequence [2, 7, 8, 3, 4, 5, 1, 5, 4, 3, 7], which is the actual storage order of the values. Each column pointer records the index in that sequence of the first value in its column, or, if the column holds no value, the index of the next value expected. For example, the first value of the 1st column is 2, which has index 1 in the sequence, and the first value of the 2nd column is 8, which has index 3, and so on. The final entry, 12, is one past the index of the last stored value, so that column i always occupies colptr[i]:(colptr[i+1]-1). That gives the list of column pointers.

Notice that there is no stored element in the 6th column (the same goes for the 7th), and the next value we expect is the final 7, so we fill in 11 (the index of that 7 in the sequence) in the array of column pointers.

The rowval is easier to understand: you just assign a row index to each element. Likewise, nzval holds the value of each element.

And now we can construct whatever specific sparse matrix we like.
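As a rough check of the reading above, here is a sketch for Julia 1.x, where sparse matrices live in the `SparseArrays` standard library. It feeds the five parameters from the example straight into the `SparseMatrixCSC` constructor and then walks each column through `colptr`. The variable names `A` and `dense` are mine.

```julia
using SparseArrays   # standard library in Julia 1.x

m, n   = 3, 8
colptr = [1, 3, 4, 6, 9, 11, 11, 11, 12]
rowval = [1, 3, 3, 1, 3, 1, 2, 3, 2, 3, 1]
nzval  = [2, 7, 8, 3, 4, 5, 1, 5, 4, 3, 7]

# Build the matrix directly from its raw CSC fields.
A = SparseMatrixCSC(m, n, colptr, rowval, nzval)

# Walk the stored entries column by column using colptr.
# Empty columns (6 and 7) give an empty range and print nothing.
for j in 1:n
    for k in colptr[j]:(colptr[j+1] - 1)
        println("A[", rowval[k], ", ", j, "] = ", nzval[k])
    end
end

dense = Matrix(A)    # the "x" positions become zeros
```

In everyday code the `sparse(I, J, V)` constructor is usually more convenient, since it does not require building `colptr` by hand.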

Dispatching A Missions List in One Line

When doing parallel computing, especially with DistributedArrays, one often wants to split an array into (even) parts. I couldn’t find a simple built-in way to do that. Let’s say you want M missions split among N processes/workers; the following may help:

# dispatch your mission
M = 50
N = 8
dlist = [collect(i:N:M) for i=1:N]
# [1,9,17,25,33,41,49] 
# [2,10,18,26,34,42,50]
# [3,11,19,27,35,43]   
# [4,12,20,28,36,44]   
# [5,13,21,29,37,45]   
# [6,14,22,30,38,46]   
# [7,15,23,31,39,47]   
# [8,16,24,32,40,48]

And you don’t need to worry about whether M is exactly divisible by N. But this grouping is quite “sparse”. What if I want neighboring missions dispatched together to the same process?

# dispatch your mission
M = 50
N = 8
[p[1]+1:p[2] for p in[unshift!([sum([floor(Int64,M/N)+(i<=M%N) for i=1:N][1:j]) for j=1:N],0)[k:k+1] for k=1:N]]
# 8-element Array{Any,1}:
# 1:7  
# 8:14 
# 15:20
# 21:26
# 27:32
# 33:38
# 39:44
# 45:50
[collect(p[1]+1:p[2]) for p in[unshift!([sum([floor(Int64,M/N)+(i<=M%N) for i=1:N][1:j]) for j=1:N],0)[k:k+1] for k=1:N]]
# 8-element Array{Any,1}:
# [1,2,3,4,5,6,7]     
# [8,9,10,11,12,13,14]
# [15,16,17,18,19,20] 
# [21,22,23,24,25,26] 
# [27,28,29,30,31,32] 
# [33,34,35,36,37,38] 
# [39,40,41,42,43,44] 
# [45,46,47,48,49,50]

Just what I wanted, but can the code itself be more elegant? Well, perhaps you will find this easier to understand:

# dispatch your mission
M = 50
N = 8
a = [floor(Int64,M/N)+(i<=M%N) for i=1:N] # how many elements in each group
# [7, 7, 6, 6, 6, 6, 6, 6]
b = unshift!([sum(a[1:j]) for j=1:N],0) # initial-1 and ending elements of each group
# [0, 7, 14, 20, 26, 32, 38, 44, 50]
c = [collect(p[1]+1:p[2]) for p in[b[k:k+1] for k=1:N]] # create array between consecutive elements
# c is the demanded grouping scheme 

Well, I am still thinking about a better and more elegant method, which may take some time. Until then you can wrap the code into a function for convenience.
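One possible way to wrap it, sketched for Julia 1.x. The function name `split_ranges` is made up for illustration, and it uses `cumsum` in place of the nested sums, which also avoids `unshift!` (renamed `pushfirst!` in later Julia versions). Treat it as a sketch rather than a polished utility.

```julia
# Hypothetical helper: split 1:M into N contiguous ranges
# whose lengths differ by at most one.
function split_ranges(M::Integer, N::Integer)
    q, r = divrem(M, N)                    # base group size, number of groups with one extra
    sizes = [q + (i <= r) for i in 1:N]    # [7, 7, 6, 6, 6, 6, 6, 6] for M = 50, N = 8
    stops = cumsum(sizes)                  # last index of each group
    starts = stops .- sizes .+ 1           # first index of each group
    return [starts[i]:stops[i] for i in 1:N]
end

groups = split_ranges(50, 8)
println(groups)
```

Returning ranges instead of collected arrays keeps the result small; call `collect` on a range only where an actual array is needed.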

I also translated it to Python, which reads:

# dispatch your mission in Python
M = 50
N = 8
L = [list(range(p[0]+1,p[1]+1)) for p in [([0]+[sum([int(M/N)+(i<=M%N) for i in range(1,N+1)][0:j]) for j in range(1,N+1)])[k:k+2] for k in range(0,N)]]

It looks…… Perhaps there are better ways…

Avoid Some Structures When Saving with JLD

Some structures should be avoided, like Dict{UTF8String,Array{Float64,1}}: even if you explicitly declare the type of every component, they are still slow to save…

Collections on Julia

I will update this post from time to time…

Deep Learnings on Julia

MXNet Documentation:

The Julia Express:

Map, Filter and Reduce in Julia:

What is a “symbol” in Julia?

[CN] An Interesting Introduction:

[CN] Talking About Parallel Computing & Performance Tips:

Interesting Automatic Differentiation/Gradient Packages:

Neural networks and a dive into Julia:

Tricked out iterators in Julia:

Julia for Machine Learning:

Running Shell Commands from Julia

A Weekend With Julia: An R User’s Reflections:

A Lisper’s first impression of Julia:

Fun with Julia, metaprogramming and Sublime Text:

Fun With Just-In-Time Compiling: Julia, Python, R and pqR:

Related Blogs

Cultivating a Simple Life.:

Chiyuan Zhang:

A Julia Language Blog Aggregator:


Another Day on Julia

Feelings During Writing

There was a time when I thought this language was about to get out of my control. How could I have written such code, and, my dear, how could it execute the code in such a way!

Calculating 9,000,000,000+ (9 billion+) Cosine Distances on 20 Processes Simultaneously

JLD Performance on Different String Types

It seems Julia is slow when dealing with some kinds of strings, and so is JLD serialization. I compared the following:

# a little test
using JLD
a = rand(10000000) # 10000000-element Array{Float64,1}
b = [string(i) for i in a] # 10000000-element Array{Any,1}
@time save("./test_float.jld","_",a) # 1.100200 seconds (1.62 M allocations: 72.982 MB)
@time save("./test_string.jld","_",b) # terminated when running 5+ minutes

Since I suspected that the element type of the array affected the efficiency, I continued with the following:

# a little test
using JLD
a = rand(10000000) # 10000000-element Array{Float64,1}
b = [string(i) for i in a] # 10000000-element Array{Any,1}
c = Array{AbstractString,1}(b) # 10000000-element Array{AbstractString,1}
d = Array{ASCIIString,1}(b) # 10000000-element Array{ASCIIString,1}
e = Array{UTF8String,1}(b) # 10000000-element Array{UTF8String,1}
@time save("./test_abstr.jld","_",c) # terminated when running 5+ minutes
@time save("./test_ascstr.jld","_",d) # 2.094302 seconds (10.06 M allocations: 231.583 MB)
@time save("./test_utfstr.jld","_",e) # 3.230833 seconds (10.05 M allocations: 231.363 MB, 44.33% gc time)

That’s quite a relief! Giving the array a definite (concrete) element type seems to accelerate the serialization. In some cases it seemed that JLD had fallen into an infinite loop (I don’t know whether there’s a bug).
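To make the distinction concrete, here is a small sketch for Julia 1.x, where `String` has replaced `ASCIIString` and `UTF8String`. It only inspects element types with `isconcretetype` and does not call JLD, so it says nothing about timings on current versions. The names `loose`, `abstract_strs` and `tight` are mine.

```julia
# Sketch in Julia 1.x: which of these arrays has a concrete element type?
xs = rand(5)

loose = Any[string(x) for x in xs]              # abstract element type, like `b` above
abstract_strs = Vector{AbstractString}(loose)   # still abstract, like `c`
tight = Vector{String}(loose)                   # concrete element type, like `d` and `e`

for (name, arr) in [("loose", loose), ("abstract_strs", abstract_strs), ("tight", tight)]
    println(rpad(name, 14), eltype(arr), "  concrete? ", isconcretetype(eltype(arr)))
end
```

Converting to the concrete array once, right before saving, matches what helped in the test above.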

One Day on Julia

Meeting Julia at Coffee Break

It was a sunny day when I finally got tired of a certain programming language, which is slow to some degree. Above all, I had made no real progress on the efficiency of code execution during the past few years, for which my laziness is of course to blame. Finally, during my boss’s coffee break, I had a glimpse of Julia.

After quite a bit of reading, I think it will help me out.

Here’s what I read, and what impressed me most is definitely the performance of Julia compared to other languages:

  • Julia’s Role in Data Science
  • Official Website of Julia
    • You may find a lot of useful introductions and tutorials here. There’s a collection of them on the website which helped me a lot.
    • Currently some of the documents on the website need updates, and for newer discussions you may want to go to their GitHub site.
  • Source of Julia (GitHub) (Just for more reference, of course I was not reading the source code…)
    • This is where the most recent issues are discussed; it is best visited once you dig into the language.
Julia Performance Compared to Other Languages (From the Official Website)

I saw comments about flaws in the experimental setup, but my passion urged me to find out the truth for myself, so here we go.
