An Overview of the Language
This section is meant to give you a running start in using Julia. The ultimate reference for all things regarding the language is the official manual, where you will find much more information.
Package Management
Julia will open up in the default environment, e.g., @v1.9. You can add packages to this environment, and they will be available to load as soon as you open Julia. Be very careful about doing this, as it can very often lead to what is lightly called dependency hell. This is what happens when a package you want to update depends on another package that can't update, sometimes because it depends on a specific version of yet another package. The more packages you have in the current environment, the more often this will happen. The correct solution is to create a new environment for each project and add only the packages you actually use in that project.
# (Julia start-up banner) Version 1.9.1 (2023-06-07), official https://julialang.org/ release
# Documentation: https://docs.julialang.org
# Type "?" for help, "]?" for Pkg help.
cd("./Dummy")
pwd()
# "D:\\JuliaCode\\Dummy"
(@v1.9) pkg> activate . # first press ] at the julia> prompt to enter the package manager
# Activating project at `D:\JuliaCode\Dummy`
(Dummy) pkg> add UnicodePlots
# Updating registry at `C:\Users\Braam\.julia\registries\General.toml`
# Resolving package versions...
# Updating `D:\JuliaCode\Dummy\Project.toml`
# [b8865327] + UnicodePlots v3.4.1
# Updating `D:\JuliaCode\Dummy\Manifest.toml`
# [d360d2e6] + ChainRulesCore v1.15.7
# [9e997f8a] + ChangesOfVariables v0.1.6
# [35d6a980] + ColorSchemes v3.20.0
# .
# .
# .
(Dummy) pkg> st
# Status `D:\JuliaCode\Dummy\Project.toml`
# [b8865327] UnicodePlots v3.4.1
Here we changed the current directory to the Dummy sub-folder. Note the use of a forward slash. Backslashes introduce special characters, e.g., \n for a new line, and must themselves be escaped by typing a double backslash: cd(".\\Dummy"). Alternatively, you can use a raw string: cd(raw".\Dummy"). Backslash paths only work on Windows, whereas the forward slash works on Windows, Linux and macOS alike, so it is preferred.
pwd() is a function that replicates the Unix (Linux) command for print working directory and simply returns the path of the current folder.
The package manager is then activated with ] and the current folder is activated as a project with activate ., where . means the current directory (.. means the parent directory).
Finally, we add a registered package, UnicodePlots.jl, and once the installation is done, check which packages and versions are currently installed with the st command (short for status).
Once a package is installed, it can be loaded by issuing the using command:
using UnicodePlots
In the Dummy folder, Julia creates two files: Project.toml and Manifest.toml. Together, they record which packages have been installed for this project. The packages you added directly are listed in Project.toml, while Manifest.toml records the exact versions of every package in the environment, including all of their dependencies. These two files mean that someone else can reinstate the exact environment you developed your code in by activating the project and issuing the instantiate command to the package manager. This will install the same versions of the packages and dependencies as listed in the *.toml files.
Each project should be in its own folder, with its own *.toml files. This means different projects can potentially use different versions of the same package, depending on what other packages are in use.
To update the packages and dependencies to the latest versions (within the version bounds that each package declares for its dependencies), use the up command of the package manager. This updates packages for the current project only.
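The package-manager commands used above also exist as functions in the Pkg standard library (Pkg.activate, Pkg.add, Pkg.status, Pkg.instantiate and Pkg.update), which is handy in scripts. The minimal sketch below sticks to a step that needs no internet connection: it activates a throw-away project in a temporary folder, checks which Project.toml is now in use, and then returns to the default environment. In a real project you would follow Pkg.activate with, e.g., Pkg.add("UnicodePlots") or Pkg.instantiate().

```julia
using Pkg

# Activate a throw-away project in a temporary folder, check it, then switch back.
project_file = mktempdir() do dir
    Pkg.activate(dir)          # same as `pkg> activate <dir>`
    f = Base.active_project()  # full path of the Project.toml now in use
    Pkg.activate()             # same as `pkg> activate` with no argument: back to the default environment
    f                          # mktempdir returns this value and deletes the folder
end

println(project_file)          # ends in Project.toml
```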
Using vs Import
There are two ways to load a Julia package: using and import.
If you use using, all functions and variables exported by a package are brought into the current namespace, so you can call them directly:
using Plots
scatter(rand(10), rand(10))
This does mean that many functions and variables you are not using are now also in the namespace, and you cannot simply define a new function with the same name, or use another package that exports a function with the same name. For those cases, you can use import. If a package is imported, you need to prefix each function call with the package name.
import Plots
import GLMakie
Plots.scatter(rand(10), rand(10))
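GLMakie.scatter(rand(10), rand(10))
See what happens when you use using instead of import for the previous example: both packages export scatter, so an unqualified call to scatter is ambiguous and Julia refuses to pick one.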
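To see such a name clash without installing Plots.jl or GLMakie.jl, the sketch below defines two tiny stand-in modules, ModA and ModB (hypothetical names), that both export a function called greet. After using both, the unqualified name is ambiguous, so Julia throws an UndefVarError when you call it, while qualified calls, as you would write after import, keep working.

```julia
# Two hypothetical stand-in modules that export a function with the same name
module ModA
export greet
greet() = "Hello from ModA"
end

module ModB
export greet
greet() = "Hello from ModB"
end

using .ModA, .ModB   # the leading dot refers to modules defined in the current module (Main)

# Qualified calls always work, just as after `import`
println(ModA.greet())
println(ModB.greet())

# The unqualified name is ambiguous, so the call fails
clash = try
    greet()
catch err
    err
end
println(typeof(clash))  # UndefVarError
```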
You can load specific functions or variables from a package:
using Plots: scatter
import GLMakie: lines
None of the other exported variables or functions will become available. When loading a single item, you can also rename it using the as keyword:
import Plots.scatter as ps
import GLMakie: scatter as ms
ps(rand(10), rand(10))
ms(rand(10), rand(10))
Or you can even rename the package during import:
import BenchmarkTools as bt
bt.@benchmark sin.(1:1_000_000)
# BenchmarkTools.Trial: 400 samples with 1 evaluation.
# Range (min … max): 11.751 ms … 24.278 ms ┊ GC (min … max): 0.00% … 49.13%
# Time (median): 12.043 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 12.515 ms ± 2.246 ms ┊ GC (mean ± σ): 3.56% ± 9.50%
# ▆█▅
# ████▆▁▄▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▅▇▆ ▆
# 11.8 ms Histogram: log(frequency) by time 24 ms <
# Memory estimate: 7.63 MiB, allocs estimate: 2.
Variables
There is no programming without variables. In Julia, there is no need to pre-declare a variable (although you can) and variables can be reassigned to a value of another type:
x = 1
# 1
typeof(x)
# Int64
x = 2.4
# 2.4
typeof(x)
# Float64
x = "Hello, World"
# "Hello, World"
typeof(x)
# String
When assigning a value to the variable x, Julia infers the type of the value, e.g., Int64, and then associates the name x with that value.
As illustrated, the function typeof() will return the type of the variable or value. While you don’t have to specify the types of variables, Julia is a strongly typed language, just like C or Fortran. The type of a variable is just automatically inferred whenever possible.
Since global variables can be reassigned at any time, it is difficult for the compiler to generate optimised code: the type of the variable could change. If you want fast code, then avoid using global variables, or if you absolutely have to use them, declare them as const.
const MyConst = 1
# 1
typeof(MyConst)
# Int64
MyConst = 2
# WARNING: redefinition of constant MyConst. This may fail, cause incorrect answers, or produce other errors.
# 2
MyConst = 2.0
# ERROR: invalid redefinition of constant MyConst
# Stacktrace:
# [1] top-level scope
# @ REPL[4]:1
Declaring a variable as const freezes the type of the variable, allowing more optimisations. It is possible, but not recommended, to change the value, but not the type. It is expected that changes to the values will also be prevented in future versions of Julia.
The best practice, however, is to put your code inside functions.
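A minimal sketch of the difference, using only Base: the first loop works on non-const global variables at the top level, while sumall (a hypothetical helper) does the same work inside a function. Both give the same answer; wrap each version in @time, or use BenchmarkTools.jl, if you want to compare their speed on your own machine.

```julia
data = rand(1_000_000)

# Version 1: a loop at global scope. `total` is a non-const global,
# so its type has to be checked on every iteration.
total = 0.0
for x in data
    global total += x   # `global` is needed to update a global inside a top-level loop
end

# Version 2: the same loop inside a function. The compiler knows the types
# of `v` and `s`, so it can generate fast, specialised code.
function sumall(v)
    s = zero(eltype(v))
    for x in v
        s += x
    end
    return s
end

println(total == sumall(data))  # true: same values, added in the same order
```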
Strings and Characters
Text values are stored as either strings ("This is a string", type String) or single characters ('c', type Char). Note the use of double and single quotation marks for String vs Char. A string can be just one character long: "A". Strings can also include multiple lines, and quotation marks, when enclosed with triple quotation marks:
s = """
This is my
very long
string with "quotes"
"""
# "This is my \nvery long\nstring with \"quotes\"\n"
s[5]
# ' ': ASCII/Unicode U+0020 (category Zs: Separator, space)
s[5:10]
# " is my"
Note how new lines are indicated with '\n', which is itself a single character (n is short for new line).
Numerical values can be converted to strings using the string() function:
string(123)
# "123"
Strings and characters in Julia are encoded in UTF-8 Unicode. This allows all kinds of characters, from mathematical symbols to Chinese characters and emojis, to be used. It also means that not all characters take the same number of bytes to store. This can be confusing when processing strings. If you intend to work with strings, read the relevant sections of the manual carefully.
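A short illustration of why UTF-8 needs care: in the (arbitrary) string below, 'π' takes two bytes and '≈' takes three, so the number of characters and the number of bytes differ, and string indices are byte positions. Indexing into the middle of a character throws an error, while nextind(), eachindex() and collect() let you step through the characters safely.

```julia
s = "π ≈ 3.14"

println(length(s))   # 8 characters
println(sizeof(s))   # 11 bytes: 'π' needs 2 bytes and '≈' needs 3

println(s[1])                   # π
println(nextind(s, 1))          # 3: the byte index where the next character starts
println(collect(eachindex(s)))  # [1, 3, 4, 7, 8, 9, 10, 11]: the valid indices

# Byte 2 lies inside 'π', so indexing there throws a StringIndexError
bad = try
    s[2]
catch err
    err
end
println(typeof(bad))   # StringIndexError

chars = collect(s)     # Vector{Char} with one entry per character
println(length(chars)) # 8
```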
A somewhat unintuitive feature of Julia is that string concatenation is done via the * operator. This does have the advantage that multiple copies of a string can be concatenated via the exponentiation (^) operator:
str = "lala"
# "lala"
str*str
# "lalalala"
str^5
# "lalalalalalalalalala"
Julia includes a vast number of string processing functions. These are discussed in detail in the manual.
String Interpolation
You can interpolate a value from a variable or expression into a string, using the $ character:
value = 1.0
# 1.0
key = "myvar"
# "myvar"
println("The key is: $key and the value is: $value.")
# The key is: myvar and the value is: 1.0.
println("Twice the value is: $(2*value)")
# Twice the value is: 2.0
Integers and Floating-point Values
Julia has the usual selection of variable types for integers and floating-point values. There are signed and unsigned versions of 8, 16, 32, 64 and 128-bit integers. See the manual for details.
There is also a Bool type that holds either true (1) or false (0).
For floating-point values, there are Float16, Float32 and Float64.
The default integer type, Int, is equivalent to Int64 for 64-bit Julia and Int32 for 32-bit Julia. Generally, you would want to use Float64 for floating-point numbers, unless there is a specific reason not to. Calculations on a GPU (via e.g., CUDA.jl) should be done using Float32, unless you have a very expensive1 GPU capable of fast 64-bit floating-point calculations.
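A few quick checks you can run in the REPL to explore these types: typemin() and typemax() give the range of an integer type, and sizeof() the number of bytes. Note also that ordinary integer arithmetic wraps around silently when it overflows, rather than raising an error.

```julia
@show Int                          # Int64 on 64-bit Julia, Int32 on 32-bit Julia
@show typemin(Int8) typemax(Int8)  # -128 and 127
@show typemax(UInt8)               # 0xff: unsigned integers are displayed in hexadecimal
@show sizeof(Float16) sizeof(Float32) sizeof(Float64)  # 2, 4 and 8 bytes
@show true + true                  # 2: Bool behaves like 0 and 1 in arithmetic
@show typemax(Int64) + 1 == typemin(Int64)  # true: integer overflow wraps around
```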
A Word on Floating-Point Values
Floating-point values are stored in a limited number of bits (typically 64 bits, a.k.a. double precision) and hence have limited precision. The result is that most values cannot be stored exactly in a Float64 variable. As a simple example, 1/10, which is mathematically exactly 0.1, is actually stored as 0.1000000000000000055511151231257827021181583404541015625, as you can see by printing it with 256-bit precision via big(1/10).
The value 2.220446049250313e-16, which you can calculate in Julia with eps(Float64), is known as machine epsilon. It is not the smallest positive value that can be stored in a Float64 (that is floatmin(Float64) ≈ 2.2e-308), but the gap between 1.0 and the next larger Float64. Epsilon is an indication of the relative precision you can expect: adding a positive value smaller than half of epsilon to 1.0 leaves it unchanged, so anything smaller may as well be zero. Obviously, the gap depends on the magnitude of the values you are working with, and you can again use the eps() function: eps(100.0) = 1.4210854715202004e-14, so adding less than half of that amount to 100.0 will not change the value.
While you can safely ignore this for many, if not most engineering calculations, it can sometimes become an issue when you least expect it. Consider adding up a very large number of small values. The larger the difference between the running total and the next value you are adding, the larger the rounding error will become. At some point in this exercise, the value you are adding will drop below half of eps() of the running total, and the running total will stop increasing, no matter how many more values you add. The solution to this is actually fairly simple and implemented in the Julia sum() function.
The list of values is split into pairs and the pairs are added to each other, then this is repeated over and over until there is only one value left - the answer. The algorithm works under the inherent assumption that the values are fairly equally sized and so adding similar values results in minimal rounding error. Once the pairs have been summed, the new values should also be similarly sized and so the process repeats, with a minimum rounding error at each step.
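The effect is easy to see. The sketch below adds one million copies of 0.1 with a plain left-to-right loop (naive_sum, a hypothetical helper written for this comparison) and with Julia's built-in sum(). The exact sum of these values is extremely close to 100000; the naive loop typically drifts off in the later digits, while the pairwise sum stays much closer.

```julia
xs = fill(0.1, 1_000_000)   # one million copies of 0.1

# Plain left-to-right accumulation: each step rounds against an ever larger total
function naive_sum(v)
    total = 0.0
    for x in v
        total += x
    end
    return total
end

naive_error    = abs(naive_sum(xs) - 100_000)
pairwise_error = abs(sum(xs) - 100_000)   # sum() uses pairwise summation

println("naive loop error: ", naive_error)
println("sum() error:      ", pairwise_error)
```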
Something else to consider, which much more often trips up new programmers, is that two calculations that should give the same result will very often produce floating-point values that differ in the last few bits. Directly comparing values that are, realistically speaking, equal will then result in the wrong branch of an if statement executing. When comparing floating-point values, always use a check for approximate equality. This is done either via the ≈ operator (\approx<tab>), or the isapprox() function, which allows you to specify absolute and relative tolerances. The ≈ operator calls isapprox() with default tolerances.
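A few examples of approximate comparison. sqrt(2)^2 misses 2 by one rounding step, so == fails but ≈ succeeds. Be careful when comparing with zero: the default tolerance of ≈ is relative, so a tiny residual such as sin(π) is not considered approximately equal to 0.0 unless you pass an absolute tolerance (atol) to isapprox().

```julia
x = sqrt(2)^2
println(x)                 # 2.0000000000000004
println(x == 2)            # false: exact comparison fails
println(x ≈ 2)             # true: same as isapprox(x, 2) with default tolerances

# Comparing with zero needs an absolute tolerance
r = sin(π)                 # about 1.2e-16, not exactly zero
println(r ≈ 0.0)           # false: the default tolerance is relative to the (tiny) values
println(isapprox(r, 0.0; atol = 1e-12))       # true
println(isapprox(100.0, 101.0; rtol = 0.02))  # true: within a 2% relative tolerance
```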
The internal storage of floating-point values is standardised by the IEEE 754 standard, which is used in just about every programming language. This is important, since it means you can pass values between Julia and code running in R or Python or compiled in C.
BigInt and BigFloat
Sometimes, you may find a need for additional precision. Arbitrary-precision integer and floating-point types are available as BigInt and BigFloat.
There are several ways to specify that you are using big numbers, but the simplest is via the big() function:
x = big(10.0)
# 10.0
typeof(x)
# BigFloat
The precision again comes at the cost of performance. See the manual for more details. You only need to specify that one variable/number in a calculation is big - the rest will be converted automatically by Julia.
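Some examples of where the big types help, and one trap. Int64 arithmetic wraps around, whereas BigInt simply keeps growing. Note that big(0.1) converts the already-rounded Float64 value 0.1, so it carries the same error discussed earlier; to get 0.1 at full BigFloat precision, parse it from a string with the big"..." string macro (or parse(BigFloat, "0.1")). The default BigFloat precision is 256 bits.

```julia
println(2^64)                # 0: Int64 overflows and wraps around
println(big(2)^64)           # 18446744073709551616
println(factorial(big(25)))  # 15511210043330985984000000, too large for an Int64

# Only one operand needs to be big; the others are promoted automatically
println(typeof(big(1.0) + 1))   # BigFloat

# big(0.1) keeps the rounding error of the Float64 literal 0.1 ...
println(big(0.1))            # 0.1000000000000000055511151231257827021181583404541015625
# ... whereas big"0.1" is parsed directly at BigFloat precision
println(big"0.1" == big(0.1))   # false

println(precision(BigFloat))    # 256 bits by default
```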
Complex Numbers
Julia has built-in types for complex numbers, which depend on the integer or floating-point type used to store the real and imaginary parts, e.g.
typeof(1 + 2im)
# Complex{Int64}
typeof(1.0 + 2.0im)
# ComplexF64 (alias for Complex{Float64})
As you will notice, a complex type is specified as Complex{T} where T is an integer or floating-point type. Any of the integer and floating-point types mentioned before could be used, including BigInt and BigFloat. Simply define the value as:
z = big(10.0) + 1.0im
# 10.0 + 1.0im
typeof(z)
# Complex{BigFloat}
z = big(10) + im
# 10 + 1im
typeof(z)
# Complex{BigInt}
The imaginary part of the number is indicated by writing im directly after the number, with no space! As shown in the example, the 1 is optional in the imaginary part: you can just write im for 1im.
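One thing to watch for: Julia does not switch from real to complex numbers on its own. Taking the square root of a negative real number throws a DomainError; convert the argument with complex() first, or write it as a complex literal, if a complex result is what you want.

```julia
# sqrt of a negative Float64 is an error, not a complex number
result = try
    sqrt(-1.0)
catch err
    err
end
println(typeof(result))        # DomainError

z = complex(-1.0)              # -1.0 + 0.0im, a ComplexF64
println(sqrt(z))               # 0.0 + 1.0im
println(sqrt(-1.0 + 0im))      # the same, starting from a complex literal
println(Complex(3, 4) == 3 + 4im)   # true: the constructor builds the same value
```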
The expected functions for use with complex numbers are available, including:
z = 1 + 1im
# 1 + 1im
real(z) # real part
# 1
imag(z) # imaginary part
# 1
conj(z) # complex conjugate
# 1 - 1im
abs(z) # absolute value - distance from zero
# 1.4142135623730951
abs2(z) # squared absolute value
# 2
angle(z) # phase angle (radians)
# 0.7853981633974483
angle(z) * 360/2π # convert to degrees
# 45.0
√z
# 1.09868411346781 + 0.45508986056222733im
sqrt(z)
# 1.09868411346781 + 0.45508986056222733im
Rational Numbers
You can also work with rational numbers.
a = 1//2 + 3//8
# 7//8
float(a)
# 0.875
rationalize(0.875)
# 7//8
This eliminates rounding losses, but at the cost of performance:
function myfunc(x)
    total = zero(x)
    for i in 1:10_000
        total += x * i
    end
    return total
end
# myfunc (generic function with 1 method)
using BenchmarkTools
@benchmark myfunc(1.0)
# BenchmarkTools.Trial: 10000 samples with 3 evaluations.
# Range (min … max): 8.500 μs … 73.733 μs ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 8.700 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 8.704 μs ± 966.620 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
# ▅ █ ▆ ▃ █ █ ▅ ▂ ▁ ▁ ▁ ▂
# █▁▁█▁▁█▁▁█▁▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁█▁▁█▁▁▁█▁▁█▁▁█▁▁█▁▁▁▇▁▁█▁▁▇▁▁▄ █
# 8.5 μs Histogram: log(frequency) by time 9.1 μs <
# Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark myfunc(1//1)
# BenchmarkTools.Trial: 7295 samples with 1 evaluation.
# Range (min … max): 663.600 μs … 1.234 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 680.100 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 682.504 μs ± 29.021 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# ▇█▆▃▂▃██▅▄▃▃▃▂▃▁▁▁▁ ▁ ▂
# ███████████████████████▇▇████▇▇█▆▆▇▇▇▅▆▆▅▆▅▅▅▃▅▅▄▆▅▃▅▅▄▄▁▃▃▅ █
# 664 μs Histogram: log(frequency) by time 811 μs <
# Memory estimate: 0 bytes, allocs estimate: 0.
So, floating-point calculations are about 78x faster than with rational numbers. What about integers?
@benchmark myfunc(1)
# BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
# Range (min … max): 1.900 ns … 27.200 ns ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 2.000 ns ┊ GC (median): 0.00%
# Time (mean ± σ): 1.976 ns ± 0.530 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
# ▄ █
# █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█ ▂
# 1.9 ns Histogram: frequency by time 2 ns <
# Memory estimate: 0 bytes, allocs estimate: 0.
1.9 nanoseconds?!? That sounds too good to be true, doesn't it? Let's see what type of code Julia generated to make that possible.
@code_llvm myfunc(1)
; @ REPL[6]:1 within `myfunc`
; Function Attrs: uwtable
define i64 @julia_myfunc_739(i64 signext %0) #0 {
top:
; @ REPL[6]:3 within `myfunc`
%1 = mul i64 %0, 50005000
; @ REPL[6]:6 within `myfunc`
ret i64 %1
}
We can again ignore any line starting with a semi-colon. The one line that matters is this:
%1 = mul i64 %0, 50005000
The Julia compiler could analyse the code well enough to see that the answer to our function is simply 50005000 times the input value (since 1 + 2 + ... + 10000 = 10000 × 10001 / 2 = 50005000), and that is exactly what it returned!
With floating-point and rational values, the compiler did not take this short-cut. Floating-point addition is not associative, so collapsing the loop into a single multiplication could change the rounded result, and the compiler is not allowed to do that. Rational arithmetic is more involved still: every operation reduces the fraction and checks for overflow, which hides the true nature of the calculation from the compiler. Compiler development is continuous, though, so the range of calculations that get optimised like this may well grow in future versions of Julia.
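For completeness, here is the other side of the trade-off mentioned at the start of this section: rational arithmetic is exact, so equality comparisons that fail for floating-point values work as expected. Fractions are always stored in lowest terms, which numerator() and denominator() let you inspect.

```julia
println(1//10 + 2//10 == 3//10)   # true: exact rational arithmetic
println(0.1 + 0.2 == 0.3)         # false: each literal is rounded to the nearest Float64

q = 6//8
println(q)                        # 3//4, automatically reduced to lowest terms
println(numerator(q), " ", denominator(q))   # 3 4

println(1//3 + 1//6)              # 1//2
```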
Arrays, Tuples and Ranges
Arrays
In most code, you will find it convenient to deal with a collection of values at the same time. There are several ways of doing this.
The most common collection of values is an Array. In mathematics, you will be familiar with vectors and matrices. These are simply one- and two-dimensional arrays. You can have arrays with any number of dimensions (tensors). The type aliases Vector and Matrix are also available as synonyms for Array in the special cases of one and two dimensions.
Simple one-dimensional arrays are treated as column vectors for use in linear algebra calculations.
Some examples:
a = [1, 2, 3] # use commas to specify column vectors
# 3-element Vector{Int64}:
# 1
# 2
# 3
b = [1 2 3] # use spaces to specify row vectors
# 1×3 Matrix{Int64}:
# 1 2 3
a * b
# 3×3 Matrix{Int64}:
# 1 2 3
# 2 4 6
# 3 6 9
A = [1 2 3; # directly specify a 2D array a.k.a. a matrix
4 5 6;
7 8 9]
# 3×3 Matrix{Int64}:
# 1 2 3
# 4 5 6
# 7 8 9
B = zeros(3, 3, 3) # zeros() and ones() fill the array of the specified size
# 3×3×3 Array{Float64, 3}:
# [:, :, 1] =
# 0.0 0.0 0.0
# 0.0 0.0 0.0
# 0.0 0.0 0.0
# [:, :, 2] =
# 0.0 0.0 0.0
# 0.0 0.0 0.0
# 0.0 0.0 0.0
# [:, :, 3] =
# 0.0 0.0 0.0
# 0.0 0.0 0.0
# 0.0 0.0 0.0
C = zeros(Int64, 2, 2) # You can specify the type - Float64 is the default
# 2×2 Matrix{Int64}:
# 0 0
# 0 0
D = [1;2;;3;4;;;5;6;;7;8;;;9;10;;11;12] # ; separates in first dimension, ;; in second dimension, ;;; in third etc.
# 2×2×3 Array{Int64, 3}:
# [:, :, 1] =
# 1 3
# 2 4
# [:, :, 2] =
# 5 7
# 6 8
# [:, :, 3] =
# 9 11
# 10 12
E = Float64[] # an empty 1D array of Float64
# Float64[]
The individual entries of an array are accessed via [], e.g.,
A[2, 3]
# 6
In the background, [] calls getindex() to retrieve entries and setindex!() to modify them. If you define your own array-like type, you will need to supply the appropriate getindex() and setindex!() methods.
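The first part of the sketch below calls getindex() and setindex!() explicitly, which is exactly what A[2, 3] and A[2, 3] = 60 do. The second part is a minimal custom array type, closely modelled on the SquaresVector example in the Interfaces chapter of the Julia manual: it only defines size(), IndexStyle() and getindex(), yet sum, broadcasting and iteration work on it. Being read-only, it does not need setindex!().

```julia
A = [1 2 3;
     4 5 6;
     7 8 9]

println(getindex(A, 2, 3))    # 6, the same as A[2, 3]
setindex!(A, 60, 2, 3)        # the same as A[2, 3] = 60
println(A[2, 3])              # 60

# A read-only vector of the squares 1, 4, 9, ..., count^2 that stores only `count`
struct SquaresVector <: AbstractArray{Int, 1}
    count::Int
end
Base.size(S::SquaresVector) = (S.count,)
Base.IndexStyle(::Type{<:SquaresVector}) = IndexLinear()
function Base.getindex(S::SquaresVector, i::Int)
    checkbounds(S, i)         # throw a BoundsError for i outside 1:count
    return i * i
end

s = SquaresVector(4)
println(s[3])                 # 9
println(sum(s))               # 30
println(collect(s .+ 1))      # [2, 5, 10, 17]: broadcasting works too
```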
You can concatenate arrays horizontally and vertically with hcat() and vcat(), or using the syntax above with spaces or semi-colons:
A = [1, 2, 3]
# 3-element Vector{Int64}:
# 1
# 2
# 3
B = [4, 5, 6]
# 3-element Vector{Int64}:
# 4
# 5
# 6
[A; B]
# 6-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
# 6
[A B]
# 3×2 Matrix{Int64}:
# 1 4
# 2 5
# 3 6
vcat(A, B)
# 6-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
# 6
hcat(A, B)
# 3×2 Matrix{Int64}:
# 1 4
# 2 5
# 3 6
Tuples
Tuples are functionally similar to arrays. They are specified with commas and (optional) parentheses. They are intimately linked with passing parameters to functions and returning values from functions, e.g.
function myfunc(a, b)
return a*b, a+b
end
# myfunc (generic function with 1 method)
myfunc(1, 2)
# (2, 3)
typeof(myfunc(1, 2))
# Tuple{Int64, Int64}
We pass the parameters to the function as a tuple, (a, b). The results are returned as a tuple, (2, 3). Other than passing parameters to functions, what is the use of tuples? They are fairly widely used in Julia. The main difference from arrays is that, while arrays are heap-allocated, tuples of plain values (such as numbers) are usually stack-allocated. This is some more computer jargon, but what it means in practice is that tuples can be created and accessed faster than arrays, but are only suited to small collections.
Tuples are also immutable, meaning once created, they cannot be modified. The entries of an array, on the other hand, can always be modified. This is important to keep in mind when writing functions in Julia. A function can only modify a parameter passed to it if that parameter is a mutable object, such as an array. And then only the contents of the array can be changed: assigning a new array to the parameter inside the function does not replace the caller's array (more detail later).
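A minimal sketch of both points, using a hypothetical function reset_first!: changing an entry of an array argument is visible to the caller, while assigning a new array to the argument name only changes the function's local variable. A tuple offers no way to change an entry at all. (The trailing ! in the name follows the Julia convention for functions that modify their arguments.)

```julia
# Hypothetical example function; the ! marks that it modifies its argument
function reset_first!(v)
    v[1] = 0          # modifies the caller's array
    v = [-1, -1, -1]  # only rebinds the local name `v`; the caller never sees this array
    return nothing
end

a = [1, 2, 3]
reset_first!(a)
println(a)            # [0, 2, 3]

t = (1, 2, 3)
attempt = try
    t[1] = 0          # tuples are immutable: there is no setindex! method for them
catch err
    err
end
println(typeof(attempt))   # MethodError
```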
Some examples:
a = (1, 2, 3)
# (1, 2, 3)
typeof(a)
# Tuple{Int64, Int64, Int64}
b = (1., 2., 3, 1//2)
# (1.0, 2.0, 3, 1//2)
typeof(b)
# Tuple{Float64, Float64, Int64, Rational{Int64}}
a[2]
# 2
c = (1,) # A tuple with only one value - note the trailing comma!
# (1,)
typeof(c)
# Tuple{Int64}
d = 1,2,3 # the parentheses are optional
# (1, 2, 3)
typeof(d)
# Tuple{Int64, Int64, Int64}
We again use [] to access the individual entries.
a = (1, 2, 3)
# (1, 2, 3)
a[2]
# 2
Tuples can be unpacked into variables:
a
# (1, 2, 3)
x, y, z = a
# (1, 2, 3)
x
# 1
y
# 2
z
# 3
And since the parentheses are optional in the direct specification of tuples, we can do this:
x = 1
# 1
y = 2
# 2
y, x = x, y
# (1, 2)
x
# 2
y
# 1
Here we defined a tuple (x, y) and then unpacked it into the variables y and x, swapping their values.
Since tuples have superior performance to arrays, there exists a package that builds small array-like structures from tuples: StaticArrays.jl. This is commonly used for maximum performance, but only for smallish arrays; the package's documentation gives a rough rule of thumb of switching to a normal Array beyond about 100 entries. For larger sizes, compile times grow quickly and the performance advantage is lost.
Named Tuples
You can also name the entries in a tuple and access them via the names instead of indexes:
nt = (a = 1, b = 2, c = "Bob")
# (a = 1, b = 2, c = "Bob")
typeof(nt)
# NamedTuple{(:a, :b, :c), Tuple{Int64, Int64, String}}
nt.a
# 1
nt[2]
# 2
nt.c
# "Bob"
nt.c === nt[3] # The values are "egal", i.e. not just equal, but the actual same bits in memory
# true
Note that the entries in a tuple or named tuple needn't all be of the same type. Arrays can hold mixed values too, but when the values cannot be promoted to a common type, the element type of the array becomes an abstract type such as Any, and performance will be hugely impacted. Avoid this whenever possible!
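You can check what element type an array ends up with via eltype(). Mixed numeric values are promoted to a common concrete type, so there is no penalty; mixing numbers and strings leaves nothing to promote to, so the element type becomes Any. A tuple, by contrast, records the concrete type of every entry.

```julia
println(eltype([1, 2, 3]))       # Int64 (on 64-bit Julia)
println(eltype([1, 2.5]))        # Float64: the Int is promoted, so the array stays concrete
println(eltype([1, "two"]))      # Any: no common type, so every access is slow
println(typeof((1, "two")))      # Tuple{Int64, String}: each entry keeps its own type
```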
Ranges
The third collection type we are considering is the range. There are several ways to specify a range:
a = 1:10 # start : stop with default step of one, hence a unit range
# 1:10
typeof(a)
# UnitRange{Int64}
b = 1:2:20 # start : step : stop, hence a step range
# 1:2:19
typeof(b)
# StepRange{Int64, Int64}
c = 1.0:0.5:5.0 # with floating-point steps, we get a StepRangeLen
# 1.0:0.5:5.0
typeof(c)
# StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}
d = range(2.0, step=5.3, length=5) # instead of colon notation, you can call the function with more options
# 2.0:5.3:23.2
typeof(d)
# StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}
The Base.TwicePrecision{Float64} part may be confusing. Directly from the Julia help (type ?Base.TwicePrecision at the REPL prompt):
TwicePrecision is an internal type used to increase the precision of floating-point ranges, and not intended for external use. If you encounter them in real code, the most likely explanation is that you are directly accessing the fields of a range. Use the function interface instead, step(r) rather than r.step
And just like before, we access the entries with []:
a[2]
# 2
b[3]
# 5
c[4]
# 2.5
d[5]
# 23.2
The main difference is that, while arrays and tuples consist of values stored in memory, ranges are lazy. The values are only calculated when they are requested, and no matter the length of the range, it takes up the same amount of memory: just enough to store the start, step and stop/length values that are needed to calculate any entry and know when you have run through the whole range.
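You can check the memory claim with sizeof(): a range with a billion entries uses the same few bytes as one with ten, whereas collect() allocates every element. Functions such as first(), last(), step() and length() are answered from the stored fields, and an integer membership test with in is computed rather than searched.

```julia
small = 1:10
huge = 1:1_000_000_000

println(sizeof(small) == sizeof(huge))   # true: both store only start and stop
println(sizeof(collect(small)))          # 80 on 64-bit Julia: ten 8-byte integers

r = 1:2:20
println(first(r), " ", last(r), " ", step(r), " ", length(r))   # 1 19 2 10
println(999_999_999 in huge)             # true, without visiting the entries
```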
You can change a range into an array, with the collect() function:
collect(1:10)
# 10-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# 9
# 10
Iterables
Arrays, tuples and ranges (and strings) are all iterable types, meaning you can iterate through their entries:
a = [1, 5, 10]
# 3-element Vector{Int64}:
# 1
# 5
# 10
for i in a
println(i)
end
# 1
# 5
# 10
for j in 1:3
println(j*2)
end
# 2
# 4
# 6
t = (1, 2, 3)
# (1, 2, 3)
for k in t
println(k)
end
# 1
# 2
# 3
You can also use the Unicode symbol ∈ (\in<tab>) in place of the word in.
Arrays, tuples and ranges can also all be indexed with [], so from a programmer's point of view they behave the same here. We can therefore write a function that can handle any of these types. The only requirement for this example is that the item passed should have at least two entries, or there will be an error.
function mycomp(a)
    if a[1] > a[2]
        return true
    else
        return false
    end
end
# mycomp (generic function with 1 method)
ar = [1, 2]
# 2-element Vector{Int64}:
# 1
# 2
t = (2, 1)
# (2, 1)
r = 10:-1:1
# 10:-1:1
mycomp(ar)
# false
mycomp(t)
# true
mycomp(r)
# true
For each case, the Julia compiler will generate optimised code depending on the type of the variable passed.
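mycomp relies on indexing with []. A function that only iterates with for works with an even wider set of types, including collections such as Set that cannot be indexed at all, and lazy generators. A sketch with a hypothetical helper, count_above:

```julia
# Counts how many entries of any iterable exceed `threshold`
function count_above(itr, threshold)
    n = 0
    for x in itr
        n += x > threshold   # a Bool counts as 0 or 1
    end
    return n
end

println(count_above([1, 5, 10], 4))          # 2 (array)
println(count_above((1, 5, 10), 4))          # 2 (tuple)
println(count_above(1:10, 4))                # 6 (range)
println(count_above(Set([1, 5, 10]), 4))     # 2 (a Set has no [] indexing, but can be iterated)
println(count_above((x^2 for x in 1:5), 4))  # 3 (generator: 9, 16 and 25)
```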
Indexing
In the examples above, we used one-dimensional arrays and there was really no choice in how to index into the structure. The entries are all in a row in memory.
In the case of multi-dimensional arrays, the entries are still stored sequentially in memory; that is how RAM works! Here, however, there are better and worse orders in which to access the entries of the array.
This is because of how the values are stored in memory. It is faster to sequentially access values that are stored next to each other than to jump around in memory. Julia is column major, meaning that the values in the first column of a matrix are stored next to each other in memory, followed by the values of the second column etc. For higher dimensions, the pattern is the same: the first index varies fastest, then the second, and so on. Looping in that order is therefore the fastest way of iterating through the whole array.
A = rand(10_000, 10_000); # the _ is ignored - it is just to make reading easier. The ; at the end suppresses output of the result
function myfunc(A)
    mysum = 0.0
    m, n = size(A)
    for i = 1:m, j = 1:n # we are running through the matrix a row at a time
        mysum += A[i, j]
    end
    return mysum
end
# myfunc (generic function with 1 method)
function myfunc2(A)
    mysum = 0.0
    m, n = size(A)
    for j = 1:n, i = 1:m # we are running through the matrix a column at a time
        mysum += A[i, j]
    end
    return mysum
end
# myfunc2 (generic function with 1 method)
using BenchmarkTools
@btime myfunc(A)
# 644.754 ms (1 allocation: 16 bytes)
# 4.99959369384022e7
@btime myfunc2(A)
# 94.296 ms (1 allocation: 16 bytes)
# 4.999593693838226e7
So, in our 10,000 x 10,000 random matrix, summing up the values row-wise takes 644.754 ms, while column-wise it takes only 94.296 ms. Quite the improvement! If you are worried you won't remember the correct way of iterating through a structure, Julia has you covered. Use eachindex() to get the optimal sequence:
function myfunc3(A)
    mysum = 0.0
    for i in eachindex(A)
        mysum += A[i]
    end
    return mysum
end
# myfunc3 (generic function with 1 method)
@btime myfunc3(A)
# 96.394 ms (1 allocation: 16 bytes)
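# 5.000090241950418e7
For a plain Array, eachindex() returns linear indices, no matter what the dimensionality of the array is, in the order that gives the fastest sequential access. For array types where linear indexing would be slow, it returns cartesian indices instead.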
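To connect linear and cartesian indices with the column-major storage described above, here is a small 2×2 example. vec() shows the entries in memory order, and LinearIndices and CartesianIndices convert between the two ways of addressing the same entry.

```julia
A = [10 30;
     20 40]                      # stored column by column: 10, 20, 30, 40

println(vec(A))                  # [10, 20, 30, 40], the order in memory
println(A[3])                    # 30: linear index 3 is row 1, column 2
println(LinearIndices(A)[1, 2])  # 3: row 1, column 2 as a linear index
println(CartesianIndices(A)[3])  # CartesianIndex(1, 2): linear index 3 as (row, column)
println(A[CartesianIndex(1, 2)]) # 30
println(collect(eachindex(A)))   # [1, 2, 3, 4]: linear indices for a plain Matrix
```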
Another useful indexing function is enumerate(). It returns an iterator of tuples, each containing a counter and a value. For a plain Array, the counter equals the linear index, not the cartesian one.
A = rand(2,2)
# 2×2 Matrix{Float64}:
# 0.488632 0.177813
# 0.221677 0.559213
for (index, value) in enumerate(A)
    println("$index $value")
end
# 1 0.4886321057630626
# 2 0.22167740760406662
# 3 0.17781317540395236
# 4 0.5592126504604934
For more information, see the manual.
Julia also provides the keywords begin and end to directly index the first and last entries in an array:
a = [1, 2, 3]
# 3-element Vector{Int64}: …
a[begin]
# 1
a[end]
# 3
a[end-1] # second last entry
# 2
Array Assignments
The array variable is a pointer2 that holds the address of the memory space where the contents are kept (on the heap). This is not just an implementation detail: it is important to keep in mind, as there is a difference between modifying the entries of the array (the values in the memory the array variable points to) and changing the array variable (which memory it points to):
a = collect(1:5)
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b = a
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b[2] = -1
# -1
a
# 5-element Vector{Int64}:
# 1
# -1
# 3
# 4
# 5
a[2] = 10
# 10
b
# 5-element Vector{Int64}:
# 1
# 10
# 3
# 4
# 5
Here we created an array variable, a. Then we assigned a to b. The result is a new array variable that points to the same memory space. If we change the contents of b, we also change the contents of a, and vice versa.
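You can check whether two variables refer to the very same array with the === (egal) operator, whereas == only compares the contents:

```julia
a = [1, 2, 3]
b = a           # a second name for the same array
c = [1, 2, 3]   # a different array that happens to hold equal values

println(a === b)   # true: same object in memory
println(a === c)   # false: different objects
println(a == c)    # true: equal contents

c[1] = 100         # changing c leaves a (and b) untouched
println(a)         # [1, 2, 3]
```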
If you want an independent copy of a, then use the copy() function:
a = collect(1:5)
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b = copy(a)
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b[2] = -1
# -1
a
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b
# 5-element Vector{Int64}:
# 1
# -1
# 3
# 4
# 5
Array Comprehensions
This is a quick, flexible way of creating arrays with specific values, best illustrated by example:
a = [sin(i^2) for i in 0:0.1:2π]
# 63-element Vector{Float64}:
# 0.0
# 0.009999833334166666
# 0.03998933418663417
# 0.08987854919801107
# 0.159318206614246
# 0.24740395925452294
# ⋮
# 0.7940962483324946
# -0.24980688359658182
# -0.9917788534431158
# -0.4698420526176865
# 0.6749435215575963
[1/(x+y) for x in 1:3, y in 1:3]
# 3×3 Matrix{Float64}:
# 0.5 0.333333 0.25
# 0.333333 0.25 0.2
# 0.25 0.2 0.166667
[1/(x+y) for x in 1:3 for y in 1:3] # A second `for` keyword and no comma!
# 9-element Vector{Float64}:
# 0.5
# 0.3333333333333333
# 0.25
# 0.3333333333333333
# 0.25
# 0.2
# 0.25
# 0.2
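# 0.16666666666666666
Take note of the subtle difference between the second and third examples: with a comma, you get a two-dimensional array; with a second for keyword, you get a one-dimensional array in which the last loop (over y) varies fastest. This is not terribly intuitive!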
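Two related forms are worth knowing. A trailing if keeps only the values that pass a test, and dropping the square brackets gives a generator, which produces the values lazily, e.g. straight into sum(), without building an intermediate array. With the second-for form, a later loop may also depend on the variable of an earlier one.

```julia
evens_squared = [x^2 for x in 1:10 if iseven(x)]
println(evens_squared)                  # [4, 16, 36, 64, 100]

total = sum(x^2 for x in 1:10)          # generator: no temporary array is built
println(total)                          # 385

pairs_list = [(x, y) for x in 1:3 for y in x:3]   # y's range depends on x
println(pairs_list)   # [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
```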
Broadcasting
You can process each entry in an iterable in a loop, or you can use the built-in broadcasting (a.k.a. dot notation). The following are equivalent:
a = collect(1:5) # create an array
# 5-element Vector{Int64}:
# 1
# 2
# 3
# 4
# 5
b = 2:2:10 # create a range
# 2:2:10
c = similar(a) # create an uninitialised array with the same size and type as a
# 5-element Vector{Int64}:
# 0
# 140734236573968
# 140734236545296
# 0
# 0
for i in 1:length(a)
    c[i] = a[i] + b[i]
end
c
# 5-element Vector{Int64}:
# 3
# 6
# 9
# 12
# 15
d = a .+ b # element-wise addition
# 5-element Vector{Int64}:
# 3
# 6
# 9
# 12
# 15
e = sqrt.(d) # works for any function (note the . is between the name and the open parenthesis)
# 5-element Vector{Float64}:
# 1.7320508075688772
# 2.449489742783178
# 3.0
# 3.4641016151377544
# 3.872983346207417
The Type Hierarchy
The built-in types are organised in a hierarchy.
This consists of abstract and concrete types. You can only instantiate a value of a concrete type, but the abstract types are useful to specify groups of types that behave similarly, e.g., Float64 and Int64 can both be added and multiplied. This is true for all the Number types, be they Real or Complex. We can restrict an argument to a specific type via ::, or to a whole group of types via the subtype operator <:, e.g.
function f(x::T) where T <: Number
    # do something
end
Here, Julia will allow us to call f(x) with any subtype of the Number abstract type, such as Float64, Int32 or ComplexF64. Calling f(x) with a String will give an error (a MethodError), rather than trying to compile a specialised version of f().
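The hierarchy can be explored from the REPL: supertype() moves one level up, <: tests whether one type is a subtype of another, and isabstracttype()/isconcretetype() tell the two kinds apart. The sketch below walks up from Float64 and Int64 with a hypothetical helper, supertype_chain, and confirms that a function restricted to Number, like f above, rejects a String with a MethodError.

```julia
# Hypothetical helper: list T and all its supertypes up to Any
function supertype_chain(T::Type)
    chain = Any[T]
    S = T
    while S !== Any
        S = supertype(S)
        push!(chain, S)
    end
    return chain
end

println(supertype_chain(Float64))   # Any[Float64, AbstractFloat, Real, Number, Any]
println(supertype_chain(Int64))     # Any[Int64, Signed, Integer, Real, Number, Any]
println(Float64 <: Number, " ", String <: Number)           # true false
println(isabstracttype(Real), " ", isconcretetype(Float64)) # true true

f(x::T) where {T<:Number} = x
println(f(2.5))                     # 2.5
err = try
    f("text")
catch e
    e
end
println(typeof(err))                # MethodError
```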
We can also use this to have alternative versions of a function for different types of inputs, via multiple dispatch:
function addthem(a::T, b::S) where {T<:Number, S<:Number}
return a + b
end
function addthem(a::T, b::T) where T<:AbstractString
return a * b
end
addthem(1, 2)
#3
addthem(2, 2.0)
#4.0
addthem("One", "Two")
# "OneTwo"
Note that in the first function we use T and S, both subtypes of Number. If both arguments had been declared with the same parameter T<:Number, then calling the function with an integer and a floating-point value (the second example) would have given a MethodError, as the two arguments are not of the same type.
We also used AbstractString for the second function, so any string type should work, as long as both arguments are of the same type (they share the parameter T) and the * operator is defined for it.
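The sketch below contrasts the two kinds of signature discussed above; anykind, samekind and joinboth are hypothetical names. When the same type parameter is used for both arguments, Julia requires both to share one concrete type, which is why mixing an integer and a float (or a String and a SubString) finds no method:

```julia
anykind(a::T, b::S) where {T<:Number, S<:Number} = a + b   # T and S may differ
samekind(a::T, b::T) where {T<:Number} = a + b             # one T for both arguments

anykind(2, 2.0)                 # 4.0
samekind(2, 3)                  # 5
applicable(samekind, 2, 2.0)    # false: Int64 and Float64 are different types

# The same applies to strings: String and SubString{String} are distinct types
joinboth(a::T, b::T) where {T<:AbstractString} = a * b
joinboth("One", "Two")                                 # "OneTwo"
applicable(joinboth, "One", SubString("Two!", 1, 3))   # false
```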
Structs
Structs are user-defined composite types that can contain multiple fields. They are defined using the struct keyword:
struct PersonData
name::String
address::String
ID::Int64
end
customers = PersonData[] # empty array of PersonData structs
# PersonData[]
typeof(customers)
# Vector{PersonData} (alias for Array{PersonData, 1})
bob = PersonData("Robert Smith", "2 Cypress Lane", 123456) # create a variable of type PersonData by calling the constructor
# PersonData("Robert Smith", "2 Cypress Lane", 123456)
push!(customers, bob) # push an entry into the array - will increase the length by one
# 1-element Vector{PersonData}:
# PersonData("Robert Smith", "2 Cypress Lane", 123456)
customers[1]
# PersonData("Robert Smith", "2 Cypress Lane", 123456)
bob.name # access the values via the field names
# "Robert Smith"
customers[1].name
# "Robert Smith"
By default, a struct is immutable, meaning it cannot be changed once created:
bob.name = "Joe Bloggs"
# ERROR: setfield!: immutable struct of type PersonData cannot be changed
# Stacktrace:
# [1] setproperty!(x::PersonData, f::Symbol, v::String)
# @ Base .\Base.jl:39
# [2] top-level scope
# @ REPL[31]:1
To make a struct mutable, simply add the keyword mutable in front of struct in the definition.
mutable struct PersonData2
name::String
address::String
ID::Int64
end
bob = PersonData2("Robert Smith", "2 Cypress Lane", 123456)
# PersonData2("Robert Smith", "2 Cypress Lane", 123456)
bob.name
# "Robert Smith"
bob.name = "Joe Bloggs"
# "Joe Bloggs"
bob
# PersonData2("Joe Bloggs", "2 Cypress Lane", 123456)
The struct is immutable by default because this allows additional compiler optimisations. Mutable structs can therefore be slower, but offer more flexibility.
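One point that often surprises newcomers: immutability applies to the fields themselves, not to the objects they refer to. In the sketch below (the Customer type is hypothetical), the struct is immutable, but the Vector stored in one of its fields can still be modified:

```julia
struct Customer            # immutable: fields cannot be reassigned
    name::String
    orders::Vector{Int}
end

c = Customer("Robert Smith", Int[])
push!(c.orders, 42)        # allowed: this mutates the Vector, not the struct
c.orders                   # [42]
# c.orders = [1]           # would throw an error: the field itself cannot be reassigned

fieldnames(Customer)       # (:name, :orders)
ismutable(c)               # false
```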
Constructors
Julia automatically creates a constructor for each struct you define. This is a function that takes the values of the fields, creates a new structure in memory and assigns the specified values to the fields.
You can also create additional constructors. There are two types:
- Outer constructors and
- Inner constructors
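The default constructors are only generated when you do not define an inner constructor (see below). You can also specify additional outer constructors, which are ordinary functions defined outside the struct that, e.g., have default values for some fields, or calculate some values from others: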
struct PersonData
name::String
address::String
ID::Int64
end
bob = PersonData("Robert Smith", "2 Cypress Lane", 123456)
# PersonData("Robert Smith", "2 Cypress Lane", 123456)
function PersonData(name, ID)
# Call the default constructor to do the allocations and assignments
PersonData(name, "No known address", ID)
end
# PersonData
sally = PersonData("Sally Jones", 123)
# PersonData("Sally Jones", "No known address", 123)
sally.address
# "No known address"
tom = PersonData("Tom Jones", "Las Vegas", 34556) # The default is still available
# PersonData("Tom Jones", "Las Vegas", 34556)
Inner constructors are defined as part of the struct definition and are typically used to validate the entries for the struct before creating the instance.
struct OnlyNegatives
val::Float64
OnlyNegatives(val) = val >= 0 ? error("Non-negative value!") : new(val)
end
OnlyNegatives(-1)
# OnlyNegatives(-1.0)
OnlyNegatives(1)
# ERROR: Non-negative value!
# Stacktrace:
# [1] error(s::String)
# @ Base .\error.jl:35
# [2] OnlyNegatives(val::Int64)
# @ Main .\REPL[7]:3
# [3] top-level scope
# @ REPL[9]:1
Inner constructors have access to a special local function, new(), which creates the instance from the given field values, much like the default constructor would.
If you define an inner constructor, NO default constructors are created. You need to handle all cases via your inner constructors.
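The sketch below combines both kinds of constructor; the Interval type and its fields are hypothetical. The inner constructor validates its arguments and calls new(), and an outer constructor adds a convenient one-argument form that simply delegates to it. Because an inner constructor is defined, no default constructor exists, so every instance passes through the validation:

```julia
struct Interval
    lo::Float64
    hi::Float64
    # Inner constructor: check the arguments before creating the object with new()
    function Interval(lo, hi)
        lo <= hi || throw(ArgumentError("lo must not exceed hi"))
        return new(lo, hi)   # new() converts the values to Float64
    end
end

# Outer constructor: a degenerate interval, delegating to the inner constructor
Interval(x) = Interval(x, x)

Interval(1, 2)   # Interval(1.0, 2.0)
Interval(3)      # Interval(3.0, 3.0)
# Interval(2, 1) throws ArgumentError("lo must not exceed hi")
```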
Loops
Julia provides two types of loops:
- for loops for iterating over an iterable construct, like a range or an array with a known length
- while loops for iterating until a logical criterion is met, with an initially unknown number of iterations
a = zeros(5)
# 5-element Vector{Float64}:
# 0.0
# 0.0
# 0.0
# 0.0
# 0.0
for i in 1:length(a)
a[i] = 2*i^2
end
a
# 5-element Vector{Float64}:
# 2.0
# 8.0
# 18.0
# 32.0
# 50.0
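i = 1 # the loop variable of the for loop above is local to that loop, so initialise a new counter
while a[i] <= 10
println(i, "\t", a[i])
i += 1
end
# 1 2.0
# 2 8.0
In a while loop, it is critical to manually implement a step (i += 1 in the example), or the loop will execute indefinitely!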
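The while loop above relies on some entry of a eventually exceeding 10; if none did, a[i] would eventually be read past the end of the array. The sketch below (with hypothetical function names) adds a bounds check, which short-circuits before the array is indexed, and shows the same task as a for loop with an early break. Wrapping the loops in functions also avoids global-scope issues when the code is run from a script rather than the REPL:

```julia
# Print entries while they stay at or below `limit`; return how many were printed
function print_while_below(v, limit)
    i = 1
    while i <= length(v) && v[i] <= limit   # bounds check is evaluated first
        println(i, "\t", v[i])
        i += 1                              # the step: without it the loop never ends
    end
    return i - 1
end

# The same task with a for loop and an early exit
function print_until_above(v, limit)
    count = 0
    for (i, x) in enumerate(v)
        x > limit && break
        println(i, "\t", x)
        count += 1
    end
    return count
end

a = [2.0 * i^2 for i in 1:5]   # [2.0, 8.0, 18.0, 32.0, 50.0]
print_while_below(a, 10)        # prints the first two entries, returns 2
print_while_below(a, 100)       # prints all five entries, no out-of-bounds access
print_until_above(a, 10)        # prints the first two entries, returns 2
```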
Logic and Flow Control
At some point in your code, you will want to execute different instructions depending on some criteria. The most basic form of flow control is the if statement:
if a < 0
println("Negative")
elseif a == 0
println("Zero")
else
println("Positive")
end
The elseif and else blocks are optional. You can have multiple elseif blocks with different criteria. You can also nest if statements. The end keyword is required. Unlike many other languages, there is no then keyword (if…then…else…end).
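In Julia an if block is itself an expression: it evaluates to the value of the branch that runs, so its result can be assigned directly. A minimal sketch, wrapping the example above in a hypothetical function sign_label:

```julia
function sign_label(a)
    # The value of whichever branch runs becomes the value of the whole if block
    label = if a < 0
        "Negative"
    elseif a == 0
        "Zero"
    else
        "Positive"
    end
    return label
end

sign_label(-3)   # "Negative"
sign_label(0)    # "Zero"
sign_label(2.5)  # "Positive"
```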
Julia also offers a short-cut alternative for simple if statements. The following two statements will both print the value Negative if a < 0.
a < 0 && println("Negative")
a >= 0 || println("Negative")
This looks a little arcane. It works as follows:
The && operator is a logical and. For an and statement to be true, both criteria must be true, so if the first expression (before the &&) is false, Julia won’t bother to evaluate the second expression (after the &&), since the whole cannot be true. This is called short-circuit evaluation. This means the println() is only executed if a is indeed less than zero.
The || operator is a logical or. For an or statement to be true, at least one of the two criteria must be true, so if the first expression is true, there is no need to also evaluate the second - the whole is already true. The println() will therefore not be executed if a is larger than or equal to zero.
To help you remember how short-circuiting evaluations work, try this:
a < 0 && println("Negative") reads as a < 0 AND then println()
a >= 0 || println("Negative") reads as a >= 0 OR else println()
You may very well choose to stick to the more readily understandable if statements in your code, but these short-circuit notations are commonly used, and you will encounter them often in other people’s code.
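A very common use of the OR-else form is argument checking at the top of a function. In this sketch (checked_sqrt is a hypothetical name) the error is thrown only when the condition is false. It also shows that && and || return the value of the last expression they actually evaluate:

```julia
function checked_sqrt(x)
    x >= 0 || throw(DomainError(x, "x must be non-negative"))   # x >= 0 OR else throw
    return sqrt(x)
end

checked_sqrt(9.0)    # 3.0
# checked_sqrt(-1.0) throws a DomainError

# The right-hand side is only evaluated when it can change the outcome
true && "second"     # "second"
false && "second"    # false
```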
Finally, when you are only assigning a value to a variable based on some criterion, there is the so-called ternary operator:
a
# -1
b = a > 0 ? "Positive" : "Not positive"
# "Not positive"
These can be nested as well:
b = a > 0 ? "Positive" : (a == 0 ? "Zero" : "Negative")
# "Negative"
The parentheses around the final statement are optional, but significantly improve readability.
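Because the ternary operator is an expression, it fits naturally in a one-line function. In the sketch below (label is a hypothetical name), the function is then applied to every element of a vector with the dot notation from the broadcasting section:

```julia
# Nested ternary: "Positive", "Zero" or "Negative"
label(a) = a > 0 ? "Positive" : (a == 0 ? "Zero" : "Negative")

label(-1)              # "Negative"
label.([-1, 0, 2])     # ["Negative", "Zero", "Positive"]
```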
Functions
Functions are the core of any Julia program. Inputs are passed as parameters, and any number and type of values can be returned as a tuple. You can also modify the contents of array parameters passed to the function. In this case, the function is called mutating, and it is accepted practice in Julia to end the function name with an exclamation mark (or a bang, if you are American) to indicate that the function mutates one or more parameters. Typically, it would be the first parameter that is mutated.
Most people use the terms parameter and argument interchangeably as they refer to functions. If you want to be strictly formal, then a function is defined to accept parameters and you pass specific argument values when you call the function. Here, we won’t be formal - feel free to use either term.
There are four ways to declare a function:
- Full declaration
function myfunc(a, b)
return a + b
end
The keyword return is optional, but it significantly improves readability. In the absence of a return statement, Julia will return the last calculated result. You can also have multiple return statements in, e.g., an if block. When a return is executed (evaluated in Julia parlance), the function will return the specified value and exit.
- Short-cut for single-line functions
myfunc(a, b) = a + b
- Anonymous functions, typically used when passing a single-use function to a higher-order function (a function that takes another function as input)
mysum = sum(x -> x^2 + √x, somearray)
Here Julia will apply the function x -> x^2 + √x to each entry in the array somearray and then add up the results.
Anonymous functions are not limited to a single parameter: list several parameters in parentheses before the arrow. And since a parameter can itself be a tuple (and tuples can contain scalars, arrays or even other tuples), there is practically no limit to what can be passed, e.g.
f = (x, y) -> 2x/y^2
Here we also assign the anonymous function to a variable, f, which means we can later call f(x, y). There is really no reason to do this. Rather use the short-cut function definition above.
- Do blocks (similar to anonymous functions, but for multi-line functions)
open("myfile.txt", "w") do io
write(io, "Hello world!")
end
Here, there are a few things to pay attention to:
- We are calling the higher-order function open(), which takes another function as its first parameter. The do block ALWAYS passes the specified (anonymous) function as the first argument of the function being called. The nominal syntax for open() is open(f::Function, args...; kwargs...). The function f is specified by the do block.
- We pass the parameter io to the function by specifying io next to the keyword do. This is similar to the x -> ... in an anonymous function; here we effectively have io -> ...
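For comparison, the sketch below writes the same small computation in each of the four forms; all names (sumsq and friends) are hypothetical. The do block example writes to a temporary file from tempname(), so that nothing in your working directory is touched:

```julia
# 1. Full declaration
function sumsq(a, b)
    return a^2 + b^2
end

# 2. Short-cut form for single-line functions
sumsq_short(a, b) = a^2 + b^2

# 3. Anonymous functions: passed to a higher-order function, or with two parameters
total = sum(x -> x^2, [1, 2])        # 5
sumsq_anon = (a, b) -> a^2 + b^2

# 4. do block: the block becomes the first (function) argument of open
path = tempname()
open(path, "w") do io
    write(io, "Hello world!")
end
# Equivalent call without a do block:
# open(io -> write(io, "Hello world!"), path, "w")

read(path, String)                   # "Hello world!"
```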
Passing Values to Functions and Returning Results
In all of the examples so far, we have passed values to the functions. These values are called parameters or arguments³, and there are two types of parameters in Julia: positional parameters and keyword arguments.
Positional parameters
Positional parameters are parameters where Julia knows which local (to the function) variable to assign the value to, based on the parameter’s position in the list of arguments. If we have a function f(x, y), and we call it with f(1, 2), Julia will assign the value of 1 to x and 2 to y.
Positional parameters can have default values: f(x, y=2). We can now either call the function with f(2, 4), in which case x will be assigned a value of 2 and y a value of 4, or we can omit the value for y and call f(2), in which case x will be assigned a value of 2 and y will use its default value, also 2.
Multiple parameters can have default values, e.g. f(x, y=2, z=4). If we now call f(2), y and z use their default values. What happens when we call f(2, 10)?
function f(x, y=2, z=4)
println(x)
println(y)
println(z)
end
f(2)
# 2
# 2
# 4
f(2, 10)
# 2
# 10
# 4
We see that the second value is passed to y, since it is the second positional parameter. Positional values are always assigned left to right, then default values are used for what is left. So how would we pass values for x and z and have y use its default value? The answer is keyword arguments.
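Behind the scenes, default values for positional parameters are implemented by defining one method per possible number of arguments. A sketch, using the hypothetical name h so it does not clash with f above:

```julia
h(x, y=2, z=4) = (x, y, z)   # returns the values it received as a tuple

h(1)            # (1, 2, 4)
h(1, 10)        # (1, 10, 4)
h(1, 10, 20)    # (1, 10, 20)
length(methods(h))   # 3: h(x), h(x, y) and h(x, y, z)
```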
Keyword arguments
Keyword arguments (a.k.a. keyword parameters, a.k.a. kwargs⁴) are identified by name when passing values to a function. To define them, we first list the positional parameters, then a semi-colon, then the keyword arguments. Keyword arguments usually have default values assigned to them; a keyword argument without a default is required, and calling the function without it gives an error.
function f(x; y=2, z=4)
println(x)
println(y)
println(z)
end
f(2)
# 2
# 2
# 4
So, as expected, y and z are using their default values. We can now decide to override the defaults for y, z or both:
f(2, z=3)
# 2
# 2
# 3
f(2, y=5, z=10)
# 2
# 5
# 10
Since x is a positional parameter without a default value, a value must always be passed for it, or we get an error:
f(y=5, z=10)
# ERROR: MethodError: no method matching f(; y::Int64, z::Int64)
# Closest candidates are:
# f(::Any; y, z)
# @ Main d:\JuliaCode\Julia4ER\Julia for Engineering Research\scrathpad.jl:42
It is common to find that positional and keyword arguments are separated with a semi-colon at the point where the function is called, e.g. f(2; y=3, z=5). This is just done to emphasise that the parameters after the semi-colon are kwargs, but it is not required. A comma will do just fine.
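A keyword argument declared without a default value must be supplied by the caller; leaving it out raises an UndefKeywordError. A minimal sketch with a hypothetical function scaled:

```julia
function scaled(x; factor, offset=0)   # `factor` has no default, so it is required
    return factor * x + offset
end

scaled(2; factor=3)              # 6
scaled(2, factor=3, offset=1)    # 7 (a comma works as well as a semi-colon)
# scaled(2) throws an UndefKeywordError
```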
Returning Results
As mentioned above, to return values from a function, there are two preferred options:
- Use the keyword return
- Mutate (modify) an array passed as a parameter to the function
Functions in Julia always return a value. Either there is an explicit return statement, or the last calculated value is automatically returned. It is good practice to always return a value explicitly. If your function should not return anything, use return nothing to prevent an unexpected and unintended value from being returned.
You can have multiple return statements, but the first one to execute will terminate the function and return to the line after the call to the function.
function isnotnegative(x)
if x >= 0
return true
else
return false
end
end
or more concisely
function isnotnegative(x)
return x >= 0
end
This is more understandable than the equivalent:
function isnotnegative(x)
x >= 0
end
Julia uses pass-by-sharing for parameters passed to functions. If you don’t know what that means, you are in good company.
The practical implications, however, are fairly simple to understand. Assigning a new value to a parameter inside a function does not change the variable that was passed in:
function trychanging(x)
println("Passed value: $x")
x = 10
println("Local value: $x")
end
x = -1
trychanging(x)
# Passed value: -1
# Local value: 10
println("Value at call site: $x")
# Value at call site: -1
a = [1]
trychanging(a)
# Passed value: [1]
# Local value: 10
println("Value at call site: $a")
# Value at call site: [1]
What happens is that Julia creates a new, local variable in the scope of the function and binds it to the value of the argument that was passed. You can assign new values to the parameters, but that does not affect the variables that were passed to the function (x and a in the examples above).
How do you then mutate an argument? You cannot replace a scalar or an array that was passed to a function. However, an array argument refers (points) to the same data as the caller’s array, since no copy is made, and you can change that data:
function canchange!(x)
println("Passed value: $x")
x[1] = -10
println("Local value: $x")
end
a = [1, 2, 3]
canchange!(a)
# Passed value: [1, 2, 3]
# Local value: [-10, 2, 3]
println(a)
# [-10, 2, 3]
Note the addition of the exclamation mark (or “bang”, for our American friends) to the name of the function. This is not a Julia requirement, just good manners: it informs users of the code that one or more parameters will be mutated. By convention, this is usually the first parameter.
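The difference between rebinding a parameter and mutating the array it refers to can be seen side by side. In this sketch (function names hypothetical), only the second function affects the caller's array, and passing a copy protects the original:

```julia
function rebind(v)
    v = [0, 0, 0]    # the local name now refers to a new array; the caller's is untouched
    return v
end

function zero_out!(v)
    v .= 0           # writes into the caller's array in place
    return v
end

a = [1, 2, 3]
rebind(a)
a                    # [1, 2, 3]
zero_out!(a)
a                    # [0, 0, 0]

b = [1, 2, 3]
zero_out!(copy(b))   # mutates a copy
b                    # [1, 2, 3]
```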
A function returns only one value. What then if you need to return more than one result? Simply return them as a tuple and unpack them at the call site:
function sqr_two(a, b)
return a^2, b^2 # The parentheses around the tuple are optional
end
a2, b2 = sqr_two(2, 3) # Unpack the returned tuple into the two variables, a2 and b2
println("a2 = $a2 and b2 = $b2.")
# a2 = 4 and b2 = 9.
Slurping and Splatting - Variable number of positional arguments
Julia uses the ... operator to slurp and splat values. Slurping means combining multiple scalar values into an array or tuple (slurping them up) and splatting means breaking an array or tuple into a number of scalar values. While this is sometimes used in other places, the most common application is to allow a variable number of parameters to be passed to a function.
function confused(a, b, c...)
println("a is $a")
println("b is $b")
println("c is $c")
end
confused(1, 2, 3, 4, 5)
# a is 1
# b is 2
# c is (3, 4, 5)
Splatting allows an array to be passed to a function that expects a number of scalars:
function confused2(a, b, c)
println("a is $a")
println("b is $b")
println("c is $c")
end
x = [1, 2, 3]
confused2(x...)
# a is 1
# b is 2
# c is 3
x = [1, 2, 3, 4]
confused2(x...)
# ERROR: MethodError: no method matching confused2(::Int64, ::Int64, ::Int64, ::Int64)
# Closest candidates are:
# confused2(::Any, ::Any, ::Any)
# ...
The last example results in an error, as there is no method of the function that accepts four parameters.
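The slurped arguments arrive as a tuple, so their number (and types) can be inspected inside the function. In the sketch below, count_args is a hypothetical helper; note that splatting works on any iterable, including ranges and tuples:

```julia
# Slurp all arguments into the tuple `args`
count_args(args...) = length(args)

count_args()                   # 0
count_args(1, 2.0, "three")    # 3; args is a Tuple{Int64, Float64, String}
count_args(1:4...)             # 4: the range is splatted into four arguments
count_args((1, 2)..., 3)       # 3: splatting can be combined with other arguments

v = [4, 9, 2]
max(v...)                      # 9, although maximum(v) is the idiomatic choice
```

For long collections, prefer functions such as maximum(v) that accept the collection itself: splatting creates a call with one argument per element.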
Type Stability and Fast Code
Being able to change the type of a variable can be useful, but it also opens the door to something that can slow down your code: type instability. This is when the type of a variable changes during the execution of your code, which makes many of the optimisations Julia could otherwise do impossible. Instead, additional code is required to handle the type changes. Keep a look-out for something like this:
function myfunc(n)
sum = 0
for i in 1:n
sum += 1.5
end
return sum
end
At first glance, there is nothing strange about this code. If you take a closer look, however, you will see that sum is created as an integer via sum = 0, but then we assign floating-point values to it.
Julia has a lot of code analysis tools, one of which is the @code_warntype macro:
@code_warntype myfunc(5)
MethodInstance for myfunc(::Int64)
from myfunc(n) in Main at REPL[20]:1
Arguments
#self#::Core.Const(myfunc)
n::Int64
Locals
@_3::Union{Nothing, Tuple{Int64, Int64}}
sum::Union{Float64, Int64}
i::Int64
Body::Union{Float64, Int64}
1 ─ (sum = 0)
│ %2 = (1:n)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│ (@_3 = Base.iterate(%2))
│ %4 = (@_3 === nothing)::Bool
│ %5 = Base.not_int(%4)::Bool
└── goto #4 if not %5
2 ┄ %7 = @_3::Tuple{Int64, Int64}
│ (i = Core.getfield(%7, 1))
│ %9 = Core.getfield(%7, 2)::Int64
│ (sum = sum + 1.5)
│ (@_3 = Base.iterate(%2, %9))
│ %12 = (@_3 === nothing)::Bool
│ %13 = Base.not_int(%12)::Bool
└── goto #4 if not %13
3 ─ goto #2
4 ┄ return sum
Like when we looked at the LLVM code generated for a function, this may seem intimidating, but the important bits are these:
sum::Union{Float64, Int64}
and
Body::Union{Float64, Int64}
In the REPL these are helpfully printed in red to draw your attention.
Julia indicates that the variable sum is not type stable: it is assigned both Int64 and Float64 values. This limits the optimisations the compiler can perform and results in slower code.
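One way to fix myfunc is to start the accumulator as a Float64, so its type never changes. The sketch below renames the functions (and the local variable sum, which shadows Base's sum) for clarity, and uses the @inferred macro from the Test standard library as a quick check: it throws an error when the inferred return type does not match the type actually returned:

```julia
using Test

function myfunc_unstable(n)   # same logic as myfunc above
    total = 0                 # starts as an Int64 ...
    for i in 1:n
        total += 1.5          # ... but becomes a Float64 here
    end
    return total
end

function myfunc_stable(n)
    total = 0.0               # a Float64 from the start (zero(1.5) also works)
    for i in 1:n
        total += 1.5
    end
    return total
end

@inferred myfunc_stable(4)     # 6.0
# @inferred myfunc_unstable(4) throws: the inferred Union{Float64, Int64} is not Float64
```

Running @code_warntype myfunc_stable(5) should then show total and Body as plain Float64.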
To learn more about type stability and many other useful tips to help you write faster code, refer to the Performance Tips section in the manual.
The main things to consider, if you want fast code are:
- The fastest code is code that you don’t need to execute. Better algorithms usually trump more optimised code.
- Avoid type instability in your code.
- Avoid unnecessary memory allocation by re-using arrays where possible.
Once you have these issues under control, learning to properly benchmark your code (using BenchmarkTools.jl) will allow you to fine tune the performance of your code. Writing fast code is mostly about not writing slow code.
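To illustrate the point about re-using arrays, the sketch below (function names hypothetical) computes the same result either by allocating a new array on every call or by writing into a pre-allocated output array with fused in-place broadcasting (.=). The @allocated macro from Base reports the bytes allocated while evaluating an expression; for reliable timings, use BenchmarkTools.jl as mentioned above:

```julia
squares(v) = v .^ 2            # allocates a new array on every call

function squares!(out, v)
    out .= v .^ 2              # fused broadcast: writes into `out`, no new array
    return out
end

v = rand(1000)
out = similar(v)               # allocate the output once ...
squares!(out, v)               # ... and re-use it on every call

# Measure inside functions; call each once first so compilation is not counted
measure_alloc(v) = @allocated squares(v)
measure_inplace(out, v) = @allocated squares!(out, v)
measure_alloc(v); measure_inplace(out, v)
measure_alloc(v), measure_inplace(out, v)   # the in-place version should allocate far less
```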
Footnotes
1. An Nvidia Tesla A100 costs about US$7400 in mid-2023.
2. A pointer is just a memory address - computer guys like their jargon.
3. See the previous comment on this - the terms are often used interchangeably.
4. In the spirit of computer science (and technical fields in general) the least understandable option - kwargs - is also the most used one. Go figure.