Memory and data structures

I will give some examples in C# in this section, but you should be able to understand what is going on if even if you don't know C#.

Consider the following piece of code:

void ValueType() {
    int i = 1;
    int j = i;
    i = 2;
    Console.WriteLine(j);
}

The statement int j = i makes sure that a new place in memory is reserved to hold the value of the variable b. The value of i is copied to this new place in memory. When we write i = 2, the value of i is changed. Because i and j do not share any memory, the value of j is unchanged and Console.WriteLine(j) will print "1".

This contrasts with the following piec of code, still in C#:

class A {
    int X;
}

void ReferenceType() {
    A a = new A{
        X = 1
    };
    A b = a;
    a.X = 2;
    Console.WriteLine(b.X);
}

When this function is executed, the output will be "2" instead of "1". Similar behaviour can be observed in other languages, like Java, Python, JavaScript etc. This is confusing at first, so I will explain in detail what is going on. To explain why we get a different output, let us step trough the example.

The statement

A a = new A {
    X = 1
};

creates a new variable named a and assigns a new value of type A to it. The new value has its field X initialised to 1. But under the hood, a lot more is going on. To store the value, a piece of memory is allocated to store the values of all the fields (in this case only the field X) plus some data that is needed for polymorphism. The memory address of this piece of memory is then stored in a. Let me rephrase that: a only contains no actual data, but only the memory address of the region in memory where the data is stored. The assignment A b = a; creates a new variable named b and stores the same memory address that a already has in this new variable b. The assignment a.X = 2; essentially says the following:

Get a memory address out of the variable b. This memory address marks the first byte of a region in memory.
Assign 2 to the value of the field X that can be found in that region.

The next line, Console.WriteLine(b.X);, then essentially says the following:

Get a memory address out of the variable b. This memory address marks the first byte of a region in memory.
Get the value of the field X that can be found in that region.
Print out that value.

Now because a and b hold the same memory address, they essentially still represent the same object and "2" is printed out.

I think we can agree that there is a big difference between the first example and the second. The difference is that, in C#, int is a value type and the class A is a reference type. A value type is a data type for which the variables store the data directly. A reference type is a data type that contains the memory address of the region in memory where the data is stored. Most programming languages have both value types and reference types. Primitive data types, like integers, booleans, floating numbers, are most often value types and classes are reference types. In Java, int is a value type and Integer is a reference type (and similar for some other primitive types). Strings are tricky business in any language (at least in my opinion).

Most languages need to distinguish value types from reference types because the reference (taking the memory address or following a memory address) is implicit. In Rust, references are (nearly) always explicit. For this reason, one rarely talks about value types and reference types in Rust.

Let us "translate" the first example from C# to Rust:

fn value_type() {
    let i : i32 = 1;
    let j : i32 = i;
    i = 2;
    println!("{}", j);
}

This is very close to the C# example, in particular, the value of i is copied to the value of j.

structs in Rust are more similar to structs in C# or D than to classes in other languages. So the following example is only seemingly similar to the second example in C# given above. In fact, the output is completely different.

#[derive(Clone, Copy)]
struct A {
    x: i32
}

fn still_value_type() {
    let mut a : A = A {x : 1};
    let b : A = a;
    a.x = 2;
    println!("{}", b.x);
}

There is not a single & to be found in this code snipplet and it is very self-containing, so you can be pretty sure there is no reference anywhere in the code you see. And indeed, when the function is called, the whole value of a is copied to b (which is even made obvious with the #[derive(Clone, Copy)] notation) and hence a and b represent completely different entities.

If we were to try to mimick the C# example, we would have something like the following:

#[derive(Clone, Copy)]
struct A {
    x: i32
}

fn try_to_immitate_reference_type() {
    let a : &mut A = &mut A {x : 1};
    let b : &mut A = a;
    a.x = 2;
    println!("{}", b.x);
}

See the &s in the code? This indicates that there is a reference. The problem with this code is that it does not even compile. Why? Let us look again at the second example in C#:

    // ...
    a.X = 2;                (1)
    Console.WriteLine(b.X); (2)

You need more context to know whether line (1) affects the output on line (2). In the simple example, the context was just a few lines above, but in more complicated code and especially with more complicated data structures, it can become hard to know where exactly data is being modified.

TODO: Explain that in Rust, containers like Box, Vec, HashMap etc. store whatever they contain on the heap, but that a plain struct, enum, tuple, array, closure is on the stack (unless it is contained in a container).

TODO: explain what a 'value' is.

Resource management in Rust

Values and references

Value types and reference types in Rust

Data structures and the stack and the heap in Rust