Collections

The Rust standard library gives you several ways to store a collection of items that are all the same type. Some of them you'll use quite a lot, like Vec and HashMap, but there's a few more that have interesting features that can be useful in specific circumstances.

This chapter will mostly focus on the collections you'll use the most, but we'll cover all the other collections provided by the Rust standard library. Collections are commonly used with iterators, which allow you to step though lists one item at a time, but we'll cover those in the next chapter.

Arrays

Arrays are... technically not collections. Or at least, they are quite different to the other collections in one key way.

They're Sized!

While other collections can have elements added to or removed from them, Arrays are always the same size. This means that they can exist on the stack, which further means they can be Copy (so long as the type stored in the array is also Copy).

We can create arrays in two ways, either by defining a list of expressions, or by using an expression and a length:

fn main() {
let a1 = [0, 1, 2, 3, 4, 5]; // A list of values
let a2 = [1 - 1, 1 * 1, 1 + 1, 9 / 3]; // A list of expressions
let a3 = [1 + 2; 5]; // Fill an array with the result of a single expression, evaluated once

assert_eq!(a1, [0, 1, 2, 3, 4, 5]);
assert_eq!(a2, [0, 1, 2, 3]);
assert_eq!(a3, [3, 3, 3, 3, 3]);
}

You can access arrays using square brackets and the index of the entry you want to edit, however, if the index doesn't exist, your code will panic.

fn main() {
let arr = [0, 1, 2, 3, 4];
for i in 0..=5 {
    println!("{}", arr[i]);
}
}

That said, because Arrays are sized, Rust is smart enough to know that you've hardcoded an index out of bounds, so this won't even compile!

fn main() {
let arr = [0, 1, 2, 3, 4];
let _ = arr[5];
}

You can pass arrays into functions so long as you know the exact length of the array. If the contents of the array is Copy then the array will also be Copy and will be passed with Copy Semantics instead of Move Semantics.

fn demonstrate_some_mutation(mut arr: [usize; 5]) {
    // Because the array is passed using copy semantics, it doesn't matter if the original variable was mutable.
    // Also, because we know the array _must_ have five elements, it's safe to directly access elements we know exist
    arr[0] = 100;
    assert_eq!(arr, [100, 1, 1, 1, 1]);
}

fn main() {
// Reminder: Simply passing a1 to a function that takes an array of `usize`s is enough to infer the type 
let a1 = [1; 5];
assert_eq!(a1, [1, 1, 1, 1, 1]);

// Because the contents of the array is `Copy`, the array is `Copy`.
// This means it will use copy semantics when passed to the function
demonstrate_some_mutation(a1);

// So we can still use the original variable, _and_ it hasn't been mutated.
assert_eq!(a1, [1, 1, 1, 1, 1]);
}

Note: This won't compile because the a2 is the wrong size

fn demonstrate_some_mutation(mut arr: [usize; 5]) {
    // ...
    arr[0] = 100;
    assert_eq!(arr, [100, 1, 1, 1, 1]);
}
fn main() {
let a2 = [1; 4];
demonstrate_some_mutation(a1);
}

Slices

Obviously, passing exactly sized arrays around isn't particularly useful in most real world circumstances. So the first dynamically sized collection we should talk about is the "slice".

You can think of a slice as a view or window into a series of contiguous data. The fact that it's a view of some other type hopefully indicates to you that this is a reference type, i.e. &[T] or &mut [T] for mutable slices.

The simplest way to get a slice is to reference an array (though some other collections also allow you to take a slice).

fn main() {
let arr: [u32; 5] = [0, 1, 2, 3, 4];
let slice: &[u32] = &arr;

assert_eq!(slice, &[0, 1, 2, 3, 4]);
}

You can also use range notation, which we've discussed before, including open ranges, which we didn't discuss at the time. The way to think about this is, where X..Y:

if X is specified, the slice begins before the Xth element
if X is not specified, the slice begins at the start of the collection being sliced
if Y is specified, the slice ends before the Yth element
if Y is preceded by an equals, the slice ends after the Yth element
if Y is not specified, the slice ends at the end of the collection being sliced

#![allow(unused)]
fn main() {
let array: [u32; 5] = [0, 1, 2, 3, 4];

assert_eq!(&array[1..3],  &[1, 2]);       // X is specified, slice begins before the `X`
assert_eq!(&array[..3],   &[0, 1, 2]);    // no X, slice begins at the start
assert_eq!(&array[1..3],  &[1, 2]);       // Y is specified, slice ends before the `Y`
assert_eq!(&array[1..=3], &[1, 2, 3]);    // =Y , slice ends after the `Y`
assert_eq!(&array[1..],   &[1, 2, 3, 4]); // no Y, slice ends at the end
}

When using slices, you don't need ot specify their size, that information is encoded into the data at runtime, meaning you can work with slices of arbitrary size. Bear in mind though that they are references, so you may need to use lifetimes to keep track. We discussed lifetimes in the functions chapter, however as a quick reminder, if the reference is a kite, lifetimes are the string that tie it back to the owning data.

In the functions chapter we discussed a function for splitting a String which returned two &str or "string slice references". That's right, str is another, special, kind of slice. Below is that code again, and here's some things to note that will hopefully make a lot more sense after the last few chapters:

String implements Deref targeting str so we can get a string slice just by referencing String
The lifetime 'a (attached to yuki) enters split() through input and is tied to the return parameters left and right.
The same range notation is used to create the slices as above
In the "found_at" branch, open ranges are used to capture the beginning and end for left and right respectively
In the "else" branch, the completely open range creates a slice the full length of the collection, while the slice that starts at input.len() is a zero length slice that starts before the element that would be after the final element (i.e. it starts after the final element) and runs to the end (i.e. the same place as it starts).

fn split<'a>(input: &'a str, sub_string: &str) -> (&'a str, &'a str) {
    if let Some(found_at) = input.find(sub_string) {
      (&input[..found_at], &input[found_at + sub_string.len()..])
    } else {
      (&input[..], &input[input.len()..])
    }
}

fn main() {
let yuki = "yuki".to_string();
let sub_string = "uk".to_string();

let (left, right) = split(&yuki, &sub_string);

assert_eq!(yuki, "yuki");
assert_eq!(left, "y");
assert_eq!(right, "i");

// Here's some test code to make sure I'm telling you the right think
let yuki = "yuki".to_string();
let sub_string = "daniel".to_string();

let (left, right) = split(&yuki, &sub_string);

assert_eq!(yuki, "yuki");
assert_eq!(left, "yuki");
assert_eq!(right, "");
}

Hopefully code like this is starting to make a lot more sense!

Vectors

Vec (short for Vector) is similar to an array (and can be dereferenced as an array slice), but unlike array, Vec can grow in size. Vec is a generic type (Vec<T>) with no trait bound, meaning you use them with any type.

There are several ways to instantiate Vecs, and which way is best can vary depending on how you're going to use them.

The fact that Vecs are dynamically sized means they need to exist on the Heap, and so, your program, under the hood, will request an amount of heap memory. If your vector exceeds the amount of memory that is currently available to it, code inside the Vec type will automatically request a new, larger, portion of memory from the operating system, copy the current data into that new location, then free the memory it used to hold, all automatically without you needing to do anything. This process however is expensive, and you should do what you can to avoid it.

With that in mind, you should try to start with a vector large enough to contain as much as you think is reasonable, using the ::with_capacity(capacity: usize) constructor. This will construct an empty Vec with at least (but possibly not exactly) the capacity you requested. Note that capacity and length are not the same thing in Rust. You can get the number of items of data currently in the vector with .len() and its capacity with .capacity().

fn main() {
let example: Vec<i32> = Vec::with_capacity(10);
assert_eq!(example.len(), 0);
assert!(example.capacity() >= 10);
}

Note: There is no guarantee that capacity will be exactly what you asked for, only that it will be at least what you asked for. Additionally, if you start with a smaller Vec and need to add a lot of items to it, you can preempt thrashing the heap with the .reserve() method which works similarly to ::with_capacity() but can be used to increase the capacity after the Vec has been instantiated with minimal reallocations.

If you're not worried about the potential costs of resizing your vector, and you already have some data that you want to instantiate, you can use the vec! macro.

fn main() {
let example = vec![0, 1, 2, 3];
assert_eq!(example.len(), 4);
}

Usually you'll make Vectors mutable, and they provide a huge array of useful methods (pun intended), but here are some of the basics.

To add elements to the end of a vector we use the .push(t: T) method, and to remove them from the end of the vector we use the .pop() method which returns an Option<T>, since the vector may be empty.

fn main() {
let mut v = Vec::with_capacity(2);
v.push("Hello");
v.push("World!");

// v as it is now
assert_eq!(v, vec!["Hello", "World!"]);

// popping returns an option containing the last element of the vector (if there are no items the Option will be None)
assert_eq!(v.pop(), Some("World!"));

// popping an item from the vector modifies the vector so it no longer contains the last item
assert_eq!(v, vec!["Hello"]);
}

If you're used to arrays and vectors in other languages, you can index directly into an array in Rust in the same way that you can in other languages but you generally shouldn't. If you try to access an element out of bounds (eg, if you have 3 items in your vector, and try to access the fifth), your program will panic.

fn main() {
let v = vec!["Hello", "World!"];
assert_eq!(v[0], "Hello");
assert_eq!(v[1], "World!");
let _ = v[2]; // ❗️Panics❗️
}

Instead, Vec provides .get() and .get_mut() which allow return an Option containing either an immutable or mutable reference to an item inside the vector. This is much safer as the program will not halt if there is no item at the given index, you'll simply get a None.

fn main() {
let v = vec!["Hello", "World!"];
assert_eq!(v.get(0), Some(&"Hello"));
assert_eq!(v.get(1), Some(&"World!"));
assert_eq!(v.get(2), None);
}

.get_mut() will return a mutable reference to the element inside the Vec, but the way we use it... is a little weird:

fn main() {
let mut v = vec!["Hello".to_string()];
if let Some(hello) = v.get_mut(0) {
    assert_eq!(hello, &mut "Hello".to_string());
    hello.push_str(", World!");
}
assert_eq!(v, vec!["Hello, World!".to_string()]);
}

.get() and .get_mut() will also allow you to create an array slice if you give it a Range instead.

fn main() {
let mut v = vec![0, 1, 2, 3, 4, 5];

// Note the weird syntax as `get` returns an array slice, not an array
assert_eq!(v.get(2..), Some(&[2, 3, 4, 5][..])); 
assert_eq!(v.get(..2), Some(&[0, 1][..]));
assert_eq!(v.get(6..), Some(&[][..]));

// You can even edit values inside the returned slice
if let Some(mut inner) = v.get_mut(2..) {
    inner[0] += 10; // Be careful, this is actually element 2!
};

assert_eq!(v, vec![0, 1, 12, 3, 4, 5]);
}

Finally, we can also create slice using square brackets and this actually works without panicking:

fn main() {
let v = vec![0, 1, 2, 3, 4, 5];

// This would panic:
// assert_eq!(v[6], 0);
// But this doesn't:
assert_eq!(v[6..], []);
}

A note on ownership

When you put a variable into a Vec, or any other collection, unless that variable is copy you are moving ownership into the collection. Using methods like get will give you a reference to the data, but the only way to get ownership back is to either clone in (and take the potential memory and runtime hit), or to remove to use a method that removes the element from the collection, like pop in Vec. We'll discuss similar methods for other collections as we go.

VecDequeue

VecDeque is very similar to Vec however, where in Vec you can only add and remove items from the end, VecDeque also allows you to add and remove items to and from the front!

use std::collections::VecDeque;

fn main() {
let mut v = VecDeque::from([0, 1, 2, 3, 4, 5]);

v.push_back(6);
v.push_front(-1);

assert_eq!(v, [-1, 0, 1, 2, 3, 4, 5, 6]);
assert_eq!(v.pop_front(), Some(-1));
assert_eq!(v.pop_front(), Some(0));
assert_eq!(v.pop_back(), Some(6));
assert_eq!(v, [1, 2, 3, 4, 5]);
}

Linked Lists

It's very rare to actually need a full LinkedList, and for performance reasons, you should try to avoid them where possible. Vec and VecDeque will almost always beat LinkedList in both speed and memory efficiency if all you want to do is add items to the end of a list (or, in the case of VecDeque to the front).

Where LinkedLists are useful though, is when splitting and merging your collection is a core feature you will be heavily reliant on.

fn main() {
use std::collections::LinkedList;

let mut list = LinkedList::new();
list.push_back(1);
list.push_back(3); // ohps! forgot 2!
list.push_back(4);
list.push_back(5);

// This gets us a vec use for comparison.
// Don't worry about this syntax yet, we'll explain it in the next chapter!
let v: Vec<_> = list.iter().copied().collect();
assert_eq!(v, &[1, 3, 4, 5]);

// We can inject the missing number like this
let mut right = list.split_off(1);
list.push_back(2);
list.append(&mut right);

let v: Vec<_> = list.iter().copied().collect();
assert_eq!(v, &[1, 2, 3, 4, 5]);

// Weirdly, the append method doesn't consume the other linked list but it does empty it
// This might be useful if you are juggling values linked lists that you want to keep ownership of 
assert_eq!(right.iter().copied().collect::<Vec<_>>(), &[]);
}

BinaryHeap

BinaryHeaps allow you to add items to a heap in any order, but the first item off the heap is always the largest item according to Ord.

use std::collections::BinaryHeap;

fn main() {
let mut heap = BinaryHeap::new();

heap.push("A".to_string());
heap.push("C".to_string());
heap.push("B".to_string());

assert_eq!(heap.pop(), Some("C".to_string()));
assert_eq!(heap.pop(), Some("B".to_string()));
assert_eq!(heap.pop(), Some("A".to_string()));
assert_eq!(heap.pop(), None);
}

The obvious limitation here though is, what do you do if you need to know the smallest value in the stack?

In the standard library there's a cool little newtype that can wrap other types and inverts their ordering:

use std::collections::BinaryHeap;
use std::cmp::Reverse;

fn main() {
let mut heap = BinaryHeap::new();

heap.push(Reverse("A".to_string()));
heap.push(Reverse("C".to_string()));
heap.push(Reverse("B".to_string()));

// Bear in mind that the Reverse type is part of what is stored
assert_eq!(heap.pop(), Some(Reverse("A".to_string())));
// Though the inner field is public
assert_eq!(heap.pop().expect("heap was empty").0, "B".to_string());
}

HashMap

A HashMap is a key value lookup table. The key can be a value of any type, so long as that type implements the Hash trait (see the previous chapter). Hashing the key results in a u64 that is used to create the lookup table. There's more details on how hashing works in the official book, including how to create a HashMap with a different hashing algorithm, but that's beyond the scope of IRISS.

Similar to Vecs, HashMaps can be initialised in a few ways, the main three you're likely to use are:

fn main() {
use std::collections::HashMap;

// Create an empty hashmap with some arbitrary capacity
let mut hashmap: HashMap<String, String> = HashMap::new();

// Create a hashmap with _at least_ this capacity (helps prevent reallocation if you
// know the largest your hashmap will likely be)
let mut hashmap_with_capcity = HashMap::with_capacity(1);

// You usually won't have to specifically type the HashMap so long as Rust can infer
// the types by what you're inserting into it.
hashmap_with_capcity.insert(
    "Key".to_string(), // Can be anything that implements Hash
    "Value".to_string(), // Can be anything
);

// Create a hashmap with initial values from an array of tuples (K, V) where K: Hash
let mut hashmap_from_array = HashMap::from([
    ("String is Hash".to_string(), "First value".to_string()),
    ("Another Key".to_string(), "Another value".to_string()),
]);
}

To access data you've stored in your hashmap, there's a few handy methods:

.get(key: &K) and .get_mut(key: &K) will get references to data if it exists using Options

use std::collections::HashMap;

fn main() {
let mut map = HashMap::from([
    ("Key".to_string(), "Value".to_string()),
]);

assert_eq!(map.get("Key"), Some(&"Value".to_string()));
assert_eq!(map.get("Not a Key"), None);

if let Some(mut value) = map.get_mut("Key") {
    value.push_str(" Changed");
}

assert_eq!(map.get("Key"), Some(&"Value Changed".to_string()));
}

.entry(key: &K) returns a special Entry enum that allows you to modify and existing value if it exists, or insert a value if it doesn't

use std::collections::HashMap;

fn main() {
let mut map = HashMap::from([
    ("Existing Key".to_string(), "Value".to_string()),
]);

map.entry("Existing Key".to_string())
    .and_modify(|value| value.push_str(" Changed"))
    .or_insert("Inserted Value".to_string());

map.entry("Nonexistent Key".to_string())
    .and_modify(|value| value.push_str(" Changed"))
    .or_insert("Inserted Value".to_string());

assert_eq!(map.get("Existing Key"), Some(&"Value Changed".to_string()));
assert_eq!(map.get("Nonexistent Key"), Some(&"Inserted Value".to_string()));
}

.remove(key: &K) takes a value out of the HashMap (if it exists), allowing you to take ownership of it, and .remove_entry(key: &K) can be used to gain ownership of both the value and the key as you remove it from the map

use std::collections::HashMap;

fn main() {
let key = "Key".to_string();
let value = "Value".to_string();
// At this point we own these 👆🏻values
 
let mut map = HashMap::with_capacity(1);

map.insert(key, value);
// Here 👆🏻we move ownership into the hashmap
// So this will no longer work:
// println!("{key}, {value}");

// We can recover both the key and the value using `.remove_entry()`
let (recovered_key, recovered_value) = map.remove_entry("Key")
    .expect("key not found");

assert_eq!(&recovered_key, "Key");
assert_eq!(&recovered_value, "Value");
println!("Found {recovered_key}, {recovered_value}");

// Obviously the key abd value will no longer be part of the HashMap
assert_eq!(map.get("Key"), None);
}

BTreeMap

BTreeMap is a Binary Search Tree version of HashMap. For storing arbitrary data it's a touch slower than HashMap, but it internally sorts keys so that you can easily get the values at the largest and smallest keys, a little bit like a VecDeque:

use std::collections::BTreeMap;

fn main() {
let mut map = BTreeMap::from([
    ("C Key".to_string(), "Value 1".to_string()),   
    ("A Key".to_string(), "Value 2".to_string()),   
    ("D Key".to_string(), "Value 3".to_string()),   
    ("B Key".to_string(), "Value 4".to_string()),   
]);

// Get references to the first or last key/values according to Ord
assert_eq!(map.first_key_value(), Some((&"A Key".to_string(), &"Value 2".to_string())));
assert_eq!(map.last_key_value(), Some((&"D Key".to_string(), &"Value 3".to_string())));

// There are also methods that return `Entry`s so you can insert of modify as necessary.
map.first_entry().expect("Map had no entries").into_mut().push_str(" Modified First");
map.last_entry().expect("Map had no entries").into_mut().push_str(" Modified Last");

// Finally you can pop from the "front" (first) and "back" (last) of a BTreeMap
assert_eq!(map.pop_first(), Some(("A Key".to_string(), "Value 2 Modified First".to_string())));
assert_eq!(map.pop_last(), Some(("D Key".to_string(), "Value 3 Modified Last".to_string())));
assert_eq!(map.pop_first(), Some(("B Key".to_string(), "Value 4".to_string())));
assert_eq!(map.pop_last(), Some(("C Key".to_string(), "Value 1".to_string())));
}

Sets

There are two Set types in Rust that allow you to store values with no duplicates, HashSet and BTreeSet. These are implemented using HashMap<K, ()> and BTreeMap<K, ()>, though they "fix" some of the issues you might run in to if you were to naively do this yourself.

For example, .insert(T) only takes a single value, and methods like .get(K) return an Option with only one value.

The differences between HashSet and BTreeSet are the same as between HashMap and BTreeMap, including BTreeSet allowing easy access to the "first" and "last". Furthermore, when you turn BTreeSet into an Iterator, it will be in order!

We'll talk more about Iterators in the next chapter.

Idiomatic Rust in Simple Steps