Under the hood: Vec<T>
Let's look at Vec<T> to get a better understanding of its inner workings.
A Conspiracy!
As I was reading the Rust API documentation for std::vec::Vec, something interesting caught my eye in the Vec struct definition.
pub struct Vec<T, A = Global>
where
    A: Allocator,
{ /* private fields */ }
I am looking at you, { /* private fields */ }! "What are you trying to hide from me?" I thought. Was I being sucked into a grand conspiracy? The documentation gives no hints as to what these fields are, other than giving us a visual representation of the data structure:
            ptr      len  capacity
       +--------+--------+--------+
       | 0x0123 |      2 |      4 |
       +--------+--------+--------+
            |
            v
Heap   +--------+--------+--------+--------+
       |    'a' |    'b' | uninit | uninit |
       +--------+--------+--------+--------+
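Before digging into the source, the picture can be checked from the outside using only Vec's public API. This is a minimal sketch; diagram_vec is a name I made up for illustration, and the printed pointer value will differ from run to run.

```rust
// Hypothetical helper: builds the Vec shown in the diagram.
fn diagram_vec() -> Vec<char> {
    // Reserve room for at least four elements up front.
    let mut v = Vec::with_capacity(4);
    v.push('a');
    v.push('b');
    v
}

fn main() {
    let v = diagram_vec();
    // The three values from the diagram, read through public methods.
    println!("ptr      = {:p}", v.as_ptr());
    println!("len      = {}", v.len());
    println!("capacity = {}", v.capacity());
}
```

Note that with_capacity only promises a capacity of at least the requested amount, so the capacity is 4 or more, while len is exactly 2.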
Surely we will find that our Vec has three fields: ptr, len, and capacity. But to be sure, we have to go straight to the source. Are you ready to come with me down the rabbit hole and see if we can uncover an age-old mystery?
The many faces of Vec<T>
Diving into the struct definition of Vec in std::vec, this is what we find:
pub struct Vec<T, A: Allocator = Global> {
    buf: RawVec<T, A>,
    len: usize,
}
NOTE
We will be ignoring the Allocator type entirely. This is a topic worthy of its own article.
Yay, we have len! Okay... that was easy. Now we only need ptr and capacity. We might be home very early, right?
No, not really!
"What is this mysterious RawVec<T, A>?" you rightly ask yourself, "and where the hell are the ptr and the capacity?" Well, let's follow the breadcrumbs!
If we type RawVec into the search field of the Rust API documentation we find... nothing!?
I knew it! They really are trying to hide something from us!
Okay, okay... stay calm, don't stress it! Let's take a deep breath and look at the source code:
pub(crate) struct RawVec<T, A: Allocator = Global> {
    inner: RawVecInner<A>,
    _marker: PhantomData<T>,
}
Ah, so that's why we can't find it in the documentation: it is marked pub(crate), so it is only visible within its own crate and not accessible from outside. Good, one mystery solved, but what in the world is the RawVecInner<A> type now, and what is PhantomData<T> [1]!? How deep does this rabbit hole go?
Looking at RawVecInner<A> we get a clearer picture:
struct RawVecInner<A: Allocator = Global> {
    ptr: Unique<u8>,
    cap: Cap,
    alloc: A,
}
Haha, no, we don't... Well, at least somewhat. We finally found our lost ptr and cap(acity)! But both of them are defined by new types. We're three layers deep now, with no end in sight. But we've come this far; we're not stopping now, are we?
NOTE
Cap is just a usize that enforces its own minimum and maximum bounds (as the name and the range below show, the highest bit is never set), so we won't go deep into this one.
type Cap = core::num::niche_types::UsizeNoHighBit;
and
pub struct UsizeNoHighBit(usize as usize in 0..=0x7fff_ffff_ffff_ffff);
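To get a feeling for what such a bounded type does, here is a rough sketch of the same idea in ordinary stable Rust. ToyCap and its methods are hypothetical names of mine. The real UsizeNoHighBit relies on compiler-internal machinery that user code cannot use, so this only imitates the range check, not how the standard library actually implements it.

```rust
/// Hypothetical stand-in for `Cap`: a usize whose highest bit is never set.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyCap(usize);

impl ToyCap {
    /// Largest allowed value: all bits set except the highest one.
    const MAX: usize = usize::MAX >> 1;

    /// Returns `None` when the value falls outside the allowed range.
    fn new(value: usize) -> Option<ToyCap> {
        if value <= Self::MAX {
            Some(ToyCap(value))
        } else {
            None
        }
    }

    /// Unwraps the checked value again.
    fn get(self) -> usize {
        self.0
    }
}

fn main() {
    println!("{:?}", ToyCap::new(4));
    println!("{:?}", ToyCap::new(usize::MAX));
    println!("{}", ToyCap::new(16).map(ToyCap::get).unwrap_or(0));
}
```

The upper bound is the same value as isize::MAX, which matches the documented rule that a Vec never allocates more than isize::MAX bytes.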
So what is Unique<u8>?
pub struct Unique<T: PointeeSized> {
    pointer: NonNull<T>,
    _marker: PhantomData<T>,
}
No surprises here, just another wrapper type: NonNull<T>.
pub struct NonNull<T: PointeeSized> {
    pointer: *const T,
}
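Unlike most of the types we met on the way down, NonNull is public and stable, so we can poke at it directly. The small sketch below (wrap is a helper name I made up) shows the one promise it makes: a null pointer cannot get in.

```rust
use std::mem::size_of;
use std::ptr::{self, NonNull};

// Hypothetical helper: tries to wrap a raw pointer.
// NonNull::new returns None if the pointer is null.
fn wrap(raw: *mut u8) -> Option<NonNull<u8>> {
    NonNull::new(raw)
}

fn main() {
    let mut byte = 7u8;

    // A pointer to a real value is accepted...
    assert!(wrap(&mut byte).is_some());
    // ...while a null pointer is rejected.
    assert!(wrap(ptr::null_mut()).is_none());

    // Because null is never a valid NonNull, Option can use it to mean None,
    // so the Option wrapper costs no extra space.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());
}
```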
Wait?! Are we done? I think we are! Hallelujah, we now have a broad overview of the whole Vec stack! Let's try to unravel its secrets, shall we?
Understanding Vec's layers
Our journey looked something like this:
Vec<T> holds a...
RawVec<T> which holds a...
RawVecInner which holds a...
Unique<u8> which holds a...
NonNull<u8> which holds a...
*const u8 (a raw pointer)
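Six layers sound expensive, but the wrappers add no data of their own. As a quick sanity check (a sketch with helper names I made up, three_words and round_trip), we can confirm that a Vec with the default allocator is still just three machine words, and that a pointer, a length and a capacity are enough to rebuild one. The second part follows the pattern from the Vec::from_raw_parts documentation.

```rust
use std::mem::{self, ManuallyDrop};

// Hypothetical helper: is a Vec exactly three machine words in size?
fn three_words() -> bool {
    mem::size_of::<Vec<u64>>() == 3 * mem::size_of::<usize>()
}

// Hypothetical helper: takes a Vec apart and puts it back together.
fn round_trip(v: Vec<u32>) -> Vec<u32> {
    // Keep the original from freeing the buffer when it goes out of scope.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // SAFETY: ptr, len and cap come from a live Vec<u32> that is never
    // used or dropped again, so the rebuilt Vec is the only owner.
    unsafe { Vec::from_raw_parts(ptr, len, cap) }
}

fn main() {
    assert!(three_words());
    assert_eq!(round_trip(vec![1, 2, 3]), vec![1, 2, 3]);
}
```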
Phew, a lot of abstractions. But what does this tell us? To understand why the engineering team behind the standard library chose to go this route, we first need to learn each layer's purpose. We will start at the bottom and climb our way up until we reach the top of our Vec<Mountain>!
Looking at the documentation of each type, we can summarize their purposes in plain language as follows:
*const u8: This is the foundation. A simple, unsafe raw pointer to some memory.
NonNull<u8>: This wraps the raw pointer and adds a crucial guarantee: the pointer is never null. This is the first step in building a safe abstraction.
Unique<u8>: This layer adds the concept of ownership. It tells the compiler that we exclusively own the memory the pointer points to. This is very important for Rust's memory safety rules (like making sure data is properly dropped).
RawVec<T> & RawVecInner: These two are responsible for managing the memory. They handle allocations, deallocations, growing, and shrinking the block of memory on the heap. RawVec only knows about capacity (the total allocated space) and not length (how many elements are initialized). This makes it a perfect reusable type for other collections like VecDeque<T>!
Vec<T>: This user-facing layer adds the concept of length. It knows which elements are initialized and provides all the safe methods we know for inserting, removing, and accessing elements.
NOTE
That *Inner type is a clever compile-time optimization. Since RawVecInner is not generic over T, its code won't get duplicated for every single type you use in a Vec, which speeds up compilation.
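The same trick works in everyday code. Below is a minimal sketch of the pattern, not the standard library's actual code: bytes_needed and bytes_needed_inner are hypothetical names. The generic function is a thin shim that turns T into a runtime Layout and hands the real work to a non-generic function, which the compiler only has to build once.

```rust
use std::alloc::Layout;

// Non-generic: compiled a single time, no matter how many element types exist.
// Returns None if the total size would overflow usize.
fn bytes_needed_inner(elem: Layout, count: usize) -> Option<usize> {
    elem.size().checked_mul(count)
}

// Generic: a tiny shim that is copied for each T, but contains almost no code.
fn bytes_needed<T>(count: usize) -> Option<usize> {
    bytes_needed_inner(Layout::new::<T>(), count)
}

fn main() {
    println!("{:?}", bytes_needed::<u32>(4));
    println!("{:?}", bytes_needed::<u64>(usize::MAX));
}
```

This would also explain why RawVecInner stores a Unique<u8> rather than a Unique<T>: it only sees bytes and layouts, and the element type lives in the thin generic layer above it.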
Each level builds upon the last by adding new guarantees and responsibilities until we have a completely safe and powerful data structure.
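To make the split between "capacity" and "length" tangible, here is a toy two-layer vector. Everything in it (ToyRawVec, ToyVec, the growth factor) is my own simplification and not how the standard library does it: the real RawVec talks to the allocator directly, while this sketch borrows a boxed slice of MaybeUninit<T> to stay short.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for RawVec: owns the slots, knows only its capacity.
struct ToyRawVec<T> {
    slots: Box<[MaybeUninit<T>]>,
}

impl<T> ToyRawVec<T> {
    fn with_capacity(cap: usize) -> Self {
        let slots = (0..cap).map(|_| MaybeUninit::uninit()).collect();
        ToyRawVec { slots }
    }

    fn capacity(&self) -> usize {
        self.slots.len()
    }
}

/// Hypothetical stand-in for Vec: adds `len`, i.e. which slots are initialized.
struct ToyVec<T> {
    buf: ToyRawVec<T>,
    len: usize,
}

impl<T> ToyVec<T> {
    fn new() -> Self {
        ToyVec { buf: ToyRawVec::with_capacity(0), len: 0 }
    }

    fn len(&self) -> usize {
        self.len
    }

    fn capacity(&self) -> usize {
        self.buf.capacity()
    }

    fn push(&mut self, value: T) {
        if self.len == self.buf.capacity() {
            self.grow();
        }
        self.buf.slots[self.len].write(value);
        self.len += 1;
    }

    /// Moves all initialized elements into a buffer of (at least) double size.
    fn grow(&mut self) {
        let new_cap = (self.buf.capacity() * 2).max(4);
        let mut bigger: ToyRawVec<T> = ToyRawVec::with_capacity(new_cap);
        for i in 0..self.len {
            // SAFETY: slots below `len` are initialized, and each one is read
            // exactly once before the old buffer is discarded.
            let value = unsafe { self.buf.slots[i].assume_init_read() };
            bigger.slots[i].write(value);
        }
        // Dropping the old buffer frees its memory without dropping elements.
        self.buf = bigger;
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            // SAFETY: every slot below `len` has been written by `push`.
            Some(unsafe { self.buf.slots[index].assume_init_ref() })
        } else {
            None
        }
    }
}

impl<T> Drop for ToyVec<T> {
    fn drop(&mut self) {
        // Only the first `len` slots hold real values that need dropping.
        for slot in &mut self.buf.slots[..self.len] {
            // SAFETY: slots below `len` are initialized and dropped only here.
            unsafe { slot.assume_init_drop() };
        }
    }
}

fn main() {
    let mut v = ToyVec::new();
    for i in 0..5 {
        v.push(i.to_string());
    }
    println!("len = {}, capacity = {}", v.len(), v.capacity());
    println!("v[4] = {:?}", v.get(4));
}
```

Notice that all the unsafe code sits in the layer that knows len. The buffer layer cannot tell an initialized slot from an uninitialized one, which is exactly why it can be reused by other collections.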
An Ordinary Vec<Life>
Seems like my life remains somewhat uneventful. There is no big conspiracy and no hidden truth to be found... just very good engineering.
While we were digging deep to learn what Vec<T>'s private fields are, we uncovered something very interesting: good API design. We saw how Rust's engineers built one of its most common types from the ground up, starting with an unsafe pointer and carefully wrapping it, layer by layer, until it's perfectly safe and ergonomic.
Every layer is crucial for Vec's contract to be fulfilled. It’s a testament to the power of abstraction and separation of concerns. The "conspiracy" was about managing complexity, allowing each piece of the puzzle to do one thing and do it well. And in doing so, it provides reusable types that power other parts of the standard library.
Maybe next time, when you push an element to a vector, you'll know about the stack of abstractions working under the hood to make it all happen safely and efficiently. And that, in itself, is a pretty cool secret to have uncovered.
What's next?
Uncovering Vec's inner workings helped me understand a little more of what happens behind the scenes in the standard library.
Unfortunately, we only touched on a fraction of the complex structure that forms Vec (but I hope enough to give a better understanding). Explaining every little bit in detail would be too much for this post, so maybe there will be more Under the hood articles that explain Rust's inner life even better, since obvious questions already arise while reading this one: "How does NonNull guarantee its promise?" or "Why do we need Unique?"
Each type we learned about in this article is worth its own post, including the ones we only briefly mentioned (Allocator, PhantomData), so I encourage everyone to check these types out for themselves. Rust's documentation is phenomenal, and with modern IDE features like go to definition you can easily jump back and forth until you get a clearer picture of the parts that form the public Rust API.
Footnotes
[1] Explaining PhantomData<T> is out of scope for this article, but we might look at it in another one.