kref is appropriate when the object doesn't outlive its last external
reference." What we want is to know when all "external" references to an object
are gone. At which point we can do something (cleanup, free, etc) and be sure
-that no other code can get a reference. This is similar to what we used to do
-with proc_decref() and a handful of other custom refcounting mechanisms. We
-want to do this without locks, if possible, so we make use of a variety of
-helpful atomics (though RAMP uses a lock-per-atomic).
+that no other code can get a reference. We want to do this without locks, if
+possible, so we make use of a variety of helpful atomics (though RAMP uses a
+lock-per-atomic).
This document will ramble about the details and other stuff.
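To make "the last external reference triggers the release" concrete, here is a minimal sketch. It assumes C11 <stdatomic.h> in place of the kernel's atomics, and RAMP's lock-per-atomic is not modeled. The names kref_init(), kref_get(), kref_put() and kref_refcnt() follow this document, but their exact signatures and bodies here are assumptions, including kref_put() returning whether it ran release(). `struct obj` and obj_release() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed layout: a count plus the release function set at init time. */
struct kref {
	atomic_size_t refcount;
	void (*release)(struct kref *kref);
};

/* The known, synchronous init point: no other code can see the object yet. */
static void kref_init(struct kref *kref, void (*release)(struct kref *),
                      size_t init)
{
	atomic_init(&kref->refcount, init);
	kref->release = release;
}

static size_t kref_refcnt(struct kref *kref)
{
	return atomic_load(&kref->refcount);
}

/* Caller must already hold a reference, so the old count cannot be 0. */
static void kref_get(struct kref *kref, size_t inc)
{
	size_t old = atomic_fetch_add(&kref->refcount, inc);

	assert(old != 0);
	(void)old;
}

/* Drop one ref. Only the caller that takes the count from 1 to 0 runs
 * release(), so it runs once per trip to zero, with no lock needed.
 * Returns true if this call ran release() (an assumption of this sketch). */
static bool kref_put(struct kref *kref)
{
	size_t old = atomic_fetch_sub(&kref->refcount, 1);

	assert(old != 0);
	if (old == 1) {
		kref->release(kref);
		return true;
	}
	return false;
}

/* Hypothetical user: release() just records that it ran. */
struct obj {
	struct kref kref;
	bool released;
};

static void obj_release(struct kref *kref)
{
	struct obj *o = (struct obj *)((char *)kref - offsetof(struct obj, kref));

	o->released = true;
}
```

The decrement and the "was I the last one?" test are a single atomic operation. That is what lets us find the "last one out" without a lock.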
Here are things to keep in mind, which differ in some of the use cases:
protect that *internal* reference, you often need to lock or otherwise sync
around the source of that internal reference (usually a list-like structure
(LLS)), unless that ref is otherwise protected from concurrent freeing.
-4. If you plan to reincarnate an object (make its refcnt go from 0 to 1), you
+4. If you plan to resurrect an object (make its refcnt go from 0 to 1), you
need to use some other form of sync between the final freeing and the
-reincarnation.
+resurrection.
We differ from Linux in a few ways. We have a kref_get_not_zero(), which
they talk about in some of their documents, to be a bit more clear about getting
to cut down on the number of places the function pointer is written, since I
expect those to change a lot as subsystems use krefs. Finally, we only use
krefs to trigger an object's release function, which might not free them
-forever. They can be "reincarnated" back to having an external reference. You
+forever. They can be "resurrected" back to having an external reference. You
can do similar things in Linux, but it's not clear (to me) how separate that
idea is from a kref.
split the release between what kref_put() does automatically with whatever else
you want done.
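kref_get_not_zero() can be built from a compare-and-swap loop. It only adds to the count if the value it observed was nonzero, so a lookup can never drag an object back from 0. This is a sketch assuming C11 atomics, and the real function's signature and implementation may differ. It also shows the release function pointer being written once, in kref_init(), as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct kref {
	atomic_size_t refcount;
	void (*release)(struct kref *kref);	/* written once, by kref_init() */
};

static void kref_init(struct kref *kref, void (*release)(struct kref *),
                      size_t init)
{
	atomic_init(&kref->refcount, init);
	kref->release = release;
}

/* Take 'inc' refs only if the count is nonzero. Returns false, without
 * touching the count, if the object already hit 0 (its release() may be
 * running or done). */
static bool kref_get_not_zero(struct kref *kref, size_t inc)
{
	size_t old = atomic_load(&kref->refcount);

	do {
		if (old == 0)
			return false;
		/* On CAS failure, 'old' is reloaded with the current count. */
	} while (!atomic_compare_exchange_weak(&kref->refcount, &old,
	                                       old + inc));
	return true;
}
```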
-What about an 'internal' kref? Much like with the pid2proc hash, if we refcnt
-the refs on lists, we still have the same issue of syncing between a list reader
-and a list writer. The reader needs to atomically read and kref_get (not_zero
-or otherwise). Otherwise, someone can remove, put, free, and do whatever to the
-item between the read and the kref_get (in theory).
+What about an 'internal' kref? If we refcnt the refs on lists, we still have
+the same issue of syncing between a list reader and a list writer. The reader
+needs to atomically read and kref_get (not_zero or otherwise). Otherwise,
+someone can remove, put, free, and do whatever to the item between the read and
+the kref_get (in theory).
The reason for this is the same reason we have trouble with lists and internal
references in the first place: both the list reader and the writer are sharing
regardless of whether the internal reference is using another kref or if it uses
some other scheme (like lock the object and check its state).
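The reader side of that argument is sketched below. The names (struct item, list_lookup(), list_remove()) are hypothetical, and a C11 test-and-set spinlock stands in for the kernel's spinlock. The list's links are treated as internal, uncounted references. The point is only that the pointer read and the kref_get_not_zero() happen in the same critical section as the writer's unlink.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct kref {
	atomic_size_t refcount;
};

static bool kref_get_not_zero(struct kref *kref, size_t inc)
{
	size_t old = atomic_load(&kref->refcount);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&kref->refcount, &old,
	                                       old + inc));
	return true;
}

/* Stand-in for the kernel's spinlock: a C11 test-and-set lock. */
struct spinlock {
	atomic_flag flag;
};

static void spin_lock(struct spinlock *s)
{
	while (atomic_flag_test_and_set_explicit(&s->flag, memory_order_acquire))
		;
}

static void spin_unlock(struct spinlock *s)
{
	atomic_flag_clear_explicit(&s->flag, memory_order_release);
}

/* Hypothetical LLS whose links are internal (uncounted) references. */
struct item {
	int key;
	struct kref kref;
	struct item *next;
};

struct item_list {
	struct spinlock lock;
	struct item *head;
};

static void list_init(struct item_list *l)
{
	atomic_flag_clear(&l->lock.flag);
	l->head = NULL;
}

static void list_add(struct item_list *l, struct item *it)
{
	spin_lock(&l->lock);
	it->next = l->head;
	l->head = it;
	spin_unlock(&l->lock);
}

/* Reader: the pointer read and the kref_get share one lock hold, so a
 * writer cannot unlink and free the item in between. The item may already
 * be at 0 with its release() about to unlink it, hence get_not_zero. */
static struct item *list_lookup(struct item_list *l, int key)
{
	struct item *ret = NULL;

	spin_lock(&l->lock);
	for (struct item *i = l->head; i; i = i->next) {
		if (i->key == key) {
			if (kref_get_not_zero(&i->kref, 1))
				ret = i;
			break;
		}
	}
	spin_unlock(&l->lock);
	return ret;
}

/* Writer (e.g. called from the release function): unlink under the same
 * lock, before the memory can be freed or reused. */
static bool list_remove(struct item_list *l, struct item *victim)
{
	bool found = false;

	spin_lock(&l->lock);
	for (struct item **pp = &l->head; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			found = true;
			break;
		}
	}
	spin_unlock(&l->lock);
	return found;
}
```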
-Reincarnating an object after it hit 0
+Resurrecting an object after it hit 0
----------------------------
So when we hit 0, we might not be completely done with the object. This is part
of the "kcref" (c == cached) design pattern the Linux guys talk about. The main
original object. The kref refcount only helps when the refcount goes from 1 to
0, and on triggering the followup/release action.
-For now, we'll do the same thing in those situations, but may do something else
-that is similar (spinlock and seq-ctr per object), but first more digressions...
+To resurrect, we can't just do:
+ if (!kref_refcnt(&dentry->d_kref))
+ kref_init(&dentry->d_kref, dentry_release, 1);
+ else
+ kref_get(&dentry->d_kref, 1);
+
+There is a race between reading the refcnt and mucking with it. If it is
+concurrently going from 1 -> 0 and we read 0, that is okay: we still up it to 1.
+However, if it goes from 1 -> 0 after we read 1, we'll panic when we try to
+kref_get a 0'd kref. Doing this would also be dangerous for a 0 -> 1 race if
+other code could resurrect concurrently (which it does not!): two resurrectors
+could both read 0 and both kref_init(), losing a reference. The solution is to
+use a kref_get that doesn't care about 0 (__kref_get()).
+
+Elsewhere in this documentation, we talk about kref_get_not_zero(). That one
+will try, and fail gracefully on zero (used by pid2proc()). kref_get() will
+panic on zero. __kref_get() will not fail on zero and will blindly increment,
+which is what we want.
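Here is a sketch of kcref-style resurrection. It assumes C11 atomics, and `struct dentry`'s `releases` counter, dentry_release() and dcache_hit() are hypothetical. release() parks the object instead of freeing it. A cache hit does a single unconditional __kref_get() instead of the racy read-then-init shown above. The sketch does not model the separate sync (e.g. a cache lock) that must still keep the final free from racing with a resurrection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct kref {
	atomic_size_t refcount;
	void (*release)(struct kref *kref);
};

static void kref_init(struct kref *kref, void (*release)(struct kref *),
                      size_t init)
{
	atomic_init(&kref->refcount, init);
	kref->release = release;
}

/* Blind increment: fine on 0. The release pointer from kref_init() is still
 * set, so the next trip to 0 reruns it. */
static void __kref_get(struct kref *kref, size_t inc)
{
	atomic_fetch_add(&kref->refcount, inc);
}

static bool kref_put(struct kref *kref)
{
	size_t old = atomic_fetch_sub(&kref->refcount, 1);

	assert(old != 0);
	if (old == 1) {
		kref->release(kref);
		return true;
	}
	return false;
}

/* Hypothetical cached object: hitting 0 does not free it. */
struct dentry {
	struct kref d_kref;
	int releases;	/* how many times release() has run */
};

static void dentry_release(struct kref *kref)
{
	struct dentry *d = (struct dentry *)((char *)kref -
	                                     offsetof(struct dentry, d_kref));

	/* Real code would mark it unused / put it on an LRU, not free it. */
	d->releases++;
}

/* Cache hit: one atomic op replaces the racy "read refcnt, then either
 * kref_init() or kref_get()". Whether the count was 0 or 1, it is now at
 * least 1. The caller still needs other sync (rule 4) so the dentry cannot
 * be permanently freed underneath this call. */
static void dcache_hit(struct dentry *d)
{
	__kref_get(&d->d_kref, 1);
}
```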
Trickiness with lockless data structures:
----------------------------
Ideally, we want concurrent access to the dentry cache (or whatever cache has
-krefd objects that we want to reincarnate). Perhaps this is with CAS on linked
-lists, or locks per hash bucket. Since we need to prevent the reincarnation of
+krefd objects that we want to resurrect). Perhaps this is with CAS on linked
+lists, or locks per hash bucket. Since we need to prevent the resurrection of
objects from proceeding while another thread could be trying to remove them, we
need some sync between readers and writers. Both threads in the scenario we've
been talking about are going to be writers for the object, though from the
Process management is a bit different, since it does not want to destroy or free
a process until there is some explicit action (calling proc_destroy()). We still use
krefs, since we don't know who is the "last one out" to do the freeing, so we
-layer proc_destroy() on top of kref/__proc_free(). This makes it easier for us
-to hack something together that works with the pid2proc hash. Specifically, we
-can kref items in that LLS, as well as the runnable_list. We know when to take
-them out. This is similar to the dentry, except that when there is no other
-reference than the LLS, we don't want to do anything in particular (at least not
-yet). The dentry needs to have a lot of state changed, and maybe freed.
+layer proc_destroy() on top of kref/__proc_free(). This is why we have the "one
+ref to keep the object alive." For a little while, this ref was stored in the
+pid2proc hash, but the hash now holds an internal reference (we have the tech,
+it keeps things in sync with other usage models, and it makes proc_destroy() and
+sys_trywait() easier). Note the runnable_list holds external references, in part
+because it belongs to a different subsystem (the scheduler).
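The "one ref to keep the object alive" is sketched below. It assumes C11 atomics, and proc_init(), the `dying` flag and the `freed` flag are hypothetical. `freed` stands in for actually freeing the proc. The real proc_destroy() is serialized by the proc's lock and state machine. Here an atomic_exchange() on a flag stands in for that, so the alive ref is dropped exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct kref {
	atomic_size_t refcount;
	void (*release)(struct kref *kref);
};

static void kref_init(struct kref *kref, void (*release)(struct kref *),
                      size_t init)
{
	atomic_init(&kref->refcount, init);
	kref->release = release;
}

static void kref_get(struct kref *kref, size_t inc)
{
	size_t old = atomic_fetch_add(&kref->refcount, inc);

	assert(old != 0);
	(void)old;
}

static bool kref_put(struct kref *kref)
{
	size_t old = atomic_fetch_sub(&kref->refcount, 1);

	assert(old != 0);
	if (old == 1) {
		kref->release(kref);
		return true;
	}
	return false;
}

struct proc {
	struct kref p_kref;
	atomic_bool dying;	/* hypothetical: set once by proc_destroy() */
	bool freed;		/* stands in for really freeing the proc */
};

/* Runs when the last ref, external or the "alive" one, is dropped. */
static void __proc_free(struct kref *kref)
{
	struct proc *p = (struct proc *)((char *)kref -
	                                 offsetof(struct proc, p_kref));

	/* Real code would also unhash the pid (pid2proc's internal ref). */
	p->freed = true;
}

/* The initial ref is the one that keeps the proc alive until destroyed. */
static void proc_init(struct proc *p)
{
	atomic_init(&p->dying, false);
	p->freed = false;
	kref_init(&p->p_kref, __proc_free, 1);
}

/* Explicit action: drop the "alive" ref exactly once, however many callers
 * race here. Other holders (e.g. the runnable_list's external ref) keep the
 * struct around until they put their refs. */
static bool proc_destroy(struct proc *p)
{
	if (atomic_exchange(&p->dying, true))
		return false;
	kref_put(&p->p_kref);
	return true;
}
```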
Remember: the reason why we have trouble with lists and (internal)
references: both the list reader and the writer are sharing the same
grace periods on slab-freed pages.
Kreffing works because we have a known, synchronous initialization point where
-we kref_init() the refcount to 1. We can't do that easily when reincarnating or
+we kref_init() the refcount to 1. We can't do that easily when resurrecting or
even kref_get_not_zero() because another thread may be trying to permanently
free the object.
-The difference between kref_get_not_zero() and reincarnation is that
+The difference between kref_get_not_zero() and resurrection is that
get_not_zero() is trying to get an external reference and the object's release
-method has not been called. Reincarnation is when we've already hit 0,
+method has not been called (or it has, and we should fail, since we found the
+object through a weak/internal reference). Resurrection is when we've already hit 0,
release()d, and now want to reuse that object. For concrete examples:
get_not_zero() works when getting a file off the superblock file list - it should
only work if the file is still in the system and not about to be removed from
-the list. Reincarnation is for objects that don't get freed when they lose
+the list. Resurrection is for objects that don't get freed when they lose
their external references, such as dentries (they get put on the dentry cache).
On a cache hit for an unreferenced dentry, we'll need to change its state and
reinit its refcount. Next time it is kref_put()d to 0, it'll rerun its