The Dwarfs and The Fast Marking Algorithm

tptacek · on July 11, 2010

Short summary: instead of relying on initialization to a known value, ignore the initial values of the array and rely instead on a consistency constraint (data may be random, but won't be random and satisfy the constraint) that you can set and check. A neat trick you can probably use in other places as well.

shasta · on July 11, 2010

Anyone know the name of this algorithm? There is a similarly solved classic problem in complexity theory: Given an O(m) list of O(n)-bounded integers, determine if there are duplicates in O(m) time and O(n) space.

ori_b · on July 11, 2010

I don't know if it has a name (it seems to simply be referred to as "an efficient sparse set representation"), but I think I initially remember seeing it introduced in "Compilers: Principles, Techniques, and Tools" in an exercise at some point in the past.

A good description of it is here: http://research.swtch.com/2008/03/using-uninitialized-memory...

duskwuff · on July 11, 2010

This seems trivially incorrect. If both arrays are randomized at the outset, there is nothing to prevent them from containing valid data indicating a state having objects marked. In fact, if you free one set of arrays used by this structure and immediately allocate another, there's a good chance that your "random" initial state will, in fact, be the exact one you just freed.

jaspervdj · on July 11, 2010

The trick is in the counter variable. Read the update section, and the source code (it's not much, really).

sesqu · on July 11, 2010

Right, you neglected to mention the use for the counter in your prose.

RiderOfGiraffes · on July 11, 2010

I'm often left confused by people who downvote me without leaving a correction or counter-argument, so I guess I ought to explain why I've downvoted you here.

Who is the "you" in your sentence? The author of the comment to which you reply? They never mention the counter. The author of the original article? You're not replying to them.

And the counter isn't mentioned in the original prose, but it's clearly mentioned in the description of the algorithm. Surely that's reasonable. It's a single variable, hence constant to allocate in both space and time.

So your comment seems to make an unreasonable criticism that doesn't add value, and that's why I've downvoted it.

sesqu · on July 11, 2010

I was referring to jaspervdj, who is both the author of the comment to which I reply and the author of the original article (well, I presume).

The article states: "an element in array can, by chance, point to an element in marks, but this won’t matter since the element in marks won’t point back, and so we can determine it’s fake." This sentence is missing the part where the counter is made use of, and as such is erroneous. The mentioned element can point back, and its index must be compared with the counter to ascertain whether it is valid.

Although that is a small omission, it is in no way trivial, and probably the source of all the confused comments the posting received.

RiderOfGiraffes · on July 11, 2010

OK, I missed the coincidence (by which I mean they are identical, not that it's by chance) or the names of the authors. That clears up my confusion on that point.

Regarding the lack of mention of the counter in the section of the description you mention, I thought it was obvious. In the section before he says: we keep the number of marks set. That means, to me, that if the element in array accidentally/incidentally points to an element in marks that accidentally/incidentally points back, it's not counted as pointing back because it's known (by using the counter) not to be valid.

It is subtle, I agree, but for me the explanation is clear enough already. Adding more detail could simply overwhelm the reader with words, without letting the reader see the wood for the trees.

I agree, now that you've expanded it, that your point is valid, but I, personally, prefer the version as it is.

In this I expect we should simply agree to disagree. One person's "necessary detail" is another's "pointless nit-picking."

nudded · on July 11, 2010

just to chime in, jaspervdj is the commenter, the submitter, the website is calles jaspervdj.be so it is reasonable to guess he is the author (he is btw).

duskwuff · on July 12, 2010

Fair enough, that part wasn't entirely obvious from the description.