We present a practical and stable algorithm for the parallel refinement of tetrahedral meshes. The algorithm is based on the refinement of terminal-edges and their associated terminal stars. A terminal-edge is an edge that is the longest edge of every element sharing it, and the elements that share a terminal-edge form its terminal star. We prove that the algorithm is inherently decoupled and thus scalable. Our experimental data show that the implementation is stable, handles hundreds of millions of tetrahedra, and runs between one and two orders of magnitude faster than the method and implementation we presented previously (Rivara et al., Proceedings of the 13th International Meshing Roundtable, 2004).
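
To make the definition of a terminal-edge concrete, the following is a minimal serial sketch in Python; it only illustrates the definition and is not the parallel algorithm or the implementation reported here. The function names (`edge_key`, `longest_edge`, `terminal_stars`), the mesh representation (a list of vertex coordinates plus a list of 4-tuples of vertex indices), and the tie-breaking rule for edges of equal length are assumptions made for this example.

```python
from itertools import combinations
from math import dist


def edge_key(a, b):
    """Return an orientation-independent key for the edge (a, b)."""
    return (a, b) if a < b else (b, a)


def longest_edge(points, tet):
    """Longest edge of one tetrahedron, given as 4 vertex indices.

    Assumption: ties in length are broken by the larger vertex-index
    pair, so that every element sharing an edge agrees on the choice.
    """
    edges = (edge_key(a, b) for a, b in combinations(tet, 2))
    return max(edges, key=lambda e: (dist(points[e[0]], points[e[1]]), e))


def terminal_stars(points, tets):
    """Map each terminal-edge to its terminal star (list of tet indices).

    An edge is terminal when it is the longest edge of every
    tetrahedron that shares it.
    """
    # Star of each edge: indices of all tetrahedra containing that edge.
    star = {}
    for t, tet in enumerate(tets):
        for a, b in combinations(tet, 2):
            star.setdefault(edge_key(a, b), []).append(t)

    longest = [longest_edge(points, tet) for tet in tets]
    return {e: ts for e, ts in star.items()
            if all(longest[t] == e for t in ts)}


if __name__ == "__main__":
    # Two tetrahedra glued along the face (0, 1, 2); edge (0, 1) is the
    # longest edge of both, so it is the only terminal-edge.
    points = [(0, 0, 0), (2, 0, 0), (1, 1, 0), (1, 0.5, 1), (1, 0.5, -1)]
    tets = [(0, 1, 2, 3), (0, 1, 2, 4)]
    print(terminal_stars(points, tets))
```

In this toy mesh the two tetrahedra sharing edge (0, 1) form its terminal star. Because membership in a terminal star depends only on the elements around one edge, such a check is local, which is consistent with the decoupling property claimed above.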