This is a very deep question that cannot be answered in a few lines of text. Still, let me try.
If we really want to see the "true definition" of continuous functions, we have to look at topological spaces, which are the mathematics of neighborhoods. A topology also defines open and closed sets in a very general way, and based on that, a function f : X → Y between topological spaces is called continuous if for every x in X and every neighborhood N of f(x) there is a neighborhood M of x such that f(M) ⊆ N.
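To make the neighborhood definition concrete, here is a minimal Python sketch for finite topological spaces. It assumes a topology is given as an explicit list of open sets and a function as a dict; the names `is_continuous`, `S` and `T_S` are hypothetical. Testing only open neighborhoods is enough, because every neighborhood of a point contains an open set around that point. The example uses the two-point Sierpinski space.

```python
def is_continuous(f, points_x, opens_x, opens_y):
    """Neighborhood definition of continuity for f : X -> Y on finite spaces.

    f is a dict from points of X to points of Y; opens_x and opens_y list the
    open sets of the two topologies as frozensets. Checking open neighborhoods
    suffices, since every neighborhood contains an open one around the point.
    """
    for x in points_x:
        for n in opens_y:
            if f[x] not in n:
                continue  # n is not a neighborhood of f(x)
            # look for an open neighborhood m of x with f(m) ⊆ n
            if not any(x in m and all(f[p] in n for p in m) for m in opens_x):
                return False
    return True


# Sierpinski space: points {0, 1}, open sets {}, {1}, {0, 1}
S = {0, 1}
T_S = [frozenset(), frozenset({1}), frozenset({0, 1})]

identity = {0: 0, 1: 1}
swap = {0: 1, 1: 0}

print(is_continuous(identity, S, T_S, T_S))  # True
print(is_continuous(swap, S, T_S, T_S))      # False
```

The swap fails at x = 0: the only open set containing 0 is the whole space, and its image {0, 1} is not contained in the neighborhood {1} of f(0) = 1.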
So far, so good. Now, there are specializations of this. One is given by metric spaces, which are the mathematics of distances. Every metric space induces a topology: a set is a neighborhood of an element x if it contains an open epsilon-ball around x (the set of all points at distance less than epsilon from x), and therefore the metric also induces a notion of continuous functions. Strictly speaking, one should then prove the usual epsilon-delta definition of continuity as a theorem: a function between metric spaces satisfies the epsilon-delta condition exactly if it is continuous in the sense of the original definition for the induced topological spaces. This is what you are considering above.
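In formulas, a minimal statement of the induced topology and of the theorem mentioned above might read as follows; the notation B_ε(x), d_X and d_Y is assumed here for illustration.

```latex
% open epsilon-ball around x in a metric space (X, d_X)
B_\varepsilon(x) = \{\, y \in X \mid d_X(x, y) < \varepsilon \,\}

% neighborhoods in the induced topology
N \text{ is a neighborhood of } x \iff \exists \varepsilon > 0 .\; B_\varepsilon(x) \subseteq N

% theorem: f : X \to Y is topologically continuous at x iff
\forall \varepsilon > 0 .\; \exists \delta > 0 .\; \forall y \in X .\;
  d_X(x, y) < \delta \implies d_Y(f(x), f(y)) < \varepsilon
```

The epsilon-delta condition says exactly f(B_δ(x)) ⊆ B_ε(f(x)); since balls are neighborhoods and every neighborhood contains a ball, it agrees with the topological definition.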
Another specialization of topological spaces, and the one we are considering, is given by partial orders, which induce a topology called the Scott topology. Again, starting from the original definition of continuous functions, we could prove as a theorem what we used as the definition of continuous functions on partial orders.
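For a finite partial order this can be checked mechanically. On a finite poset every directed set contains its own least upper bound, so the Scott-open sets are exactly the upward-closed sets. The sketch below assumes that case, uses the standard equivalent form of topological continuity (preimages of open sets are open), and takes the flat domain of booleans as an example; all names are hypothetical.

```python
from itertools import combinations


def upper_sets(elems, leq):
    """All upward-closed subsets of a finite poset.

    On a finite poset every directed set contains its own supremum, so the
    Scott-open sets are exactly these upper sets.
    """
    result = []
    for r in range(len(elems) + 1):
        for combo in combinations(elems, r):
            u = frozenset(combo)
            if all(y in u for x in u for y in elems if leq(x, y)):
                result.append(u)
    return result


def is_monotone(f, elems, leq):
    return all(leq(f[x], f[y]) for x in elems for y in elems if leq(x, y))


def is_scott_continuous(f, elems, leq):
    """Topological continuity w.r.t. the Scott topology on domain and
    codomain (here the same poset): preimages of Scott-open sets are open."""
    opens = set(upper_sets(elems, leq))
    return all(frozenset(x for x in elems if f[x] in u) in opens for u in opens)


# Flat domain of booleans: "bot" lies below "F" and "T", which are incomparable.
ELEMS = ["bot", "F", "T"]


def leq(x, y):
    return x == y or x == "bot"


strict_not = {"bot": "bot", "F": "T", "T": "F"}  # monotone
weird = {"bot": "T", "F": "F", "T": "F"}         # not monotone

print(is_monotone(strict_not, ELEMS, leq), is_scott_continuous(strict_not, ELEMS, leq))  # True True
print(is_monotone(weird, ELEMS, leq), is_scott_continuous(weird, ELEMS, leq))            # False False
```

On finite posets the two checks always agree; on infinite ones, monotonicity is still necessary for Scott continuity but no longer sufficient.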
So, your confusion arises from mixing up the different specializations. The theorem that every continuous function is also monotonic holds for partial orders, but not for metric spaces on which monotonic functions can also be defined, such as the real numbers: x ↦ x² is continuous on the reals but not monotonic. Still, both notions of continuity have a common root in topology and are therefore derived from the same source. In metric spaces, people also study contracting functions, which are closer to the order-theoretic notions and therefore better suited for a comparison.
There are many further relationships. For example, the fixpoint theorems we considered on partial orders have an analogue in the Banach fixed-point theorem for contracting functions on complete metric spaces. For the models of computation used by modeling languages for embedded systems, both are in use.
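The parallel between the two fixpoint theorems can be sketched in Python: Kleene-style iteration from the bottom element computes a least fixed point on a finite lattice, while Banach-style iteration from an arbitrary start approximates the unique fixed point of a contraction. The function names, the example graph `EDGES` and the tolerance are hypothetical choices for illustration.

```python
def kleene_lfp(f, bottom):
    """Least fixed point by iterating a monotone f from bottom (Kleene).

    Terminates once the chain bottom ⊑ f(bottom) ⊑ ... becomes stationary,
    which is guaranteed in the example because its lattice is finite.
    """
    x = bottom
    while True:
        y = f(x)
        if y == x:
            return x
        x = y


def banach_fixpoint(f, x0, tol=1e-12, max_iter=10_000):
    """Approximate the unique fixed point of a contraction f (Banach).

    Iterates x_{n+1} = f(x_n) from an arbitrary start x0 until successive
    iterates are closer than tol.
    """
    x = x0
    for _ in range(max_iter):
        y = f(x)
        if abs(y - x) < tol:
            return y
        x = y
    raise RuntimeError("no convergence; is f really a contraction?")


# Order-theoretic example: nodes reachable from "a" as a least fixed point on
# the powerset lattice of nodes (ordered by ⊆), with bottom = empty set.
EDGES = {"a": {"b"}, "b": {"c"}, "c": {"a"}, "d": {"a"}}


def step(s):
    return frozenset({"a"}) | frozenset(t for n in s for t in EDGES[n])


reachable = kleene_lfp(step, frozenset())
print(sorted(reachable))  # ['a', 'b', 'c']

# Metric example: f(x) = x/2 + 1 is a contraction on the reals (factor 1/2)
# with unique fixed point 2.
approx = banach_fixpoint(lambda x: x / 2 + 1, x0=0.0)
print(approx)
```

The order gives the Kleene iteration its direction (always upward from the bottom element), whereas the metric only guarantees convergence in the limit, from any starting point.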
Working out further details here (which I did some time ago, proving the connections between all these facts) quickly fills 30 pages. But that doesn't help for VRS, so I decided not to include it in the slides.