The gesture is decided by the first motion of the fingers. In many cases, though, a blended approach can also be used: motion along the X-axis performs rotation, and motion along the Y-axis performs pinch.
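A minimal sketch of the first-motion decision, assuming you already have one finger's displacement (`dx`, `dy`) since touch-down. The function name `classify_first_motion` and the jitter threshold `MOVE_THRESHOLD` are hypothetical, chosen only for illustration, and the X-to-rotation / Y-to-pinch mapping follows the convention described above.

```python
from typing import Optional

# Hypothetical jitter threshold in pixels; tune it for your device.
MOVE_THRESHOLD = 8.0


def classify_first_motion(dx: float, dy: float,
                          threshold: float = MOVE_THRESHOLD) -> Optional[str]:
    """Pick a gesture from the first significant finger motion.

    Returns None while the motion is still smaller than the threshold,
    "rotate" when X-axis motion dominates, and "pinch" when Y-axis
    motion dominates.
    """
    if max(abs(dx), abs(dy)) < threshold:
        return None  # too small to decide yet, treat as jitter
    return "rotate" if abs(dx) >= abs(dy) else "pinch"
```

Once this returns a non-None value, you would typically lock the gesture until the fingers are lifted, so that later motion on the other axis does not switch modes mid-gesture.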
There is not much more to add, because the other methods rely on filtering and extra algorithms to differentiate the gestures more smartly. For example, you can maintain a rectangular band over the two touch points and only evaluate finger motion that stays strictly inside the rectangle's boundaries.
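One possible reading of the rectangular band idea, sketched under assumptions: the band is centered on the line through the two initial touch points, fingers that stay inside it while their separation changes are treated as pinching, and a finger that leaves it is treated as rotating. The names `classify_with_band`, `half_width`, and `min_spread_change` are hypothetical, and the thresholds are placeholders.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def _distance(a: Point, b: Point) -> float:
    return math.hypot(b[0] - a[0], b[1] - a[1])


def _offset_from_line(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    length = _distance(a, b)
    if length == 0.0:
        return _distance(p, a)  # degenerate band: both touches coincide
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return abs(cross) / length


def classify_with_band(start1: Point, start2: Point,
                       cur1: Point, cur2: Point,
                       half_width: float = 20.0,
                       min_spread_change: float = 10.0) -> Optional[str]:
    """Classify using a band around the initial touch-to-touch line."""
    # A finger that leaves the band is moving across the line: rotation.
    if (_offset_from_line(cur1, start1, start2) > half_width
            or _offset_from_line(cur2, start1, start2) > half_width):
        return "rotate"
    # Both fingers are inside the band: look at the change in separation.
    spread_change = abs(_distance(cur1, cur2) - _distance(start1, start2))
    if spread_change >= min_spread_change:
        return "pinch"
    return None  # not enough motion to decide yet
```

Because this has to wait until a finger either exits the band or the separation changes enough, it usually decides later than a plain first-motion check.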
Keep in mind that if we don't decide on the first motion, there will be a delay in gesture recognition.