iCub-main
The Gaze Interface
Author
Ugo Pattacini

Introduction

The YARP Gaze Interface provides an abstract layer to control the iCub gaze in a bio-plausible way, moving the neck and the eyes independently and performing saccades, pursuit, vergence, OCR (oculo-collic reflex), VOR (vestibulo-ocular reflex) and gaze stabilization relying on inertial data.

Essentially, the interface exports all the functionalities of the iKinGazeCtrl module directly from within the code, without the user having to be concerned with the communication protocol carried out via YARP ports. Thus, the user is warmly invited to visit the iKinGazeCtrl page, where more detailed descriptions can be found.

Note
If you're going to use this controller for your work, please quote it within any resulting publication: Roncone A., Pattacini U., Metta G. & Natale L., "A Cartesian 6-DoF Gaze Controller for Humanoid Robots", Proceedings of Robotics: Science and Systems, Ann Arbor, MI, June 18-22, 2016.

Dependencies

In order to use the Gaze Interface, make sure that the following steps are done:

  1. Install the required software dependencies.
  2. Compile the iCub repository with the switch ENABLE_icubmod_gazecontrollerclient enabled: this will make the client part of the interface available. The server part is represented by the module iKinGazeCtrl itself.
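For instance, assuming an out-of-source CMake build of the repository (paths below are illustrative and depend on your setup), the switch can be enabled as follows:

cd $ICUB_ROOT/main/build
cmake -DENABLE_icubmod_gazecontrollerclient=ON ..
make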

Running the Gaze Server

Launch:

iKinGazeCtrl

The server will load its parameters from the default configuration file (which can be overridden through the context and from command-line options). This file contains, among other things, the camera intrinsic parameters that are required by some of the methods of the Gaze Interface, specifically lookAtMonoPixel() and lookAtStereoPixels().
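For example, to point the server to a custom configuration (the file and context names below are purely illustrative):

iKinGazeCtrl --context myGazeContext --from myGazeConfig.ini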
Furthermore, the installed copy of the $ICUB_ROOT/main/app/iCubStartup/scripts/iCubStartup.xml application also allows the user to launch the gaze server together with the iCubInterface module.

Opening and Closing the Gaze Interface

The Gaze Interface can be opened as a normal YARP interface resorting to the PolyDriver class:

#include <yarp/os/Property.h>
#include <yarp/dev/PolyDriver.h>
#include <yarp/dev/GazeControl.h>

using namespace yarp::os;
using namespace yarp::dev;

Property option;
option.put("device","gazecontrollerclient");
option.put("remote","/iKinGazeCtrl");
option.put("local","/client/gaze");

PolyDriver clientGazeCtrl(option);

IGazeControl *igaze=NULL;
if (clientGazeCtrl.isValid()) {
    clientGazeCtrl.view(igaze);
}

When you are done with controlling the robot you can explicitly close the device (or just let the destructor do it for you):

clientGazeCtrl.close();

Using the Gaze Interface

The YARP documentation contains a full description of the Gaze Interface. It makes use of three different coordinate systems that enable the user to control the robot's gaze, as detailed hereafter.

Expressing the fixation point in Cartesian Coordinates

This coordinate system is the same one the Cartesian Interface relies on and complies with its documentation; it lets the user specify the target location to gaze at with respect to the root reference frame attached to the robot's waist.

The following snippet of code shows how to command the gaze in the Cartesian space and to retrieve the current configuration.

Vector fp(3);
fp[0]=-0.50; // x-component [m]
fp[1]=+0.00; // y-component [m]
fp[2]=+0.35; // z-component [m]
igaze->lookAtFixationPointSync(fp); // request to gaze at the desired fixation point and wait for reply (sync method)
igaze->waitMotionDone(); // wait until the operation is done
Vector x;
igaze->getFixationPoint(x); // retrieve the current fixation point
cout<<"final error = "<<norm(fp-x)<<endl; // return a measure of the displacement error

Sync and non-sync methods are available here as well for commanding gaze movements; we refer the reader to the Cartesian Interface documentation for insights into their peculiarities.
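As a minimal sketch of the non-sync variant, one can dispatch the request and then poll the controller for completion:

igaze->lookAtFixationPoint(fp); // non-sync: returns right after dispatching the request
bool done=false;
while (!done)
{
    igaze->checkMotionDone(&done); // poll the controller for the motion state
    Time::delay(0.1); // avoid flooding the server with requests
}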

Expressing the fixation point in Absolute Angular Coordinate System

This coordinate system is an absolute head-centered angular reference frame as explained below:

  • Angular means that it makes use of azimuth, elevation, and vergence, which are angular quantities.
  • Head-Centered means that the frame is attached to the midpoint between the two cameras, sharing the same set of three axes.
  • Absolute indicates that the frame remains still irrespective of the robot's motion, and refers to the position of the head when the robot is in its rest configuration (i.e. torso and head angles zeroed).

For instance, to specify a desired location where to look at in this coordinate system, one can write:

Vector ang(3);
ang[0]=+10.0; // azimuth-component [deg]
ang[1]=-05.0; // elevation-component [deg]
ang[2]=+20.0; // vergence-component [deg]
igaze->lookAtAbsAngles(ang); // move the gaze
...
igaze->getAngles(ang); // get the current angular configuration

Expressing the fixation point in Relative Angular Coordinate System

A relative angular coordinate system is also available; it is basically the same as the absolute one but refers to the current configuration of the head-centered frame. Hence we can use:

Vector ang(3);
ang[0]=+10.0; // azimuth-relative component wrt the current configuration [deg]
ang[1]=-05.0; // elevation-relative component wrt the current configuration [deg]
ang[2]=+20.0; // vergence-relative component wrt the current configuration [deg]
igaze->lookAtRelAngles(ang); // move the gaze wrt the current configuration

Expressing the fixation point in pixel coordinates (monocular approach)

The user can also command the gaze by specifying a location within the image plane of one camera as follows:

int camSel=0; // select the image plane: 0 (left), 1 (right)
Vector px(2); // specify the pixel where to look
px[0]=160.0;
px[1]=120.0;
double z=1.0; // distance [m] of the object from the image plane (extended to infinity): yes, you probably need to guess, but it works pretty robustly
igaze->lookAtMonoPixel(camSel,px,z); // look!

Internally, the interface retrieves the corresponding 3D point through a call to get3DPoint() and then uses this result to command an equivalent Cartesian displacement, as in the following:

Vector x;
igaze->get3DPoint(camSel,px,z,x);
igaze->lookAtFixationPoint(x);

The clear benefit of the lookAtMonoPixel() method is that the time for the rpc communication is saved; hence it is particularly suited for control in streaming mode. On the other hand, the knowledge of the Cartesian point remains hidden from the user, unless it is retrieved at the end of the motion through a call to getFixationPoint().
Alternatively, one can rely on the vergence angle given in degrees in place of the component z, accounting for a different way of expressing distances from the eyes. To this end, the user can call the dedicated method lookAtMonoPixelWithVergence(camSel,px,ver).
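As a sketch of this alternative, one can for example reuse the current vergence as the distance cue (camSel and px are as in the previous snippet):

Vector ang;
igaze->getAngles(ang); // current azimuth/elevation/vergence [deg]
double ver=ang[2]; // keep the current vergence as an estimate of the target distance
igaze->lookAtMonoPixelWithVergence(camSel,px,ver); // gaze at the pixel at that vergence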

Expressing the fixation point in pixel coordinates (stereo approach)

The same thing can be achieved also by exploiting the stereo vision:

Vector c(2), pxl(2), pxr(2);
c[0]=160.0; // center of image plane
c[1]=120.0;
bool converged=false;
while (!converged)
{
pxl[0]=...; // specify somehow the pixel within the left image plane
pxl[1]=...;
pxr[0]=...; // specify somehow the pixel within the right image plane
pxr[1]=...;
igaze->lookAtStereoPixels(pxl,pxr); // look!
converged=(0.5*(norm(c-pxl)+norm(c-pxr))<5); // recompute somehow the convergence status
}

Of course, the matching problem between pixels of different image planes is left to the user, who also has to provide continuous visual feedback while converging to the target.

Geometry of pixels

It might sometimes be useful to perform a homography in order to retrieve the projection of a pixel onto a plane specified in 3D space: consider, for instance, a robot presented with a number of objects that all lie on a table. The table introduces a constraint that allows determining the component z of the point when the monocular approach is used.

int camSel=0; // select the image plane: 0 (left), 1 (right)
Vector px(2); // specify the pixel where to look
px[0]=160.0;
px[1]=120.0;
Vector plane(4); // specify the plane in the root reference frame as ax+by+cz+d=0; z=-0.12 in this case
plane[0]=0.0; // a
plane[1]=0.0; // b
plane[2]=1.0; // c
plane[3]=0.12; // d
Vector x;
igaze->get3DPointOnPlane(camSel,px,plane,x); // get the projection

It is also possible to execute a triangulation to find the 3D point in space corresponding to the pixels in the two image planes. This problem is solved through a least-squares minimization.

Vector pxl(2), pxr(2);
pxl[0]=...; // specify somehow the pixel within the left image plane
pxl[1]=...;
pxr[0]=...; // specify somehow the pixel within the right image plane
pxr[1]=...;
Vector x;
igaze->triangulate3DPoint(pxl,pxr,x);

Importantly, the triangulation is strongly affected by uncertainties in the alignment of the cameras; therefore, unless these unknowns are perfectly compensated for, it is advisable to rely on the method lookAtStereoPixels() to fixate in stereo mode.

Fast Saccadic Mode

The user has the possibility to enable the fast saccadic mode, which employs the low-level position control in order to generate very fast saccadic movements of the eyes. To achieve that, one can simply do the following:

igaze->setSaccadesMode(true);

After a saccade is executed, the next saccadic movement can be performed only after a given inhibition period, which in turn can be retrieved and/or tuned through dedicated methods (i.e. get/setSaccadesInhibitionPeriod()).
The controller chooses to perform a saccade only if the angular distance of the target from the straight-ahead line exceeds a given threshold, which the user might profitably tune through dedicated methods (i.e. get/setSaccadesActivationAngle()).
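For instance, both quantities can be inspected and tuned as follows (the values below are arbitrary, chosen only to illustrate the calls):

double period,angle;
igaze->getSaccadesInhibitionPeriod(&period); // current inhibition period [s]
igaze->setSaccadesInhibitionPeriod(0.5); // shorten the inhibition period to 0.5 s
igaze->getSaccadesActivationAngle(&angle); // current activation threshold [deg]
igaze->setSaccadesActivationAngle(10.0); // saccades triggered beyond 10 deg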

Caveat: vision processing algorithms that assume the continuity of the images flow might be heavily affected by saccades.

Gaze Stabilization

The controller can be instructed to do its best in order to keep the fixation point unvaried under the effect of external disturbances/movements. The gaze stabilization makes use of inertial data to accomplish the job and entails that corrections will be sent to the eyes too (therefore, it is not a mere head stabilization). To enable the gaze stabilization, call the following method:

igaze->setStabilizationMode(true);

The stabilization is always active during point-to-point motion or while tracking a moving reference, regardless of the current setting of the above mode (unless the stabilization has been purposely disabled via a command-line option), so that any disturbance occurring during the motion will be compensated for. In this respect, the difference between the stabilization mode and the tracking mode is that in the former the fixation point is stabilized in the "world" coordinate system, making this mode particularly suitable for robot balancing and walking, whereas in the latter the fixation point is always tracked in the robot's root reference frame, which might itself be moving with respect to the world.
When the stabilization is active and the robot is purely compensating for external disturbances, the neck and eyes limits customized by the user are not taken into account; the joints thus span their whole admissible range.
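To make the distinction concrete, here is a minimal sketch toggling the two modes independently and querying the current stabilization setting:

igaze->setTrackingMode(true); // track the target in the robot's root frame
igaze->setStabilizationMode(true); // stabilize the fixation point in the world frame
bool stab;
igaze->getStabilizationMode(&stab); // query the current stabilization setting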

Note
For further details about gaze stabilization refer to: Roncone A., Pattacini U., Metta G. & Natale L., "Gaze Stabilization for Humanoid Robots: a Comprehensive Framework", IEEE-RAS International Conference on Humanoid Robots, Madrid, Spain, November 18-20, 2014.

Context Switch

We define here the context as the configuration in which the controller operates: therefore the context includes the current tracking mode, the eyes and neck trajectory execution time and so on. Obviously the controller performs the same action differently depending on the current context. A way to easily switch among contexts is to rely on storeContext() and restoreContext() methods as follows:

igaze->setEyesTrajTime(0.5); // here the user prepares the context
igaze->setTrackingMode(true);
igaze->bindNeckPitch();
...
int context_0;
igaze->storeContext(&context_0); // latch the context
igaze->setNeckTrajTime(1.5); // at certain point the user may want the controller to
igaze->clearNeckPitch(); // perform some actions with different configuration
igaze->lookAtFixationPoint(fp);
...
igaze->restoreContext(context_0); // ... and then retrieve the stored context_0
igaze->lookAtFixationPoint(fp); // now the controller performs the same action but within the context_0

Unless the user needs the interface just for logging purposes, it is a good rule to store the context at the initialization of the module and then restore it at release time, in order to preserve the controller configuration.
Note that the special context tagged with the id 0 is reserved by the system to let the user restore the start-up configuration of the controller at any time.
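A minimal sketch of this store/restore pattern (the context variable name is arbitrary):

int startupContext;
igaze->storeContext(&startupContext); // at module initialization
...
// play with the controller configuration
...
igaze->restoreContext(startupContext); // at module release
igaze->restoreContext(0); // alternatively, go back to the controller start-up configuration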

Events Callbacks

The Gaze Interface also provides an easy way to register event callbacks. For example, an event might be the onset of the movement, when the controller gets activated upon receiving a new target, or the end of the movement itself. The user can attach a callback to any event generated by the interface.
To do so, it is required to inherit from the class GazeEvent, which handles events, and to override the gazeEventCallback() method.

class MotionDoneEvent : public GazeEvent
{
public:
    MotionDoneEvent()
    {
        gazeEventParameters.type="motion-done"; // select here the event type we want to listen to
    }

    virtual void gazeEventCallback()
    {
        cout<<"motion complete"<<endl;
    }
};

MotionDoneEvent event;
igaze->registerEvent(event); // the tag "motion-done" identifies the event to be notified
igaze->lookAtFixationPoint(fp); // as soon as the motion is complete, the callback will print out the message

To know which events are available for notification, the user can do:

Bottle info;
igaze->getInfo(info);
cout<<info.find("events").asList()->toString().c_str()<<endl;

The class GazeEvent contains two structures: the gazeEventParameters and the gazeEventVariables. The former has to be filled by the user to set up the event details, whereas the latter is filled directly by the event handler in order to provide information to the callback.
For instance, to raise a callback at the middle point of the path, one can do:

class MotionMiddleEvent : public GazeEvent
{
public:
    MotionMiddleEvent()
    {
        gazeEventParameters.type="motion-ongoing";
        gazeEventParameters.motionOngoingCheckPoint=0.5; // middle point is at 50% of the path
    }

    virtual void gazeEventCallback()
    {
        cout<<"attained the "<<100.0*gazeEventVariables.motionOngoingCheckPoint;
        cout<<"% of the path"<<endl;
    }
};

The special wildcard "*" can be used to assign a callback to any event, regardless of its type, as done below.

class GeneralEvent : public GazeEvent
{
public:
    GeneralEvent()
    {
        gazeEventParameters.type="*";
    }

    virtual void gazeEventCallback()
    {
        cout<<"event of type: "<<gazeEventVariables.type.c_str()<<endl;
        cout<<"happened at: "<<gazeEventVariables.time<<" [s] (source time)"<<endl;
    }
};