June 25, 2024

I promised you a surprise post for this week and here we are: I am announcing “Hybrid Hands Interactions”, a special open-source plugin that lets you interact with real objects in mixed reality, so that you can create some magical experiences like this one:

The bottle and the glass are able to do some magic…

Hybrid interactions

In this period I’m experimenting a lot with Mixed Reality: I believe that true mixed reality is not just putting a passthrough background in your VR application, but it is about creating a true connection between the real and the virtual elements, so as to generate a coherent reality that is the natural merging of the two. That’s why I did experiments like the one on the balcony, where a real balcony overlooks a virtual background, with a final shared reality that is mind-blowing.

I’ve been able to do many of these tests thanks to the new features of the XR SDKs, like Meta Scene Setup, which lets you map the main elements in your room. But there is something that is still missing in all the SDKs out there and that I think is fundamental to create a real hybrid reality: interaction with objects. I currently have no way on Quest to take a physical bottle in my hand and put, for instance, a virtual spray can on top of it, so that when I hold my bottle, I see I’m holding the spray can, for a much-improved sense of presence. Nor is there a way for me to create an interaction between my physical bottle and my physical glass that triggers a virtual animation, like some fireworks. This would be fundamental to create a real blend of realities, but currently, no MR headset to my knowledge supports object tracking, even if some AR frameworks, like Vuforia, do.

The intuition

So, as usual, I started wondering whether I, a random guy in Italy, could solve a problem that thousands of super engineers at Meta are probably working on. And as I am a random guy, I do not even have access to the data that Meta people have. For instance, I cannot use the camera images, because Meta prevents access to them for privacy reasons (and I’m strongly advocating for this access to be granted to us trusted developers), so I cannot even use computer vision algorithms to track the objects. This really seemed like a mission impossible.

I develop in this position, too…

But then I had an intuition: we interact with objects using our hands, and all headsets are now able to track the hands, so what if I could track the hands to approximate the tracking of the object? If I know where the bottle is when the application starts, and then I track the movements of the hand grabbing it, I can know where the bottle is at every frame. For instance, if I know that the hand grabbing it has moved 20 centimeters to the right, the bottle has also moved 20 centimeters to the right with it. It was a bit of a crazy idea but I decided to experiment with it. I thought it would be fun to play around with this concept for a couple of days in my free time.
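To make the intuition concrete, here is a minimal Python sketch of the idea. It is only an illustration and not the plugin's actual code (the plugin is a Unity project): the `Vec3` and `HandDrivenObject` names are hypothetical, and the sketch assumes the object only translates with the hand, ignoring rotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Vec3:
    """Tiny 3D vector, in meters (hypothetical helper for this sketch)."""
    x: float
    y: float
    z: float

    def __add__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x + other.x, self.y + other.y, self.z + other.z)

    def __sub__(self, other: "Vec3") -> "Vec3":
        return Vec3(self.x - other.x, self.y - other.y, self.z - other.z)


class HandDrivenObject:
    """Estimates where a physical object is from the hand that holds it."""

    def __init__(self, start_position: Vec3) -> None:
        # Known position of the object when the application starts.
        self.position = start_position
        # Hand position at the previous frame, only set while grabbing.
        self._last_hand: Optional[Vec3] = None

    def begin_grab(self, hand_position: Vec3) -> None:
        self._last_hand = hand_position

    def end_grab(self) -> None:
        self._last_hand = None

    def update(self, hand_position: Vec3) -> Vec3:
        """Call once per frame: the object moves by the same delta as the hand."""
        if self._last_hand is not None:
            self.position = self.position + (hand_position - self._last_hand)
            self._last_hand = hand_position
        return self.position


# Usage: the hand grabs the bottle and then moves 20 cm along the x axis.
bottle = HandDrivenObject(Vec3(0.0, 0.8, 0.5))
bottle.begin_grab(Vec3(0.05, 0.85, 0.5))
moved = bottle.update(Vec3(0.25, 0.85, 0.5))
```

After the update, the estimated bottle position has shifted by the same 20 centimeters as the hand. When the hand is not grabbing, `update` leaves the object where it was.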

The open-source plugin

Three weeks and a few headaches later, I realized that I had underestimated the task. But I can now happily announce that I created a plugin to do these interactions with physical objects by exploiting hand tracking, and that I’m releasing it on GitHub as open source under the MIT license, so that everyone can have fun with it and also use it in commercial applications. You can find it at this link:

https://github.com/TonyViT/HybridHandInteractions

The plugin lets you perform with your bare hands two different operations, which are fundamental for hybrid hand interactions:

  • A set-up moment, which I call “Placement”, that is when the user specifies where the physical objects are, so they can be tracked by the system. For instance, if the user has a physical bottle on their desk, they have to tell the system where the bottle is to be able to use it in mixed reality. In this plugin, it is possible to perform this operation using bare hands: the user just has to pinch the physical object to set its position. After all the objects have been registered, their poses can be saved and recovered later.
Using a natural pinching gesture to say where the glass is on my desk
  • An interaction moment, which is when the user actually interacts with the elements. The interaction becomes possible because during the setup the system puts a collider around the physical element, so in the interaction phase, the application can track the relationship between the tracked virtual hand and the collider and try to make it consistent with the interaction between the physical hand and the physical object. The interactions currently supported are:
    • Grab: you can take an object in your hand, like when you are holding a bottle
    • Slide: you can make an object slide on a line or a surface, like when you are making a paper sheet slide on your desk
    • Touch: you can activate an object, like when you are pressing a switch to turn on the light
Grabbing a bottle both in the real and the virtual world
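The two phases above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the plugin's Unity code: the 2 cm pinch threshold, the spherical collider, and all the function names are hypothetical. It shows a pinch-based placement and the simplest of the interactions, the touch.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float, float]

# Assumed threshold: thumb and index tips closer than 2 cm count as a pinch.
PINCH_THRESHOLD_M = 0.02


def is_pinching(thumb_tip: Point, index_tip: Point,
                threshold: float = PINCH_THRESHOLD_M) -> bool:
    """A pinch is detected when the two fingertips are close enough."""
    return math.dist(thumb_tip, index_tip) < threshold


def pinch_point(thumb_tip: Point, index_tip: Point) -> Point:
    """Midpoint between the two fingertips, used as the placement position."""
    return tuple((t + i) / 2 for t, i in zip(thumb_tip, index_tip))


@dataclass
class SphereCollider:
    """Virtual volume wrapped around the physical object (assumed spherical)."""
    center: Point
    radius: float

    def contains(self, point: Point) -> bool:
        return math.dist(self.center, point) <= self.radius


def place_object(thumb_tip: Point, index_tip: Point,
                 radius: float) -> Optional[SphereCollider]:
    """Placement phase: pinching the real object registers a collider there."""
    if not is_pinching(thumb_tip, index_tip):
        return None
    return SphereCollider(pinch_point(thumb_tip, index_tip), radius)


def is_touching(collider: SphereCollider, index_tip: Point) -> bool:
    """Interaction phase (touch): the tracked fingertip enters the collider."""
    return collider.contains(index_tip)


# Usage: the user pinches the glass, then later touches it with the index finger.
glass = place_object((0.10, 0.80, 0.30), (0.11, 0.80, 0.30), radius=0.05)
touched = glass is not None and is_touching(glass, (0.12, 0.81, 0.30))
```

Grab and slide would build on the same collider test, with extra logic to decide when the hand is holding the object and how the object is allowed to move.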

The two operations are separate, so you can use my code just for the placement or just for the interaction. The placement is very interesting per se: you can use a pinching gesture to put virtual elements in your space and then save their positions so that they are restored in future sessions. Since you can bind the coordinates of the elements to some persistent features detected in the room (e.g. the floor, or the desk), these coordinates persist even if you turn the headset off and on again… unless you change your Room Setup.
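Here is a small Python sketch of why binding coordinates to a room feature makes them persistent. It is a simplified illustration and not the plugin's code: the `Anchor` type, the JSON format, and the function names are hypothetical, and the anchor is assumed to rotate only around the vertical axis (a yaw angle, with an arbitrary sign convention used consistently in both directions).

```python
import json
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Point = Tuple[float, float, float]


@dataclass
class Anchor:
    """Pose of a persistent room feature (e.g. the desk) in this session."""
    x: float
    y: float
    z: float
    yaw: float  # rotation around the vertical (y) axis, in radians


def _rotate_xz(x: float, z: float, angle: float) -> Tuple[float, float]:
    c, s = math.cos(angle), math.sin(angle)
    return x * c - z * s, x * s + z * c


def world_to_anchor(anchor: Anchor, point: Point) -> Point:
    """Express a world-space point relative to the anchor."""
    dx, dy, dz = point[0] - anchor.x, point[1] - anchor.y, point[2] - anchor.z
    lx, lz = _rotate_xz(dx, dz, -anchor.yaw)
    return (lx, dy, lz)


def anchor_to_world(anchor: Anchor, local: Point) -> Point:
    """Inverse of world_to_anchor, using the anchor pose of the current session."""
    wx, wz = _rotate_xz(local[0], local[2], anchor.yaw)
    return (wx + anchor.x, local[1] + anchor.y, wz + anchor.z)


def save_placements(anchor: Anchor, placements: Dict[str, Point]) -> str:
    """Store every object position relative to the anchor, as JSON text."""
    return json.dumps({name: world_to_anchor(anchor, p)
                       for name, p in placements.items()})


def load_placements(anchor: Anchor, payload: str) -> Dict[str, Point]:
    """Rebuild world positions from the saved data and the current anchor pose."""
    return {name: anchor_to_world(anchor, tuple(local))
            for name, local in json.loads(payload).items()}


# Usage: save against the desk pose of one session, restore in another one
# where the headset reports the same desk at a different world pose.
desk_today = Anchor(1.0, 0.0, 2.0, 0.3)
saved = save_placements(desk_today, {"bottle": (1.2, 0.75, 2.1)})
desk_tomorrow = Anchor(-0.5, 0.0, 0.4, 1.1)
restored = load_placements(desk_tomorrow, saved)
```

World coordinates can change between sessions, but the offset between the desk and the bottle does not, which is why the saved data stays valid until the room features themselves are redefined.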

The plugin has been made with Unity, XR Interaction Toolkit, and AR Foundation, so it is theoretically compatible with every headset these frameworks support. I’ve tested it only with Meta Quest 3, but it’s made to be as generic as possible.

You can see a very cool demo made with it in the video below (which is the full video of the initial GIF of this article).

With this video you can fully understand how the system works

Does it work?

Now you may be wondering: but does the original assumption work? Is it really possible to track objects by just using the hands? The honest answer is: it depends.

In my tests, I’ve verified that the system kinda works, but it is not very reliable. There are a few reasons behind it:

  • Hand tracking on Quest is reliable enough, but when you hold an object in your hand, the tracking quality degrades significantly. So the virtual counterpart of your real object may shake a lot. I know that Ultraleap has just released a new runtime that improves hand tracking when the hand is holding an object, so maybe I should run some tests with it.
  • The hands make a lot of micromovements that we do not notice: for instance, when you are holding an object in your hand, it’s normal to slightly move the object to adjust the grip, but the moment you do this, you break my assumption that the object moves only together with the hand (or its fingers), because you are moving the object inside the hand.
  • It’s just a first version and I have not implemented many algorithms to improve its reliability or to filter out the glitches.
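As an example of the kind of filtering mentioned in the last point, here is a minimal Python sketch of an exponential moving average applied to the tracked position. It is not part of the plugin: the `PositionSmoother` name and the `alpha` value are my assumptions, and a real solution would probably need something more sophisticated to avoid adding visible lag.

```python
from typing import Optional, Sequence, Tuple


class PositionSmoother:
    """Exponential moving average that damps frame-to-frame tracking jitter."""

    def __init__(self, alpha: float = 0.3) -> None:
        # alpha close to 1 follows the raw data, close to 0 smooths heavily.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Optional[Tuple[float, ...]] = None

    def update(self, sample: Sequence[float]) -> Tuple[float, ...]:
        """Feed the raw position of this frame, get the smoothed one back."""
        if self._state is None:
            # The first sample initializes the filter as-is.
            self._state = tuple(sample)
        else:
            self._state = tuple(prev + self.alpha * (new - prev)
                                for prev, new in zip(self._state, sample))
        return self._state


# Usage: a sudden 1 m jump in the raw data is only followed gradually.
smoother = PositionSmoother(alpha=0.5)
first = smoother.update((0.0, 0.0, 0.0))
second = smoother.update((1.0, 0.0, 0.0))
third = smoother.update((1.0, 0.0, 0.0))
```

The trade-off is the usual one: the stronger the smoothing, the less the virtual object shakes, but the more it lags behind the real one.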

I wouldn’t use it in a production environment where reliability is vital (e.g. safety training), but it is already nice to use for some installations where you want to surprise the user with some special FX around physical objects. And it’s also good if you simply want to experiment with mixed reality as a form of R&D.

The magic of it

Even if the system is not perfect, when it works, it creates something magical: the first time I was able to grab the bottle, pour the water into the glass, and see the virtual visual FX happening, I was completely amazed. I was using no tracker and no controllers... I was just using two normal objects with my bare hands, like I do every day, but this time, there were some visual augmentations enhancing my experience. And I already have some ideas for other tests I want to do related to magic, food, and other stuff. I guess you will see some videos on my social media channels in the coming weeks.

Future projections

Even if the results are not perfect, I wrote the code in a tidy and modular way, with many comments, hoping that this can become the foundation for hand interaction with objects in mixed reality. I did this in my free time, but I would be very happy if some company were interested in sponsoring this effort, so that I could dedicate more time to improving this system and see how far it can go.

In the meantime, I would be very happy if you could get curious about it: go to the repo, try the demo scenes, find my bugs, help me improve it, donate to my Patreon, and promote this system on your social media channels. Everything helps. Let’s see if we can really make hybrid interaction with objects a reality.

