One of the projects I’m intending to pursue is the development of an advanced NUI (Natural User Interface) that integrates markerless mocap animation data for locomotion, body positioning and possibly gesture commands.
However, looking at the available tech, I’m faced with a lot of decisions to make before I can confidently choose a solution! Since I’ve set a load of time aside for HiFi, but am still unable to get inworld (see here), I thought I’d write up my research and share what I’ve found here.
There are two starting points:
- Look at what’s already supported in HiFi - What’s in here
- Look at the available hardware and SDKs - What’s out there
What's in here
Since both the Kinect and the Xtion Pro can already be used by Faceshift, I feel safe presuming that there is support for data streams from these cameras already. If anyone could give me some detail on how this is done (e.g. which SDK / libraries are being used, which classes implement the functionality) that would be great. I also see the PrioVR is integrated into Interface. It’s good to see that dealing with the data stream doesn’t seem too complex (mapping rotations from input joints to avatar joints), but the PrioVR suit doesn’t really fit my requirements. There’s also Sixense support, but as far as I can see it currently supports the Hydra controllers only - and anyway, the Sixense STEM body capture is not markerless, so doesn’t fit my idea of a Natural UI either.
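To give a feel for what that “mapping rotations from input joints to avatar joints” step involves, here’s a minimal sketch. All the names here (the joint map, the function) are invented for illustration - this is not HiFi’s actual code or any device SDK’s API:

```python
# Hypothetical sketch: translating one frame of capture-device joint
# rotations onto an avatar skeleton. Names are illustrative only.

# Map from the capture device's joint names to the avatar skeleton's.
JOINT_MAP = {
    "SpineBase": "Hips",
    "SpineMid": "Spine",
    "Head": "Head",
    "ShoulderLeft": "LeftShoulder",
    "ElbowLeft": "LeftForeArm",
}

def map_skeleton_frame(device_joints):
    """Translate one frame of device joint rotations (quaternions as
    (w, x, y, z) tuples) into avatar joint rotations, dropping any
    joints the avatar doesn't have a mapping for."""
    avatar_joints = {}
    for device_name, rotation in device_joints.items():
        avatar_name = JOINT_MAP.get(device_name)
        if avatar_name is not None:
            avatar_joints[avatar_name] = rotation
    return avatar_joints

# Only "Head" has a mapping here, so "AnkleLeft" is dropped.
frame = {"Head": (1.0, 0.0, 0.0, 0.0), "AnkleLeft": (1.0, 0.0, 0.0, 0.0)}
print(map_skeleton_frame(frame))
```

In practice there’s more to it (coordinate-space conversion, retargeting between differently proportioned skeletons), but the core really is a per-joint lookup like this.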
What's out there
There’s a huge range of solution combinations out there! I’m having trouble finding an ideal solution though, as matching both an SDK / library and a camera that fit the bill gets a little complex, particularly in terms of supported OSs and licensing issues. Here’s where I am:
Purely on specifications, the obvious choice is the Kinect 2, complete with 25 joint skeleton, larger operating distances and the highest camera resolutions. However, the SDK only runs on Windows 8. Similarly, the original Kinect’s SDK only runs on Win7 and Win8.
Looking at alternatives, the Asus Xtion Pro Live and the Primesense Carmine cameras seem to bubble up to the top of the list. The Asus Xtion Pro Live looks particularly good, as its SDK incorporates the OpenNI and NiTE middleware libraries (more on those below).
Primesense’s offering isn’t really in the running for me right now, as Apple bought them out last November and quickly took down all the Open Source OpenNI pages that Primesense originally supplied. It’s not clear what Apple will do next, but their decision to take down the web pages supporting a thriving Open Source community doesn’t exactly bode well. (note: Structure have taken up the OpenNI baton, and now host the OpenNI page here with the wry title “The rumors of my death have been greatly exaggerated…”).
Which leads me nicely onto the subject of available SDKs / libraries…
SDKs / libraries
This is where the harder decisions come in, as there doesn’t seem to be an option that fits all requirements:
~ OS support: Linux, Windows, Mac
~ Easily accessible skeleton stream.
~ Skeleton should have the highest possible number of bones / joints
~ No licensing issues
The Microsoft SDKs look like the easiest to get up and running, but of course the lack of OS support puts them straight out of the running for a long term solution.
There are two Open Source solutions worth looking at: OpenNI with the SensorKinect driver module, and the libfreenect driver from the OpenKinect community (note that libfreenect is a standalone driver, independent of OpenNI).
The Asus Xtion Pro Live’s SDK looks great bar one thing - no Mac support - which I find weird, as I think there are Mac users already using the device with Faceshift - can anyone shed light on this?
The Xtion SDK does support Windows (XP, Vista, 7 and 8; 32/64-bit), Linux (Ubuntu 10.10; x86, 32/64-bit) and Android (by request). Since there is Linux support, I’m holding out hope that Mac support may not be too far away, but that’s quite possibly my over-optimism!
The Asus Xtion Pro Live’s SDK has the OpenNI library bundled - ok, I guess it’s time I went into some detail on OpenNI and NiTE.
OpenNI is an industry-led, non-profit organization formed to certify and promote the compatibility and interoperability of Natural Interaction (NI) devices, applications and middleware. (source)
OpenNI supplies, amongst other streams, a skeletal data stream.
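To give a rough feel for what consuming a skeleton stream looks like, here’s a hand-rolled Python sketch. The names are invented (OpenNI’s real API is a C/C++ interface, and I’m not reproducing it here) - the point is just the shape of the loop: poll for a frame, skip frames where no user is being tracked, read out joint positions:

```python
# Illustrative-only sketch of a skeleton-stream consumer loop.
# All names are invented; this is not OpenNI's actual API.

def fake_skeleton_stream():
    """Stand-in for a device stream: yields frames of joint positions
    (metres, camera space), or None while no user is tracked."""
    yield None  # no user detected / calibrated yet
    yield {"Head": (0.0, 1.6, 2.0), "Torso": (0.0, 1.1, 2.0)}
    yield {"Head": (0.1, 1.6, 2.0), "Torso": (0.1, 1.1, 2.0)}

head_positions = []
for frame in fake_skeleton_stream():
    if frame is None:
        continue  # nothing to do until a user is tracked
    head_positions.append(frame["Head"])

print(head_positions)
```

The real thing adds user detection/calibration events and per-joint confidence values, but the consumer side stays about this simple.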
OpenNI is now on its second version. Whilst version 1 of the OpenNI SDK had good OS support, the OpenNI 2 library, apparently due to Kinect license restrictions, only supports Windows - so any HiFi development using OpenNI might have to run on OpenNI 1.5 or be limited to Windows (see the discussion on future Mac / Linux OpenNI 2 support using OpenNI2-Freenect here).
Sadly, it looks like OpenNI’s plan for the Kinect 2 is that it will only be supported by OpenNI 2, which in turn doesn’t support *nix. Bugger.
PrimeSense developed the NiTE middleware - the software that analyzes the data from the hardware. NiTE supplies the modules for OpenNI that provide hand and gesture tracking. It’s free but not open source, being released only as binaries.
Quoting the LinkedIn page: “NiTE identifies users and tracks their movements, and provides the framework API for implementing Natural-Interaction UI controls based on gestures.” The system can then interpret specific gestures, making completely hands-free control of electronic devices a reality. Its capabilities include:
~ Identification of people, their body properties, movements and gestures
~ Classification of objects such as furniture
~ Location of walls and floor
If you’re interested in finding out more about OpenNI and NiTE, there’s a very detailed YouTube explanation with examples here.
For now, I’m tempted to use a Kinect for Windows SDK 2.0 / Kinect 2 solution for development purposes in the hope that OpenNI 2 will eventually support other OSs (possible, but licensing issues are a problem) and the Kinect 2 (seems very likely). Of course, I’d be forced to upgrade to Windows 8, which irritates me no end…
I’d be really interested to hear from anyone else who’s got any tidbits of info on the subject of full body tracking. Please note, I am still very new to the project, its architecture, codebase and capabilities, so as usual there’s a good chance I’ve spouted some complete bollocks here - really, please do feel free to correct / enlighten me as appropriate!