How to do audio-driven blendshapes / lipsync?


I have a few questions about this and all of the information I have been able to find seems a bit vague.

So, first off, is it a thing that works “out of the box”, or do I need a script or other configuration?

The avatar standards documentation states:

For the audio drive solution, we’re targeting a phoneme based system that triggers 4 mouth and 2 eye shapes.

Eye: blink & brow up.
Mouth: m, o, ah & e.

Great, I can work with that. I assume that my avatar .fst needs to bind my blendshapes to the correct blendshape constants, so what are they called?
The documentation refers to default_avatar_full.fst in the resources > meshes folder of the High Fidelity install. I don’t seem to have a meshes subfolder, but the copy on GitHub has four BS bindings:

bs = JawOpen = mouth_Open = 1
bs = LipsFunnel = Oo = 1
bs = BrowsU_L = brow_Up = 1
bs = EyeBlink_L = blink = 1
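
To make sure I am reading these lines the way I think I am, here is a minimal Python sketch that splits each one into two names and a weight. It assumes every binding has the form `bs = <name> = <name> = <weight>`, which is only what the four lines above suggest. The names `Binding`, `parse_bs_lines`, `first_name` and `second_name` are my own, and which of the two names is the engine constant and which is the mesh blendshape is exactly the part I am unsure about.

```python
from typing import List, NamedTuple


class Binding(NamedTuple):
    """One `bs` line; field names are hypothetical, not from the docs."""
    first_name: str   # e.g. JawOpen (presumably the constant, unconfirmed)
    second_name: str  # e.g. mouth_Open (presumably the mesh shape, unconfirmed)
    weight: float


def parse_bs_lines(text: str) -> List[Binding]:
    """Collect every line shaped like `bs = A = B = weight`; skip the rest."""
    bindings = []
    for raw in text.splitlines():
        parts = [p.strip() for p in raw.split("=")]
        # Assumed layout: exactly four fields, the first being the key "bs".
        if len(parts) != 4 or parts[0] != "bs":
            continue
        bindings.append(Binding(parts[1], parts[2], float(parts[3])))
    return bindings


# The four bindings quoted from the GitHub copy of the file.
SAMPLE = """\
bs = JawOpen = mouth_Open = 1
bs = LipsFunnel = Oo = 1
bs = BrowsU_L = brow_Up = 1
bs = EyeBlink_L = blink = 1
"""

if __name__ == "__main__":
    for b in parse_bs_lines(SAMPLE):
        print(f"{b.first_name} <-> {b.second_name} (weight {b.weight})")
```

Running it over my own .fst would at least show which names are bound and which are missing, even if it cannot tell me which names the engine expects.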

Those bindings are also fine; jaw flapping is OK with me. But then there’s LipsFunnel bound to the O phoneme: is that a dedicated duck face that activates when the selfie cam is turned on?

Am I missing part of the picture here, in that correctly named blendshapes do not need to be listed in the .fst file to work?

Any help appreciated. I may be dead on the inside, but I would at least like to look alive on the outside .-)