Spine rebuilds the mesh every frame in Unity? This is slow.

hippocoder · 7. lip. 2013

Hi,

We found out recently that Spine rebuilds the mesh from scratch every frame in Unity. Will you be able to change this behaviour so it only submits new vert positions? Rebuilding every frame is very, very slow in Unity and this will impact our performance.

While I can probably modify this easily enough, I don't feel confident doing so, especially given that the source code is likely to change as new improvements are added to Spine. As Unity is one of your core markets, do you agree that going forward it's probably better to only update vertex attributes which have actually changed, as opposed to the current brute-force method of rebuilding the mesh every single frame for every single object?

It's a fairly major gotcha, as Unity, unlike Cocos, is highly managed in how it behaves, and what we do with meshes is filtered through their engine and its numerous optimisations for different platforms (i.e. it will choose the best possible behaviour based on the target hardware, with several different mesh submission routines for different Android devices, for example).

So rebuilding mesh every frame is kinda out of the question for any serious game on unity.

hippocoder · 7. lip. 2013

extra info: http://docs.unity3d.com/Documentation/S ... /Mesh.html

Ideally the mesh should have a local array of verts, which you only need to update per frame, rather than submitting:

      mesh.vertices = vertices;
      mesh.colors = colors;
      mesh.uv = uvs;
      mesh.triangles = triangles;

...which is a lot more expensive, and should only be done for the initial mesh, with future updates to the mesh being just mesh.vertices changes (or whatever else became dirty). Typically, ex2D and 2D Toolkit will only update dirty attributes which changed since the last frame. Within the Unity framework, this performs a lot better.

It's also higher performance to keep a local copy of your verts instead of reading the verts from the structure because if you read the verts before modifying them, it will do an array copy internally first.

You can imagine just how slow this gets in Unity once you start having more than a couple of objects on mobile; it gets ugly fast. I imagine on other frameworks such as Cocos, it's a lot simpler to just throw quads at it. It's not that simple in Unity due to the fact that Unity has a lot of internal optimisations for weird devices due to its heavy cross-platform nature. In fact, regarding meshes, it's very aggressively optimised on a per-device basis in the Android world.

For most, a typical game loop would end with mesh.vertices = vertices; and nothing else, where vertices is your own internal array.
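As a rough illustration of the "only submit what became dirty" idea described above, here is a minimal Unity C# sketch. It is not taken from spine-unity: the class name DirtyMeshUpdater, the SetColor helper and the dirty flag are hypothetical, and the single quad with a sine-wave animation is only a placeholder for real skeleton data.

```csharp
using UnityEngine;

// Hypothetical helper (not part of spine-unity): keeps local arrays and
// re-submits only the attributes that were flagged as changed.
[RequireComponent(typeof(MeshFilter))]
public class DirtyMeshUpdater : MonoBehaviour {
	Mesh mesh;
	Vector3[] vertices = new Vector3[4];
	Color[] colors = new Color[4];
	Vector2[] uvs = new Vector2[4];
	int[] triangles = { 0, 1, 2, 2, 1, 3 };
	bool colorsDirty;

	void Awake () {
		mesh = new Mesh();
		GetComponent<MeshFilter>().mesh = mesh;
		// Full submission happens once, when the mesh is first built.
		mesh.vertices = vertices;
		mesh.colors = colors;
		mesh.uv = uvs;
		mesh.triangles = triangles;
	}

	// Writes into the local array and remembers that colors need re-submitting.
	public void SetColor (int index, Color color) {
		if (colors[index] == color) return;
		colors[index] = color;
		colorsDirty = true;
	}

	void LateUpdate () {
		// Placeholder animation: move the local vertices, never read mesh.vertices back.
		for (int i = 0; i < vertices.Length; i++)
			vertices[i].y = Mathf.Sin(Time.time + i);

		mesh.vertices = vertices; // positions change every frame
		if (colorsDirty) {
			mesh.colors = colors; // only when something actually changed
			colorsDirty = false;
		}
		mesh.RecalculateBounds(); // keep culling correct after moving vertices
	}
}
```

Whether skipping the colors and uv assignments is measurable depends on the project, so it is worth profiling on the target device before relying on it.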

Sorry if this comes off as being arrogant or annoying, I'm just trying to point out that existing frameworks like Smoothmoves are currently considerably faster than Spine, and this can be rectified without too much work on your part. We benchmarked both systems on mobile and the results weren't very favorable towards Spine.

I'm asking you guys to deal with it rather than me because if I do it and you add more features, I'm going to fall out of sync and the rest of your customers won't benefit. Thanks for listening and I only mean well for Spine as it's where we've put our support

hippocoder · 7. lip. 2013

More findings:

It's 4ms per frame spent on just SkeletonComponent.UpdateSkeleton() and AnimationState.Apply(), which is fairly brutal for an i5. I am investigating rewriting it, but it seems tricky given I don't know the code for it. Perhaps a community project?

If you (the authors) have plans to optimise, please let me know. Also, you should be able to use the Unity profiler on the trial version: use the Deep Profile option with a few Spine objects playing and you will quickly be able to narrow things down. Things like GetComponent and setting materials every frame are also bad to do in Unity (I appreciate I sound a bit long-winded now, but I'm just investigating for you).
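For readers unfamiliar with the GetComponent and material point, the usual Unity pattern looks roughly like the sketch below. The component name CachedMaterialSetter and its fields are hypothetical and only illustrate the general idea; this is not how spine-unity itself is written.

```csharp
using UnityEngine;

// Hypothetical component (not part of spine-unity).
[RequireComponent(typeof(MeshRenderer))]
public class CachedMaterialSetter : MonoBehaviour {
	public Material material;  // assigned in the Inspector or by other code

	Renderer cachedRenderer;   // looked up once instead of every frame
	Material lastMaterial;     // last material handed to the renderer

	void Awake () {
		cachedRenderer = GetComponent<Renderer>();
	}

	void LateUpdate () {
		// Only touch the renderer when the material reference actually changed.
		if (material != lastMaterial) {
			cachedRenderer.sharedMaterial = material;
			lastMaterial = material;
		}
	}
}
```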

Pharan · 7. lip. 2013

Really? We tested Spine on 1st-gen ipad and it was fine with like 30-40 of the suckers on screen. They were one batched draw 'cause it was basically instances of a prefabbed Spine Animation. And the animation had a lot of image switching so the quad replacements (not the mesh clearing/rebuilding) were actually necessary. No framerate drop.

We couldn't get that performance from Smoothmoves. My iPad kept crashing. Not sure if it was an iOS/mobile optimization issue in their case.

So.. weird that we have different results.

Still, I agree there's a bunch of things that could be less brute-force. I have to admit that when I read that part of the code last month, I was like... "really? Well, I guess that's just how it works."

But I think the code for the flow of data from Spine classes into a mesh is very likely to change soon (in the other runtimes too) especially since the multiple-materials/atlases, alpha blended slots, bounding boxes and animated draw order haven't been implemented yet.

If it's true that SmoothMoves' system of making one gameObject for each quad doesn't actually slow it down, maybe Spine-Unity could go that route. At the very least, a fork could. In fact, there's probably a lot that could be done in base Spine-C# so it can communicate more directly with Unity objects. Right now, it's pretty agnostic for compatibility and maintainability reasons... which I agree with in Nate's case.

Nate's basically the sole programmer (and co-architect with Shiu?) of Spine AND Spine Editor. I think he'll appreciate the help improving specific runtimes from people who have more standards info, experience and knowledge about specific frameworks' quirks as long as you try to keep up with the overall scheme of Spine's compatibility with different frameworks.

This specific concern doesn't affect other frameworks though, so I think it's cool to explore changes in whatever way we might think is a good idea. But again, there are those future features to consider.

On performance specifics about GetComponent<T>() and setting materials though:
The Spine Unity classes don't seem to use GetComponent in any of the loops. Nor any implicit calls to it like when you access GameObject properties like transform or rigidbody. The references are cached.
And since you mentioned it, what does this line do to performance?

renderer.sharedMaterial = skeletonDataAsset.atlasAsset.material;

outtoplay · 8. lip. 2013

New here to Spine, but I have to say I'm loving it. It's a true animator's program. Clearly anything that would improve the asset pipeline between Spine and Unity would be terrific. When I read that the mesh was being redrawn every frame, I must admit that was a bit scary. But it sounds like the issue is not monumental and could be dealt with handily.

Hippocoder is a huge influence and help back home on the Unity forum. If he's considering implementing Spine in his team's workflow, that's gonna turn a lot of heads. Mine immediately. Will keep an eye on this thread for sure.

Best to all, have a great weekend!
Brendan

hippocoder · 8. lip. 2013

Hiya,

Thanks for the heads up, Pharan, it's good to know that the runtimes will be changing soon. These do need the optimisations I hinted at above. Changing material each frame is an overhead that's not needed. The ipad1 is particularly well-suited to handle Spine, more so than the iPhone4, and of course obliterates the 3GS. It also depends on bone count and number of keyframes. 40 is the same as 5 complex models as far as Spine's concerned.

In a demo, Spine's performance is good enough. In a full game with everything else running, it's a different story. On multi-core devices, unity's skinned mesh will be quicker.

Regarding the speed of Smooth Moves: it uses Unity's skinned mesh. It's a lazy man's solution to the problem, since it avoids doing the maths in Unity, and that has pros and cons. On one hand you get much higher performance as your bone complexity rises, but on the other hand you get one draw call per mesh. I think with a couple of optimisations there will be no places where Smooth Moves actually beats a tightly optimised approach in Spine, and that is what we all want.

What I want to know is my options in helping Nate, because I like him and I love their product.

While we are at it, shouldn't most runtimes have optional normal/tangents for those using normal mapped sprites / or hardware lighting?

Søren · 8. lip. 2013

Hey just wanted to tell you that Nate is currently in Turkey which is why there has been no reply and personally I don't know enough about the coding aspects to be of any assistance. Nate will be back in a couple of days.

hippocoder · 8. lip. 2013

Garbage collection hit:

AnimationStateData.cs - GetMix

		public float GetMix (Animation from, Animation to) {
			KeyValuePair<Animation, Animation> key = new KeyValuePair<Animation, Animation>(from, to);
			float duration;
			animationToMixTime.TryGetValue(key, out duration);
			return duration;
		}

This causes garbage collection and adds up quickly if you change animations a lot. With, say, 25 AI creatures running around, it builds up on mobile and causes stutter. It's probably not necessary to create new data at any point during an animation, as everything is already known ahead of time. A fixed dictionary or pool would be best here for anims which change a lot. Eliminating GC is one of the main points about Unity asset store / middleware stuff. Creating new items during gameplay is bad for Mono's terrible garbage collector used in Unity.
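One possible way to take allocation out of the question for this kind of lookup is a dedicated struct key with typed equality, plus an explicit comparer handed to the dictionary. The sketch below is an assumption-laden illustration, not the spine-csharp code: Anim stands in for Spine's Animation class, and AnimPair, AnimPairComparer and MixTable are hypothetical names. The motivation is that a struct key which does not implement IEquatable can be boxed by the default dictionary comparer on some runtimes.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Spine's Animation class (assumption for illustration).
public sealed class Anim {
	public readonly string Name;
	public Anim (string name) { Name = name; }
}

// Hypothetical key type: a struct with typed equality and hashing.
public struct AnimPair : IEquatable<AnimPair> {
	public readonly Anim From, To;
	public AnimPair (Anim from, Anim to) { From = from; To = to; }

	public bool Equals (AnimPair other) {
		return ReferenceEquals(From, other.From) && ReferenceEquals(To, other.To);
	}
	public override bool Equals (object obj) {
		return obj is AnimPair && Equals((AnimPair)obj);
	}
	public override int GetHashCode () {
		int a = From == null ? 0 : From.GetHashCode();
		int b = To == null ? 0 : To.GetHashCode();
		return unchecked(a * 397 ^ b);
	}
}

// Explicit comparer so dictionary lookups call the typed methods directly.
public sealed class AnimPairComparer : IEqualityComparer<AnimPair> {
	public static readonly AnimPairComparer Instance = new AnimPairComparer();
	public bool Equals (AnimPair x, AnimPair y) { return x.Equals(y); }
	public int GetHashCode (AnimPair key) { return key.GetHashCode(); }
}

// Hypothetical mix-time table built on the key above.
public sealed class MixTable {
	readonly Dictionary<AnimPair, float> mixTimes =
		new Dictionary<AnimPair, float>(AnimPairComparer.Instance);

	public void SetMix (Anim from, Anim to, float duration) {
		mixTimes[new AnimPair(from, to)] = duration;
	}

	// Returns 0 when no mix time was set for the pair.
	public float GetMix (Anim from, Anim to) {
		float duration;
		mixTimes.TryGetValue(new AnimPair(from, to), out duration);
		return duration;
	}
}
```

Whether the original GetMix allocates in practice is best confirmed with the profiler; the sketch only shows a layout where the lookup path has no obvious reason to.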

hippocoder · 8. lip. 2013

Shiu wrote
Hey just wanted to tell you that Nate is currently in Turkey which is why there has been no reply and personally I don't know enough about the coding aspects to be of any assistance. Nate will be back in a couple of days.

Ah good to know. I hope things are OK in Turkey with the current events being what they are.

Pharan · 8. lip. 2013

but on the other hand you get one draw call per mesh.

I'm not sure you get one draw call per mesh in the one-object-per-quad setup. Unity does dynamic batching. Try making several copies of one spine skeleton in a scene. Unity seems to batch their separate meshes as one draw call as long as they share a material.

criteria for it: http://docs.unity3d.com/Documentation/M ... ching.html
The per-vertex overhead is interesting though. That's probably the thing to watch.

Anyway, a one-object-per-quad setup isn't a clear winner.

hippocoder · 8. lip. 2013

Pharan wrote
but on the other hand you get one draw call per mesh.

I'm not sure you get one draw call per mesh in the one-object-per-quad setup. Unity does dynamic batching. Try making several copies of one spine skeleton in a scene. Unity seems to batch their separate meshes as one draw call as long as they share a material.

criteria for it: http://docs.unity3d.com/Documentation/M ... ching.html
The per-vertex overhead is interesting though. That's probably the thing to watch.

Anyway, a one-object-per-quad setup isn't a clear winner.

I don't think you understood me. Spine creates one mesh, which is dynamically batched. Smooth Moves creates one SKINNED MESH which cannot be dynamically batched. Which is why you get one draw call per mesh with SM. However, you can sidestep this by creating many animated objects within a single skinned mesh, which will always force a single draw call. Back in 2011 we exploited this in Physynth, reducing 70 draw calls to 1 draw call. It was of course, much faster.

This has nothing to do with dynamic batching and I've never ever implied anything to do with one gameobject per mesh. Here's the quote again:

Regarding the speed of Smooth Moves: it uses Unity's skinned mesh. It's a lazy man's solution to the problem, since it avoids doing the maths in Unity, and that has pros and cons. On one hand you get much higher performance as your bone complexity rises, but on the other hand you get one draw call per mesh.

I should make the last bit read per SKINNED mesh, but I felt it was ok in context.

Pharan · 8. lip. 2013

Haha! Forgive me for misunderstanding, and thank you for the clarification. Just full disclosure: I'm likely worse than Nate in self-proclaimed Unity n00bishness. : p

I'm not familiar with what sort of Unity data structure a Skinned Mesh is. I've never even heard of it until now. But I'm not gonna ask you to explain it. I'll have to look that up.

When we were fooling around with SmoothMoves though, the sprite object broke down into its individual bones/quads in the hierarchy panel. We were even able to attach other completely unrelated objects to a specific bone's transform in the hierarchy (and got ourselves a cheap laugh out of it).

Just recalling that and its behavior, I just assumed (incorrectly) that they were individual GameObjects.

hippocoder · 9. lip. 2013

I don't really dig SM. I think it's got its own set of wonderful bugs, and I don't think using skinned meshes provides the right kind of overall performance. I mean, if you wanted a lot of them, you'd need to jump through hoops and have 1 skinned mesh renderer for a lot of skinned meshes, i.e. 1 mesh but lots of sub-animations. That still gets you 1 draw call, but it's horrible to work with and not worth it. As small indie developers, a decent workflow is that much more important.

Nate · 10. lip. 2013

Turkey was fun.

hippocoder wrote
In fact regarding meshes, it's very aggressively optimised on a per-device basis in the android world.

Maybe for some fancy stuff, but if we are just batching quads, black magic isn't required. E.g., libgdx has a single code path for batching quads for OpenGL on all devices and OSes, and its performance is faster than Unity's (according to benchmarks run by others). Unity likely has more overhead when used in a similar way. Exactly why it may have extra overhead is a bit off topic; we have to deal with it either way. I agree of course that the Spine runtime for Unity should be as optimized as possible. Unfortunately, little insight is provided into what Unity does with the data it is given. It's a black box and it isn't clear what actions may be expensive. "Profiler" is grayed out for me. I'll look into getting a version where I can use the profiler.

Currently we set the vertices, colors, uvs, and indexes ("triangles") every frame. For a Spine skeleton that is animating, we can reasonably expect every vertex to change position every frame. We use vertex colors and it is possible that these change every frame. Images for slots can be added or removed at any time. We could easily avoid setting the indices every frame, so I've committed this. I doubt it would be worthwhile to attempt to avoid setting the vertices, colors, or uvs each frame.

Collecting vertices, uvs, and colors like this is a standard way of batching for 2D games. While Unity likely causes an extra copy somewhere, I would be surprised if Unity added enough overhead to make this too slow for complex 2D games, even on mobile. Note Vector2 and Vector3 are structs.

I moved setting "renderer.sharedMaterial" so it doesn't happen every frame. That might help performance. I would love spine-unity to be able to support using an atlas with multiple pages. Do you have any insight on how a Unity mesh can use multiple materials?

hippocoder wrote
It's also higher performance to keep a local copy of your verts instead of reading the verts from the structure because if you read the verts before modifying them, it will do an array copy internally first.

SkeletonComponent already keeps the vertices, colors, uvs, and indices in fields.

hippocoder wrote
Garbage collection hit:
AnimationStateData.cs - GetMix

KeyValuePair is a struct and will be allocated on the stack, so this method should never generate any garbage. KeyValuePair is immutable and can't be pooled. I don't think GetMix can be written any better, but it also isn't called very often, only when animations change.

hippocoder wrote
It's 4ms per frame spent on just SkeletonComponent.UpdateSkeleton() and AnimationState.Apply(), which is fairly brutal for an i5. I am investigating rewriting it, but it seems tricky given I don't know the code for it. Perhaps a community project?

Those methods should already be pretty well optimized, I doubt they can be made much faster. Maybe you had a debugger attached when you were profiling? The spine-unity project is on github and I'm happy to merge contributions, so it is effectively already a community project. We've already had 23 pull requests, most of which were merged. Using git can be a PITA, especially at first. I highly suggest SmartGit, even if you already know git.

The runtimes will continue to be maintained and improved, though for the most part the only large changes planned for the future are likely to be new features such as the event timeline and bounding boxes. If we can optimize or otherwise improve things, we'll do that by all means! Any and all help is welcome. I'm happy to discuss changes here on the forum and make them myself if we can identify improvements. If you'd like to submit pull requests via github, that would also be fantastic. If they are merged then they will be officially maintained moving forward.

I made a few other small optimizations. Can you try with the latest and see how it performs?

Pharan · 10. lip. 2013

Welcome back, Nate!

lol. I used to always get thrown off by the fact that structs are sometimes instantiated with the "new" keyword and a constructor. Coming from C++, there was a time when I thought the heap just turned into an ocean of discarded Unity Vector3's being GCed all the time. It felt so bad every time I had to make a new one. XD

hippocoder · 11. lip. 2013

Hey Nate! Welcome back as well. Thanks for the reply. Regarding mesh stuff, it's probably going to be only a minor change on your part to not resubmit each part of the mesh and to track changes with a small bool. For example, if there have been no actual changes to the quads since the last frame, you could just submit .vertices, and if there's been a vert colour change, just submit .colors. Setting .triangles and making a new Mesh are a last resort in Unity. It's a pretty hefty hit (I presume due to the way Unity works internally, with shaders, materials, batching, 3D stuff, sorting and so forth).

Those are minor, easy changes you can make which will really reduce how much time Unity spends internally on its above nonsense + scenegraph stuff.

For Unity 4+, in addition to the above tips about not making a new mesh every frame, it's also recommended that you use MarkDynamic (http://docs.unity3d.com/Documentation/S ... namic.html), which helps keep Unity's overhead as small as possible. If only Unity was as simple as the other platforms!
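A minimal sketch of where a MarkDynamic call could go, assuming a mesh that is created once and then updated every frame. The component name DynamicMeshSetup is hypothetical and the per-frame update code is omitted.

```csharp
using UnityEngine;

// Hypothetical setup component (not part of spine-unity).
[RequireComponent(typeof(MeshFilter))]
public class DynamicMeshSetup : MonoBehaviour {
	Mesh mesh;

	void Awake () {
		// Create the mesh once and reuse it for the lifetime of the object.
		mesh = new Mesh();
		mesh.name = "Skeleton Mesh";
		// Unity 4+: hints that this mesh will be updated frequently.
		mesh.MarkDynamic();
		GetComponent<MeshFilter>().sharedMesh = mesh;
	}

	void OnDestroy () {
		// Meshes created from script are not cleaned up automatically.
		if (mesh != null) Destroy(mesh);
	}
}
```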

Pharan · 11. lip. 2013

Does the Mesh class do a lot of extra work every time you resubmit everything VS only resubmitting the vertex array?

and is it better if you modify elements in mesh.vertices or mesh.colors directly in the loop that takes data from the skeleton instead of having the data pass through SkeletonComponent.vertices[] or the other array/value buffers?

Nate · 11. lip. 2013

Just to be clear, spine-unity never recreates the mesh and does not create new vertices, colors, uvs, or triangles arrays each frame. It only repopulates the mesh's vertices, colors, and uvs arrays each frame, and doing this doesn't allocate on the heap (it uses structs). The arrays are only recreated if more vertices are needed than we currently have. If the mesh has more vertices than needed, it zeros them out rather than reallocating the arrays. (Unity really should allow setting an offset and length instead of forcing the arrays to start at zero and be exactly the right length.)
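The grow-only buffer behaviour described above can be sketched in plain C# roughly as follows. This is not the SkeletonComponent source: Vec3 is a stand-in for Unity's Vector3 so the snippet runs outside Unity, and VertexBuffer, Fill and ReallocationCount are hypothetical names.

```csharp
using System;

// Stand-in for UnityEngine.Vector3 (assumption, so the sketch runs without Unity).
public struct Vec3 {
	public float X, Y, Z;
	public Vec3 (float x, float y, float z) { X = x; Y = y; Z = z; }
}

// Hypothetical buffer that only reallocates when it has to grow.
public sealed class VertexBuffer {
	public Vec3[] Vertices = new Vec3[0];
	public int ReallocationCount; // for illustration: how often the array was recreated

	// Copies the first `count` vertices from `source` into the buffer.
	public void Fill (Vec3[] source, int count) {
		if (count > Vertices.Length) {
			// More vertices needed than we currently have: recreate the array.
			Vertices = new Vec3[count];
			ReallocationCount++;
		}
		Array.Copy(source, Vertices, count);
		// Fewer vertices than capacity: zero the unused tail instead of shrinking.
		Array.Clear(Vertices, count, Vertices.Length - count);
	}
}
```

For example, filling the buffer with 8 vertices and then with 4 leaves the array at length 8 with the last four entries zeroed, and only the first call reallocates.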

I've added MarkDynamic, thanks! Someone using Unity 3 can just remove it. Since 4 is now free, I don't see a huge need to make the runtime work in 3 out of the box. Removing it is trivial.

Knowing if there has been a color change is not so easy. An animation can change it, or someone can directly modify a slot. The overhead for tracking if a color change has occurred and branching for each attachment is likely more than just setting new colors and is always worse if a color change has actually happened (which is reasonably common). To see the impact not setting colors would have, comment out the line that sets mesh.colors. I doubt you will see any measurable difference. The same could be done for UVs. Vertices will always need to be submitted each frame.

I feel the current code is pretty efficient. Is there a bottleneck we can identify and changes we can make that improve that bottleneck?

Lazrhog · 11. lip. 2013

That's the trouble with dev libraries like Unity. They are never going to do things in the most optimal and efficient manner. Only your own lovingly hand-crafted OpenGL ES code can do that :p