{"id":5433,"date":"2019-06-25T16:26:54","date_gmt":"2019-06-25T15:26:54","guid":{"rendered":"https:\/\/www.positech.co.uk\/cliffsblog\/?p=5433"},"modified":"2019-06-25T16:26:54","modified_gmt":"2019-06-25T15:26:54","slug":"starting-the-game-an-in-depth-profiling-look","status":"publish","type":"post","link":"https:\/\/www.positech.co.uk\/cliffsblog\/2019\/06\/25\/starting-the-game-an-in-depth-profiling-look\/","title":{"rendered":"Starting the game: An in depth profiling look"},"content":{"rendered":"\n<p>How long does your indie game take to start up? from clicking the icon to actually being able to take input at the main menu? Just for fun, I decided to analyze whats involved in doing so for mine.<\/p>\n\n\n\n<p>Because the aim here is to actually analyze the REAL impact, not the best case, I need to ensure that the game (<a href=\"http:\/\/www.positech.co.uk\/productionline\">Production Line<\/a>) is not just happily sat there all in RAM from a recent run-through, so it seems best to oh&#8230;maybe launch battlefield V beforehand (and quit it) just to populate disk\/RAM with a load of other stuff and do my best to evict all my games code.<\/p>\n\n\n\n<p>Then&#8230;its time to fire-up aqtime and take a look. I decided to do line-level, rather than just function-level analysis, which slows the game massively, taking 17 seconds to start (the reality is dramatically faster), but I can still do relative comparisons.<\/p>\n\n\n\n<p>First thing to notice is that pretty much the entire time is inside Game::InitApp() which makes sense.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"680\" height=\"287\" src=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog1.png\" alt=\"\" class=\"wp-image-5434\" srcset=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog1.png 680w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog1-317x134.png 317w\" sizes=\"auto, (max-width: 680px) 100vw, 680px\" \/><\/figure>\n\n\n\n<p>Rather worryingly though, the vast majority appears to be inside SIM_Threadmanager::Initialise. That *may* be an artifact of aqtimes approach to thread profiling, but worth taking a look inside anyway&#8230; And it turns out that 100% of that time is inside SetThreadName() (which i only need for debugging anyway). This is a rare bit of code that I don&#8217;t understand well, and was from the evil interwebs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#pragma pack(push,8)\ntypedef struct tagTHREADNAME_INFO\n{\n\tDWORD dwType; \/\/ Must be 0x1000.\n\tLPCSTR szName; \/\/ Pointer to name (in user addr space).\n\tDWORD dwThreadID; \/\/ Thread ID (-1=caller thread).\n\tDWORD dwFlags; \/\/ Reserved for future use, must be zero.\n} THREADNAME_INFO;\n#pragma pack(pop)\n\nvoid SetThreadName(DWORD dwThreadID, char* threadName)\n{\n\tTHREADNAME_INFO info;\n\tinfo.dwType = 0x1000;\n\tinfo.szName = threadName;\n\tinfo.dwThreadID = dwThreadID;\n\tinfo.dwFlags = 0;\n\n\t__try\n\t{\n\t\tRaiseException(MS_VC_EXCEPTION, 0, sizeof(info) \/ sizeof(ULONG_PTR), \n(ULONG_PTR*)&amp;info);\n\t}\n\t__except (EXCEPTION_EXECUTE_HANDLER)\n\t{\n\t\tvolatile int foo = 9;\n\t}\n}<\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>The exception is basically ALL of the time. WTF? Apparently there is a less hacky way outlined here:  <a href=\"https:\/\/docs.microsoft.com\/en-us\/visualstudio\/debugger\/how-to-set-a-thread-name-in-native-code?view=vs-2019\">https:\/\/docs.microsoft.com\/en-us\/visualstudio\/debugger\/how-to-set-a-thread-name-in-native-code?view=vs-2019<\/a>  Which I will try later. I suspect the waiting for visual studios debugger is the cause of the problem.<\/p>\n\n\n\n<p>Anyway&#8230;onwards and upwards, so whats next? It basically all Init3D() (click to enlarge)<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"435\" src=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2-1024x435.png\" alt=\"\" class=\"wp-image-5435\" srcset=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2-1024x435.png 1024w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2-680x289.png 680w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2-768x326.png 768w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2-317x135.png 317w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog2.png 1162w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>So basically my DirectX initialisation and the shadermanager stuff is most of the problem.  I suspect the DirectX initialisdation may be too black-boxed for me to influence further. The first big chunk is this line:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>PD3D9 = Direct3DCreate9(D3D_SDK_VERSION);    <\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>Which takes up 34.74% of the start time. The next slow bit is  the largest at 41% which is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>hr = PD3D9->CreateDevice( AdapterToUse, DeviceType, WindowHandle,\n D3DCREATE_SOFTWARE_VERTEXPROCESSING, &amp;PresentParameters, &amp;PDevice);    <\/code><\/pre>\n\n\n\n<p><\/p>\n\n\n\n<p>So&#8230;holy crap. how can that line of code even be run? This can only happen if my checkcaps() code suggest the video card does not support hardware transform and lighting. I suspect some of the reporting here must be nonsense? Especially as my own debug logs suggest that the hardware TNL version is the one than ran&#8230; FFS :( lets look outside that code then&#8230;<\/p>\n\n\n\n<p>Most of the slowdown is in shader manager, which loads 11 shaders:<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"516\" src=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3-1024x516.png\" alt=\"\" class=\"wp-image-5436\" srcset=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3-1024x516.png 1024w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3-680x343.png 680w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3-768x387.png 768w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3-317x160.png 317w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog3.png 1419w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>so it looks like about half the loading time here is actually spent writing out debug data! This is hard to avoid though, as I do find this data invaluable for detecting errors. And because an app can crash and lose all its data, I flush each line of my debug logs to disk with a hard drive flush on each line&#8230;<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"438\" src=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4-1024x438.png\" alt=\"\" class=\"wp-image-5437\" srcset=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4-1024x438.png 1024w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4-680x291.png 680w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4-768x329.png 768w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4-317x136.png 317w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog4.png 1255w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>&#8230;so interestingly all the time seems to be inside OutputDebugString, which is only of any real use when the debugger is running. However! I *do* want to see that data in both release builds, and debug builds. Maybe I need a flag to tell if a debugger is present when the debug engine starts up? As a first pass I should at least build up a char* with the newline in to avoid twice the OutputDebugString calls. Here is the new code and timings.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><a href=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"564\" src=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5-1024x564.png\" alt=\"\" class=\"wp-image-5438\" srcset=\"https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5-1024x564.png 1024w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5-680x375.png 680w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5-768x423.png 768w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5-317x175.png 317w, https:\/\/www.positech.co.uk\/cliffsblog\/wp-content\/uploads\/2019\/06\/blog5.png 1232w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>Ooooh. I&#8217;ve halved the time of it. I&#8217;ve done the same with my non-directx debug code too. Now I&#8217;ll try changing that thread stuff&#8230; It turns out that SetThreadDescription is windows 10 only, so I need a different system (and apparently would need to update my platform SDK&#8230;urrrgh), so maybe best to just skip calling that code when no debugger is detected?<\/p>\n\n\n\n<p>This works (using isDebuggerPresent) but the profiler actually trips that flag too, so to set it work I needed to compare time stamps on debug files. Without the debugger, it looks like time from app start to menu ready is 0.395 seconds. With the debugger its&#8230;  0.532 seconds. <\/p>\n\n\n\n<p>That sounds pretty small, but actually I&#8217;m quite happy as I lost ZERO functionality, and the changes to the debug data will affect any time that data is written, not just during startup. 9Its not a lot, but there is *some*, and I&#8217;m an efficiency obsessive.<\/p>\n\n\n\n<p>I think I&#8217;ll put a clause around the debugengines OutputDebugString and nuke that unless IsDebuggerPresent() too :D<\/p>\n","protected":false},"excerpt":{"rendered":"<p>How long does your indie game take to start up? from clicking the icon to actually being able to take input at the main menu? Just for fun, I decided to analyze whats involved in doing so for mine. Because the aim here is to actually analyze the REAL impact, not the best case, I<\/p>\n<p class=\"text-right\"><span class=\"screen-reader-text\">Continue Reading&#8230; Starting the game: An in depth profiling look<\/span><a class=\"btn btn-secondary continue-reading\" href=\"https:\/\/www.positech.co.uk\/cliffsblog\/2019\/06\/25\/starting-the-game-an-in-depth-profiling-look\/\">Continue Reading&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[129,4],"tags":[],"class_list":["post-5433","post","type-post","status-publish","format-standard","hentry","category-production-line","category-programming"],"_links":{"self":[{"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/posts\/5433","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/comments?post=5433"}],"version-history":[{"count":7,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/posts\/5433\/revisions"}],"predecessor-version":[{"id":5445,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/posts\/5433\/revisions\/5445"}],"wp:attachment":[{"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/media?parent=5433"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/categories?post=5433"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.positech.co.uk\/cliffsblog\/wp-json\/wp\/v2\/tags?post=5433"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}