{"id":1232,"date":"2023-03-07T16:19:14","date_gmt":"2023-03-07T20:19:14","guid":{"rendered":"https:\/\/jausoft.com\/blog\/?p=1232"},"modified":"2023-03-07T18:59:38","modified_gmt":"2023-03-07T22:59:38","slug":"graph_performance_1296_glyphs_in_10ms","status":"publish","type":"post","link":"https:\/\/jausoft.com\/blog\/2023\/03\/07\/graph_performance_1296_glyphs_in_10ms\/","title":{"rendered":"Graph Performance: 1296 Glyphs processed in ~11ms on Raspi 4b"},"content":{"rendered":"<p><a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/commit\/?id=9a14dd8d40be4f4d88ba8424e908129ff628e259\">Graph Perf Update: 1296 chars to Region per Frame: (<strong>updated post<\/strong> 2x)<\/a><\/p>\n<ul>\n<li>Raspi 4b 11.34ms (<em>regioned<\/em>) + 5.5ms (draw)<\/li>\n<li>PC 1.93ms (<em>regioned<\/em>) + 0.28ms (draw)<\/li>\n<\/ul>\n<p><!--more--><\/p>\n<p>Performance update from <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/commit\/?id=607eb99b9cad227dd7be6d149c6b6cf57d060c35\">commit 607eb99b9cad227dd7be6d149c6b6cf57d060c35<\/a><br \/>\n(Note: that commit message stated the total duration for 20 frames, not the duration per frame.)<\/p>\n<p style=\"text-align: left;\"><strong><em>Edit<\/em><\/strong>: I have <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/commit\/?id=ad1511295afc0256fa02d6d434db4b119f96f056\"><strong>updated<\/strong> the logs and numbers<\/a>, <em>now with vsync for swap-buffers turned off<\/em>. However, this seems to make no difference on the Raspi 4b. 
I also added screenshots of the visual performance analysis.<a href=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/TestTextRendererNEWT00-snap0-msaa04-800x480-vbaa0004.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1244\" src=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/TestTextRendererNEWT00-snap0-msaa04-800x480-vbaa0004.png\" alt=\"\" width=\"800\" height=\"480\" srcset=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/TestTextRendererNEWT00-snap0-msaa04-800x480-vbaa0004.png 800w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/TestTextRendererNEWT00-snap0-msaa04-800x480-vbaa0004-300x180.png 300w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/TestTextRendererNEWT00-snap0-msaa04-800x480-vbaa0004-768x461.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/a>Above is a screenshot from a Raspberry Pi 4b (console) using an 800&#215;480 pixel screen.<\/p>\n<p>All Raspberry Pi 4 results used our DRM EGL\/GBM console driver together with the open-source ES3.1 driver and GLContext.<\/p>\n<p><em>regioned<\/em> denotes the process in which the individual pre-computed <em>OutlineShape<\/em> instances, one per <em>Glyph<\/em>, are processed into a single <em>Region<\/em>. This process includes our font layout and <em>Region.addOutlineShape()<\/em>.<\/p>\n<p><em>Region.addOutlineShape()<\/em> itself performs the triangulation of the shapes, compounds all vertices and pushes all data down to the VBO buffer, ready to be rendered. 
Hence, this is the crucial <em>Graph<\/em> hotspot.<\/p>\n<p><strong>A<\/strong> Performance @ 2.4.0 with 119,787 vertices:<br \/>\n&#8211; <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/tree\/doc\/curve\/tests\/perf00\/rpi4_old.log?id=9a14dd8d40be4f4d88ba8424e908129ff628e259\">doc\/curve\/tests\/perf00\/rpi4_old.log<\/a><br \/>\n&#8211; Raspi 4b <strong>57.20ms<\/strong> (<em>regioned<\/em>) + 23.4ms (draw)<\/p>\n<p><strong>B<\/strong> Performance @ <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/commit\/?id=607eb99b9cad227dd7be6d149c6b6cf57d060c35\">previous commit<\/a> with 81,092 vertices:<br \/>\n&#8211; <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/tree\/doc\/curve\/tests\/perf01\/rpi4_7.log?id=9a14dd8d40be4f4d88ba8424e908129ff628e259#n150\">doc\/curve\/tests\/perf01\/rpi4_7.log<\/a> + <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/tree\/doc\/curve\/tests\/perf01\/pc_7.log?id=9a14dd8d40be4f4d88ba8424e908129ff628e259#n164\">doc\/curve\/tests\/perf01\/pc_7.log<\/a><br \/>\n&#8211; Raspi 4b <strong>11.76ms<\/strong> (<em>regioned<\/em>) + 3.5ms (draw)<br \/>\n&#8211; PC 3.4ms (<em>regioned<\/em>) + 0.35ms (draw)<\/p>\n<p><strong>C<\/strong> Now with 81,092 vertices from a 1296-character string and font <em>FreeSans<\/em> (<strong>vsync off<\/strong>):<br \/>\n&#8211; <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/tree\/doc\/curve\/tests\/perf02\/rpi4_10.log#n168\">doc\/curve\/tests\/perf02\/rpi4_10.log<\/a> + <a href=\"https:\/\/jogamp.org\/cgit\/jogl.git\/tree\/doc\/curve\/tests\/perf02\/pc_10.log#n181\">doc\/curve\/tests\/perf02\/pc_10.log<\/a><br \/>\n&#8211; Raspi 4b <strong>11.34ms<\/strong> (<em>regioned<\/em>) + 5.5ms (draw)<br \/>\n&#8211; PC 1.93ms (<em>regioned<\/em>) + 0.28ms (draw)<\/p>\n<p>Hence we have achieved a reasonable performance improvement from A -&gt; C: on the Raspi 4b, the <em>regioned<\/em> time dropped from 57.20ms to 11.34ms (roughly 5x faster) and the draw time from 23.4ms to 5.5ms.<\/p>\n<p>Most importantly, neither <em>Flight Recorder<\/em> nor <em>Visual VM<\/em> could identify<br \/>\n<em>Region.addOutlineShape()<\/em>&#8217;s triangulation or its compounding of all 
vertices to be a significant bottleneck.<br \/>\nAfter further triangulation bugfixes (Delaunay tessellation), we will re-validate this part.<\/p>\n<p>Enhancements to the <em>VBO GLArrayData<\/em> data management,<br \/>\nwhere <em>Region.addOutlineShape()<\/em> finally pushes the data into the <em>VBO<\/em>, helped to remove some overhead.<\/p>\n<p>The buffer-size enhancements, including API hooks to count the required vertices &amp; indices before issuing <em>Region.setBufferCapacity()<\/em>, helped to ease the GC load.<\/p>\n<h3>Raspi 4b <em>Flight Recorder<\/em> Screenshot after 1 minute of <em>intensive method sampling<\/em><\/h3>\n<p>The following JVM launch options were used:<\/p>\n<blockquote><p>-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints<br \/>\n-XX:FlightRecorderOptions=stackdepth=2048,threadbuffersize=16k<\/p><\/blockquote>\n<p><a href=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1239\" src=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001.png\" alt=\"\" width=\"1652\" height=\"1045\" srcset=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001.png 1652w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001-300x190.png 300w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001-1024x648.png 1024w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001-768x486.png 768w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001-1536x972.png 1536w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_FlightRecorder_001-1088x688.png 1088w\" sizes=\"auto, (max-width: 1652px) 100vw, 1652px\" \/><\/a><\/p>\n<h3>Raspi 4b <em>Visual VM<\/em> Screenshot after 1 minute of <em>intensive CPU sampling<\/em> 
(10ms)<\/h3>\n<p><a href=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1240\" src=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001.png\" alt=\"\" width=\"1757\" height=\"1375\" srcset=\"https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001.png 1757w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001-300x235.png 300w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001-1024x801.png 1024w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001-768x601.png 768w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001-1536x1202.png 1536w, https:\/\/jausoft.com\/blog\/wp-content\/uploads\/2023\/03\/GraphPerf_VisualVM_001-1088x851.png 1088w\" sizes=\"auto, (max-width: 1757px) 100vw, 1757px\" \/><\/a>Interestingly, Visual VM still shows swapBuffers as a performance hog, despite vsync being turned off.<\/p>\n<p>Perhaps I have to test at a lower resolution, since this test used a screen size of 1920 x 1080 pixels, which is maybe not quite the usual embedded device resolution \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Graph Perf Update: 1296 chars to Region per Frame: (updated post 2x) Raspi 4b 11.34ms (regioned) + 5.5ms (draw) PC 1.93ms (regioned) + 0.28ms 
(draw)<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[7,3,8],"tags":[9,13,22,71,16,44,31,17,46,68],"class_list":["post-1232","post","type-post","status-publish","format-standard","hentry","category-3d-opengl","category-computer-stuff","category-jogamp","tag-3d","tag-embedded-device","tag-fonts","tag-graph_type_rendering","tag-java","tag-jogamp","tag-mobile","tag-opengl","tag-openjdk","tag-type-rendering"],"_links":{"self":[{"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/posts\/1232","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/comments?post=1232"}],"version-history":[{"count":13,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/posts\/1232\/revisions"}],"predecessor-version":[{"id":1249,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/posts\/1232\/revisions\/1249"}],"wp:attachment":[{"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/media?parent=1232"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/categories?post=1232"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jausoft.com\/blog\/wp-json\/wp\/v2\/tags?post=1232"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}