ref: fdc89f1a20889c0295534f58dc2e82ce8c55a18b
doc/MemoryAnalysis.md
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 |
# Memory analysis The PineTime is equipped with the following memories: - The internal RAM : **64KB** - The internal Flash : **512KB** - The external (SPI) Flash : **4MB** Note that the NRF52832 cannot execute code stored in the external flash : we need to store the whole firmware in the internal flash memory, and use the external one to store graphicals assets, fonts... This document describes how the RAM and Flash memories are used in InfiniTime and how to analyze and monitor their usage. It was written in the context of [this memory analysis effort](https://github.com/InfiniTimeOrg/InfiniTime/issues/313). ## Code sections A binary is composed of multiple sections. Most of the time, these sections are : .text, .rodata, .data and .bss but more sections can be defined in the linker script. Here is a small description of these sections and where they end up in memory: - **TEXT** = code (FLASH) - **RODATA** = constants (FLASH) - **DATA** = initialized variables (FLASH + RAM) - **BSS** = uninitialized variables (RAM) ## Internal FLASH The internal flash memory stores the whole firmware: code, variable that are not default-initialized, constants... The content of the flash memory can be easily analyzed thanks to the MAP file generated by the compiler. This file lists all the symbols from the program along with their size and location (section and addresses) in RAM and FLASH. ![Map file](./memoryAnalysis/mapfile.png) As you can see on the picture above, this file contains a lot of information and is not easily readable by a human being. Fortunately, you can easily find tools that parse and display the content of the MAP file in a more understandable way. In this analysis, I used [Linkermapviz](https://github.com/PromyLOPh/linkermapviz). ### Linkermapviz [Linkermapviz](https://github.com/PromyLOPh/linkermapviz) parses the MAP file and displays its content on an HTML page as a graphic: ![linkermapviz](./memoryAnalysis/linkermapviz.png) Using this tool, you can compare the relative size of symbols. This can be helpful for checking memory usage at a glance. Also, as Linkermapviz is written in Python, you can easily modify and adapt it to your firmware or export data in another format. For example, [here it is modified to parse the contents of the MAP file and export it in a CSV file](https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-842338620). This file could later be opened in LibreOffice Calc where sort/filter functionality could be used to search for specific symbols in specific files... ### Puncover [Puncover](https://github.com/HBehrens/puncover) is another useful tools that analyses the binary file generated by the compiler (the .out file that contains all debug information). It provides valuable information about the symbols (data and code): name, position, size, max stack of each functions, callers, callees... ![Puncover](./memoryAnalysis/puncover.png) Puncover is really easy to install: - Clone the repo and cd into the cloned directory - Setup a venv - `python -m virtualenv venv` - `source venv/bin/activate` - Install : `pip install .` - Run : `puncover --gcc_tools_base=/path/to/gcc-arm-none-eabi-10.3-2021.10/bin/arm-none-eabi- --elf_file /path/to/build/directory/src/pinetime-app-1.1.0.out --src_root /path/to/sources --build_dir /path/to/build/directory` - Replace - `/path/to/gcc-arm-none-eabi-10.3-2021.10/bin` with the path to your gcc-arm-none-eabi toolchain - `/path/to/build/directory/src/pinetime-app-1.1.0.out` with the path to the binary generated by GCC (.out file) - `/path/to/sources` with the path to the root folder of the sources (checkout directory) - `/path/to/build/directory` with the path to the build directory - Launch a browser at http://localhost:5000/ ### Analysis Using the MAP file and tools, we can easily see what symbols are using most of the flash memory. In this case, unsurprisingly, fonts and graphics are the largest use of flash memory. ![Puncover](./memoryAnalysis/puncover-all-symbols.png) This way, you can easily check what needs to be optimized. We should find a way to store big static data (like fonts and graphics) in the external flash memory, for example. It's always a good idea to check the flash memory space when working on the project. This way, you can easily check that your developments are using a reasonable amount of space. ### Links - Analysis with linkermapviz : https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-842338620 - Analysis with Puncover : https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-847311392 ## RAM RAM memory contains all the data that can be modified at run-time: variables, stack, heap... ### Data RAM memory can be *statically* allocated, meaning that the size and position of the data are known at compile-time: You can easily analyze the memory used by variables declared in the global scope using the MAP. You'll find them in the .BSS or .DATA sections. Linkermapviz and Puncover can be used to analyze their memory usage. Variables declared in the scope of a function will be allocated on the stack. It means that the stack usage will vary according to the state of the program, and cannot be easily analyzed at compile time. ``` uint8_t buffer[1024] int main() { int a; } ``` #### Analysis In Infinitime 1.1, the biggest buffers are the buffers allocated for LVGL (14KB) and the one for FreeRTOS (16KB). Nimble also allocated 9KB of RAM. ### Stack The stack will be used for everything except tasks, which have their own stack allocated by FreeRTOS. The stack is 8192B and is allocated in the [linker script](https://github.com/InfiniTimeOrg/InfiniTime/blob/develop/nrf_common.ld#L148). An easy way to monitor its usage is by filling the section with a known pattern at boot time, then use the firmware and dump the memory. You can then check the maximum stack usage by checking the address from the beginning of the stack that were overwritten. #### Fill the stack section by a known pattern: Edit <NRFSDK>/modules/nrfx/mdk/gcc_startup_nrf52.S and add the following code after the copy of the data from read only memory to RAM at around line 243: ``` /* Loop to copy data from read only memory to RAM. * The ranges of copy from/to are specified by following symbols: * __etext: LMA of start of the section to copy from. Usually end of text * __data_start__: VMA of start of the section to copy to. * __bss_start__: VMA of end of the section to copy to. Normally __data_end__ is used, but by using __bss_start__ * the user can add their own initialized data section before BSS section with the INTERT AFTER command. * * All addresses must be aligned to 4 bytes boundary. */ ldr r1, =__etext ldr r2, =__data_start__ ldr r3, =__bss_start__ subs r3, r3, r2 ble .L_loop1_done .L_loop1: subs r3, r3, #4 ldr r0, [r1,r3] str r0, [r2,r3] bgt .L_loop1 .L_loop1_done: /* Add this code to fill the stack section with 0xFFEEDDBB */ ldr r0, =__StackLimit ldr r1, =8192 ldr r2, =0xFFEEDDBB .L_fill: str r2, [r0] adds r0, 4 subs r1, 4 bne .L_fill /* -- */ ``` #### Dump RAM memory and check usage Dumping the content of the ram is easy using JLink debugger and `nrfjprog`: ``` nrfjprog --readram ram.bin ``` You can then display the file using objdump: ``` hexdump ram.bin -v | less ``` The stack is positioned at the end of the RAM -> 0xFFFF. Its size is 8192 bytes, so the end of the stack is at 0xE000. On the following dump, the maximum stack usage is 520 bytes (0xFFFF - 0xFDF8): ``` 000fdb0 ddbb ffee ddbb ffee ddbb ffee ddbb ffee 000fdc0 ddbb ffee ddbb ffee ddbb ffee ddbb ffee 000fdd0 ddbb ffee ddbb ffee ddbb ffee ddbb ffee 000fde0 ddbb ffee ddbb ffee ddbb ffee ddbb ffee 000fdf0 ddbb ffee ddbb ffee ffff ffff c24b 0003 000fe00 ffff ffff ffff ffff ffff ffff 0000 0000 000fe10 0018 0000 0000 0000 0000 0000 fe58 2000 000fe20 0000 0000 0000 00ff ddbb 00ff 0018 0000 000fe30 929c 2000 0000 0000 0018 0000 0000 0000 000fe40 92c4 2000 0458 2000 0000 0000 80e7 0003 000fe50 0000 0000 8cd9 0003 ddbb ffee ddbb ffee 000fe60 00dc 2000 92c4 2000 0005 0000 929c 2000 000fe70 007f 0000 feb0 2000 92c4 2000 feb8 2000 000fe80 ddbb ffee 0005 0000 929c 2000 0000 0000 000fe90 aca0 2000 0000 0000 0028 0000 418b 0005 000fea0 02f4 2000 001f 0000 0000 0000 0013 0000 000feb0 b5a8 2000 2199 0005 b5a8 2000 2201 0005 000fec0 b5a8 2000 001e 0000 0000 0000 0013 0000 000fed0 b5b0 2000 0fe0 0006 b5a8 2000 0000 0000 000fee0 0013 0000 2319 0005 0013 0000 0000 0000 000fef0 0000 0000 3b1c 2000 3b1c 2000 d0e3 0000 000ff00 4b70 2000 54ac 2000 4b70 2000 ffff ffff 000ff10 0000 0000 1379 0003 6578 2000 0d75 0003 000ff20 6578 2000 ffff ffff 0000 0000 1379 0003 000ff30 000c 0000 cfeb 0002 39a1 2000 a824 2000 000ff40 0015 0000 cfeb 0002 39a1 2000 a824 2000 000ff50 39a1 2000 0015 0000 001b 0000 b4b9 0002 000ff60 0000 0000 a9f4 2000 4b70 2000 0d75 0003 000ff70 4b70 2000 ffff ffff 0000 0000 1379 0003 000ff80 ed00 e000 a820 2000 1000 4001 7fc0 2000 000ff90 7f64 2000 75a7 0001 a884 2000 7b04 2000 000ffa0 a8c0 2000 0000 0000 0000 0000 0000 0000 000ffb0 7fc0 2000 7f64 2000 8024 2000 a5a5 a5a5 000ffc0 ed00 e000 3fd5 0001 0000 0000 72c0 2000 000ffd0 0000 0000 72e4 2000 3f65 0001 7f64 2000 000ffe0 0000 2001 0000 0000 ef30 e000 0010 0000 000fff0 7fc0 2000 4217 0001 3f0a 0001 0000 6100 ``` #### Analysis According to my experimentations, we don't use the stack that much, and 8192 bytes is probably way too big for InfiniTime! #### Links - https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-851035070 ### Heap The heap is declared in the [linker script](https://github.com/InfiniTimeOrg/InfiniTime/blob/develop/nrf_common.ld#L136) and its current size is 8192 bytes. The heap is used for dynamic memory allocation(`malloc()`, `new`...). Heap monitoring is not easy, but it seems that we can use the following code to know the current usage of the heap: ``` auto m = mallinfo(); NRF_LOG_INFO("heap : %d", m.uordblks); ``` #### Analysis According to my experimentation, InfiniTime uses ~6000bytes of heap most of the time. Except when the Navigation app is launched, where the heap usage exceeds 9500 bytes (meaning that the heap overflows and could potentially corrupt the stack). This is a bug that should be fixed in #362. To know exactly what's consuming heap memory, you can `wrap` functions like `malloc()` into your own functions. In this wrapper, you can add logging code or put breakpoints: - Add ` -Wl,-wrap,malloc` to the cmake variable `LINK_FLAGS` of the target you want to debug (pinetime-app, most probably) - Add the following code in `main.cpp` ``` extern "C" { void *__real_malloc (size_t); void* __wrap_malloc(size_t size) { return __real_malloc(size); } } ``` Now, your function `__wrap_malloc()` will be called instead of `malloc()`. You can call the actual malloc from the stdlib by calling `__real_malloc()`. Using this technique, I was able to trace all malloc calls at boot (boot -> digital watch face): - system task = 3464 bytes (SystemTask could potentially be declared as a global variable to avoid heap allocation here) - string music = 31 (maybe we should not use std::string when not needed, as it does heap allocation) - ble_att_svr_start = 1720 - ble gatts start = 40 + 88 - ble ll task = 24 - display app = 104 - digital clock = 96 + 152 - hr task = 304 #### Links - https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-851035625 - https://www.embedded.com/mastering-stack-and-heap-for-system-reliability-part-1-calculating-stack-size/ - https://www.embedded.com/mastering-stack-and-heap-for-system-reliability-part-2-properly-allocating-stacks/ - https://www.embedded.com/mastering-stack-and-heap-for-system-reliability-part-3-avoiding-heap-errors/ ## LVGL I did a deep analysis of the usage of the buffer dedicated to lvgl (managed by lv_mem). This buffer is used by lvgl to allocated memory for drivers (display/touch), screens, themes, and all widgets created by the apps. The usage of this buffer can be monitored using this code : ``` lv_mem_monitor_t mon; lv_mem_monitor(&mon); NRF_LOG_INFO("\t Free %d / %d -- max %d", mon.free_size, mon.total_size, mon.max_used); ``` The most interesting metric is `mon.max_used` which specifies the maximum number of bytes used from this buffer since the initialization of lvgl. According to my measurements, initializing the theme, display/touch driver and screens cost **4752** bytes! Then, initializing the digital clock face costs **1541 bytes**. For example a simple lv_label needs **~140 bytes** of memory. I tried to monitor this max value while going through all the apps of InfiniTime 1.1 : the max value I've seen is **5660 bytes**. It means that we could probably **reduce the size of the buffer from 14KB to 6 - 10 KB** (we have to take the fragmentation of the memory into account). ### Links - https://github.com/InfiniTimeOrg/InfiniTime/issues/313#issuecomment-850890064 ## FreeRTOS heap and task stack FreeRTOS statically allocate its own heap buffer in a global variable named `ucHeap`. This is an array of *uint8_t*. Its size is specified by the definition `configTOTAL_HEAP_SIZE` in *FreeRTOSConfig.h* FreeRTOS uses this buffer to allocate memory for tasks stack and all the RTOS object created during runtime (timers, mutexes...). The function `xPortGetFreeHeapSize()` returns the amount of memory available in this *ucHeap* buffer. If this value reaches 0, FreeRTOS runs out of memory. ``` NRF_LOG_INFO("Free heap : %d", xPortGetFreeHeapSize()); ``` The function `uxTaskGetSystemState()` fetches some information about the running tasks like its name and the minimum amount of stack space that has remained for the task since the task was created: ``` TaskStatus_t tasksStatus[10] auto nb = uxTaskGetSystemState(tasksStatus, 10, NULL); for (int i = 0; i < nb; i++) { NRF_LOG_INFO("Task [%s] - %d", tasksStatus[i].pcTaskName, tasksStatus[i].usStackHighWaterMark); ``` |