A project I am currently working on runs on an STM32F0 and needs to store some settings and a few other things. The device is an off-the-shelf product, so I couldn't just change the design and add some NOR flash or an external EEPROM that already has drivers. That left me with a few options to choose from.

The first option was to use ST's emulated EEPROM library. I started importing the code into the RTOS's build tree and hit an issue: some of the types it uses aren't provided by the OS (u*_t vs uint*_t), and I'd have to go through and fix them up manually. The code was also reasonably heavy and moves things into RAM, which I don't have a great deal of room to spare. It also has no wear leveling, and because I am using 8KB/4 pages rather than the 4KB/2 pages it is designed around, I would need to come up with my own solution on top of it.

The second option was to use ST's flash StdPeriph driver. The emulated EEPROM library also requires this driver, so half the work was already done. In hindsight I should have just used it and modified it to suit my needs. The driver itself is fairly unremarkable: it just implements the required routines for erasing and writing. I would still need to write a layer on top to manage the storage, but it handles all the lower-level work with ease.

The third option was to adapt another open source solution, written for the STM32F403, to the F072. Unlike the F403, the F072 has uniform sector sizes (2 pages per sector) and erases per page rather than per sector. The code was license incompatible, however, so I didn't look at it much further.

The option I took was the stupid one. In a word, don't. To avoid any potential license compatibility issues, and because I thought it would end up being the easiest way, I hit the datasheet for the memory component and started writing my own. The current version of the code works well, and while it is not very portable yet, I will at some point clean it up so it can be reused in other projects. For the low-level write and erase functions, just using the StdPeriph driver would have been less work than implementing them from scratch. My layer doesn't implement everything their driver does, because my implementation doesn't need it.

My interface layer breaks sectors into their pages as a base unit, and each 2048-byte page is broken down into 32 blocks. The page header uses a 32-bit field (one bit per block) to indicate whether a block has already been flashed, and another 32-bit field to indicate whether a block has been invalidated for wear leveling purposes. Because of this I treat 32 bits as my base unit. The STM32F0 flash can only be written in half-words, so my write routine, unlike the other solutions, takes only a word and splits it into 2 half-words to write it to flash.
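To illustrate the word-to-half-word split, here is a minimal host-side sketch. It is not the driver's code: the page is simulated as a RAM array, all names (`sim_page`, `sim_program_halfword`, `sim_write_word`) are made up for the example, and it uses the simplified rule that a half-word can only be programmed while it still reads as erased (0xFFFF).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 2048u
#define PAGE_HALFWORDS  (PAGE_SIZE_BYTES / 2u)

/* Hypothetical stand-in for one flash page; erased flash reads as 0xFFFF. */
static uint16_t sim_page[PAGE_HALFWORDS];

/* Simulated page erase: every half-word goes back to 0xFFFF. */
static void sim_page_erase(void)
{
    for (size_t i = 0; i < PAGE_HALFWORDS; i++) {
        sim_page[i] = 0xFFFFu;
    }
}

/* Program one half-word; refuse if the location is not erased. */
static bool sim_program_halfword(size_t hw_index, uint16_t value)
{
    assert(hw_index < PAGE_HALFWORDS);
    if (sim_page[hw_index] != 0xFFFFu) {
        return false;
    }
    sim_page[hw_index] = value;
    return true;
}

/* Write a 32-bit word as two half-word programs, low half first
 * (the low half sits at the lower address on a little-endian core). */
static bool sim_write_word(size_t word_index, uint32_t value)
{
    assert(word_index < PAGE_HALFWORDS / 2u);
    size_t hw = word_index * 2u;
    return sim_program_halfword(hw, (uint16_t)(value & 0xFFFFu)) &&
           sim_program_halfword(hw + 1u, (uint16_t)(value >> 16));
}

/* Read the word back by recombining the two half-words. */
static uint32_t sim_read_word(size_t word_index)
{
    assert(word_index < PAGE_HALFWORDS / 2u);
    size_t hw = word_index * 2u;
    return (uint32_t)sim_page[hw] | ((uint32_t)sim_page[hw + 1u] << 16);
}
```

On real hardware the array store would be replaced by the flash programming sequence, and the word index would map to an address inside the page.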

For wear leveling, each page has a header that tracks its usage and is rotated in sequence: the lowest-numbered block is the oldest and will be filled up first. To avoid unnecessary page erases, when the data stored in a block is updated, its block is marked invalid and the contents are moved into the next available block. The block's index is recorded in the page header, so the number it is referenced by in the allocation index does not need to be updated. This decoupling of block order reduces writes to the allocation indexes and allows data block 1 to reside in any page, so the wear leveling approach can move blocks anywhere within the flash pages.
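The invalidate-and-move step can be sketched roughly like this. This is only an illustration under assumptions, not the actual driver: it works on an in-RAM copy of a single page header, the struct layout and names (`page_header_t`, `logical_id`, `update_block`) are hypothetical, "bit set" is used to mean "true" purely for readability, and how the bitmaps are committed to flash is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCKS_PER_PAGE 32u

/* Hypothetical in-RAM view of one page header. */
typedef struct {
    uint32_t written;                     /* bit n set: physical block n has been programmed */
    uint32_t invalid;                     /* bit n set: physical block n holds stale data */
    uint8_t  logical_id[BLOCKS_PER_PAGE]; /* logical block number stored in each physical block */
} page_header_t;

/* Lowest-numbered physical block that has never been programmed, or -1. */
static int find_free_block(const page_header_t *h)
{
    assert(h != NULL);
    for (unsigned n = 0; n < BLOCKS_PER_PAGE; n++) {
        if ((h->written & (UINT32_C(1) << n)) == 0u) {
            return (int)n;
        }
    }
    return -1;
}

/* Physical block currently holding the live copy of a logical block, or -1. */
static int find_live_block(const page_header_t *h, uint8_t logical)
{
    assert(h != NULL);
    for (unsigned n = 0; n < BLOCKS_PER_PAGE; n++) {
        uint32_t bit = UINT32_C(1) << n;
        if ((h->written & bit) != 0u && (h->invalid & bit) == 0u &&
            h->logical_id[n] == logical) {
            return (int)n;
        }
    }
    return -1;
}

/* Update a logical block: mark the old copy invalid and claim the next
 * free physical block for the new copy. The logical number is unchanged,
 * so anything that refers to it stays valid. Returns the new physical
 * block, or -1 if the page is full (a real driver would then move on to
 * another page or erase one). */
static int update_block(page_header_t *h, uint8_t logical)
{
    int fresh = find_free_block(h);
    if (fresh < 0) {
        return -1;
    }
    int old = find_live_block(h, logical);
    if (old >= 0) {
        h->invalid |= UINT32_C(1) << (unsigned)old;
    }
    h->written |= UINT32_C(1) << (unsigned)fresh;
    h->logical_id[fresh] = logical;
    return fresh;
}
```

The point of the sketch is that callers only ever see the logical number; the physical position changes on every update without touching whatever refers to the block.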

A filesystem was the preferred approach to storing data in flash, but due to RAM constraints and the complexity it would add to the low-level interface, I again rolled my own. In the future I will decouple this from the driver, but at present the allocation indexes are integrated into the driver, as is a data descriptor for the project this belongs to. The index entries contain a magic value to identify valid entries (as uninitialised entries can be written to the block without invalidating or erasing the page), the data type to identify what the entry points to, the block number, and the size in blocks. When an entry is requested, it is looked up in the index and a pointer to its location in flash is returned, which saves RAM for large entries. The flash address space is directly accessible as memory for reads. To change or update the contents in flash, a copy must be made and the driver's write block function must be called to update it. Where possible the driver will attempt to make the changes without invalidating the block before it falls back to the wear leveling procedure.
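As a rough sketch of what such an index lookup might look like: the field widths, the magic value `0xA5`, the 64-byte block size (2048 / 32) and the function name `lookup_entry` are all assumptions made for the example, and the block number is treated as a direct offset rather than being resolved through the page header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INDEX_MAGIC      0xA5u /* hypothetical marker for a valid entry */
#define BLOCK_SIZE_BYTES 64u   /* assumption: 2048-byte page / 32 blocks */

/* Hypothetical index entry. An erased entry reads as all 0xFF, so its
 * magic byte never matches INDEX_MAGIC and it is treated as unused. */
typedef struct {
    uint8_t magic;       /* INDEX_MAGIC when the entry is valid */
    uint8_t type;        /* what kind of data the entry points to */
    uint8_t block;       /* first block of the data */
    uint8_t size_blocks; /* length of the data in blocks */
} index_entry_t;

/* Find the first valid entry of the given type and return a read-only
 * pointer straight into the memory-mapped flash, so nothing is copied
 * into RAM. Returns NULL if there is no such entry. */
static const uint8_t *lookup_entry(const index_entry_t *index, size_t count,
                                   uint8_t type, const uint8_t *flash_base,
                                   size_t *size_bytes)
{
    assert(index != NULL && flash_base != NULL && size_bytes != NULL);
    for (size_t i = 0; i < count; i++) {
        if (index[i].magic == INDEX_MAGIC && index[i].type == type) {
            *size_bytes = (size_t)index[i].size_blocks * BLOCK_SIZE_BYTES;
            return flash_base + (size_t)index[i].block * BLOCK_SIZE_BYTES;
        }
    }
    return NULL;
}
```

Because the returned pointer is `const`, the caller is pushed towards the copy-then-write-block route for any modification.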

The TL;DR version of this post: I made a driver that implements most of the functions one needs to use the STM32 flash, but the time it took to write is ridiculous given that an off-the-shelf implementation would have worked. After a week of writing code I have a mostly functioning driver (just a few API functions left to write) that is reasonably efficient in code size, using 1500 bytes of flash and 250 bytes of RAM. If I could go back, use an existing solution, and spend my last week writing drivers for the stuff that doesn't have any, would I? Yes.

The code for the flash driver will be released alongside the rest of the project, hopefully soon. I have a huge list of projects that I've started and never finished, so I'm working hard to make sure I finish this one, because I really need to finish something for once.