Here's a new post from Makaron coder Deunan:

Since you asked so nicely I'm going to post some bits and pieces about M1. But first some necessary background.

There are several approaches to figuring out encryption / compression schemes, and all of them involve deep analysis of the data before and after the decoding step.

Unless the system is very simple you won't get far by just looking at the (usually very few) patterns available. In fact it's pretty easy to come up with hypotheses that later turn out to be false and derail the whole research, taking days or weeks to disprove. This is what accounts for most of the time, and all the failures tend to be quite discouraging. This is the pure analytical approach. Andreas Naive does statistical analysis, which is easier in the sense that it gives you some hints and directions of what to try. However, to even attempt that you need to have the knowledge, tools and experience - things that I lack. You also need to have plenty of test data if the results are to have any meaning.

We got a bit lucky with the M2/M3 protection. I still have no idea how Andreas managed to identify the encryption so fast and reverse it (though if I were to hazard a guess, I'd say the work he did on Atomiswave carts helped a bit - it's not the same system but bears some resemblance). I explained what I know about NAOMI to him and then we had some brainstorming (which was mostly him shooting down my silly suggestions), and eventually we came up with some ideas on how to generate the test patterns he needed. I modified my dumper tools to output the data.

I should explain that M3 turned out to be just a variant of M2 that uses the cart's own RAM buffer. This is also the major weakness of this system that we exploited. See, it's easy to say "generate test data", but how exactly do you present your own patterns to the decoder if all it can do is access read-only memory? Well, one way would be to inject the pattern into the EPROM chip that serves as the game program carrier. It's possible, but you're limited by the available space (up to some 4 MB), while erasing and reprogramming takes a lot of time. Not to mention EPROMs can only go through a rather limited number of erase cycles (about 100) before errors start to show up. Obviously you'll also need all the necessary equipment to do the job in the first place.

Another problem was that M2/M3 is actually position-dependent. In other words, the same value at address A and at address A+n will decode to different results. Data is decrypted in 16-bit words, so to fully test just one address location I'd have to reprogram the EPROM over 65 thousand times. Not really my idea of fun.
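
To make the position dependence and the "over 65 thousand" figure concrete, here is a minimal Python sketch. The toy_decrypt function and its constants are made up purely for illustration; it is NOT the real M2/M3 algorithm, only a stand-in with the same property that the result depends on the address as well as the data.

```python
WORD_BITS = 16
VALUES_PER_ADDRESS = 1 << WORD_BITS  # 65536 possible 16-bit words per address


def toy_decrypt(address: int, word: int) -> int:
    """Hypothetical position-dependent decoder (NOT the real M2/M3 cipher).

    The mask is derived from the address, so the same input word
    decodes differently at different locations.
    """
    mask = (address * 0x9E37 + 0x1234) & 0xFFFF
    return (word ^ mask) & 0xFFFF


same_word = 0xBEEF
at_a = toy_decrypt(0x0000, same_word)   # 0xACDB with this toy mask
at_a1 = toy_decrypt(0x0001, same_word)  # 0x0E84 with this toy mask

print(f"A:   {at_a:#06x}")
print(f"A+1: {at_a1:#06x}")
print(f"words to try per address: {VALUES_PER_ADDRESS}")
```

With an EPROM as the only way to feed the decoder, every one of those 65536 words at a given address would mean another erase and reprogram cycle.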

But, as I said, there is a RAM buffer on the cart. We figured it would help us crack M3, and then we'd worry about M2. Having the ability to actually upload data to the cart and read back the result reduced the hardware testing times to mere minutes. Obviously Andreas needed very specific patterns, but that was just a few changes in the program code on my side. This also allowed the tests to be run on pretty much every cart out there (not just those in my possession) without resorting to any hardware modifications. It's thanks to this feature that we were able to crack the encryption keys on most games so quickly.
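
The upload-and-read-back loop can be sketched roughly as below. Everything here is an assumption for illustration: FakeCart simulates a cart with a writable buffer, toy_decoder is an invented stand-in for the decryption hardware, and single-bit patterns are just one plausible kind of test pattern, not necessarily what Andreas asked for.

```python
from typing import Callable, List, Tuple


class FakeCart:
    """Simulated cart with a writable RAM buffer (hypothetical interface)."""

    def __init__(self, decoder: Callable[[int, int], int]) -> None:
        self._decoder = decoder
        self._buffer: List[int] = []

    def upload(self, words: List[int]) -> None:
        # Store the chosen 16-bit test words in the buffer.
        self._buffer = [w & 0xFFFF for w in words]

    def read_decoded(self) -> List[int]:
        # Read the buffer back through the (simulated) decoder.
        return [self._decoder(addr, w) for addr, w in enumerate(self._buffer)]


def toy_decoder(address: int, word: int) -> int:
    """Invented position-dependent decoder, NOT the real M3 algorithm."""
    return (word ^ (address * 0x9E37 + 0x1234)) & 0xFFFF


def single_bit_patterns() -> List[int]:
    """One 16-bit word per bit position: 0x0001, 0x0002, ..., 0x8000."""
    return [1 << bit for bit in range(16)]


def run_test(cart: FakeCart, patterns: List[int]) -> List[Tuple[int, int]]:
    """Upload the patterns and pair each input word with its decoded output."""
    cart.upload(patterns)
    return list(zip(patterns, cart.read_decoded()))


results = run_test(FakeCart(toy_decoder), single_bit_patterns())
for plain, decoded in results[:4]:
    print(f"{plain:#06x} -> {decoded:#06x}")
```

The point of the design is that changing the test set only means changing the pattern generator, with no chip to erase or reprogram in between.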

This was M2/M3. M1 is a completely different system. The complexity of the encryption is nowhere near that of M2/M3, so it's much weaker from a cryptographic point of view - but it took much more work and many more months of head scratching to figure it out. Kinda funny, now that I think about it, that I identified and named the methods used on NAOMI carts as M1 through M3, and it was M3 that went down first and M1 last. Well, we have M4 now too.

Anyway, M1 didn't work the same way M2/M3 did. I had only one cart, with only one encrypted sequence on it, so not exactly a lot of data to work on. Trying to "decrypt" other ROM areas only produced confusing results. What's worse, trying to decrypt data from the EPROM area gave completely different and incoherent results. The only way to inject custom patterns into the decoder was to spoof a ROM chip.

Soon it became obvious why the EPROM approach doesn't work - on this cart type the ROMs are paired and read two at a time, to produce 2*16 = 32 bits of data for each address location. The EPROM output is only 16 bits wide - and while the cart compensates for that during normal data transfers, the decryption engine just treats the other 16 bits as pulled up, thus completely changing any patterns we'd want to test. I had to create a device that would make up for the lack of any writable memory on the cart. That's where the Altera Cyclone II FPGA comes in again.
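
A small Python sketch of why the EPROM spoils the patterns. Which half of the 32-bit word the EPROM drives is an assumption here (the low half is used for illustration); the function names are made up as well.

```python
PULLED_UP = 0xFFFF  # undriven 16 data lines read as all ones


def bus_word_from_rom_pair(low_rom_word: int, high_rom_word: int) -> int:
    """Combine two 16-bit ROM outputs into one 32-bit word (2*16 = 32 bits)."""
    return ((high_rom_word & 0xFFFF) << 16) | (low_rom_word & 0xFFFF)


def bus_word_from_eprom(eprom_word: int) -> int:
    """A 16-bit EPROM drives only one half; the other half floats high.

    ASSUMPTION: the EPROM sits on the low half of the word.
    """
    return bus_word_from_rom_pair(eprom_word, PULLED_UP)


# Pattern we would like the decoder to see vs. what it actually gets.
intended = bus_word_from_rom_pair(0x1234, 0x0000)
seen = bus_word_from_eprom(0x1234)

print(f"intended: {intended:#010x}")  # 0x00001234
print(f"seen:     {seen:#010x}")      # 0xffff1234
```

Half of every test word is forced to ones, so only a device driving all 32 data lines can present arbitrary patterns to the decoder.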

Just to sum this up, I needed 32 bits for data, tristatable and preferably bidirectional for the logic analyzer. Add at least 16 bits of address to that, and some additional control signals (chip selects mostly). Tons of soldering. To add insult to injury, the ROMs are 5V and the FPGA inputs can handle 3V3 tops (some people abuse the protection resistors and claim 5V tolerance, but the chip is a bit too pricey for me to fry just because I'm lazy). I had to add overvoltage protection circuitry that would still allow the bus to be bidirectional. The final result was this wire mess:

http://dknute.livejournal.com/37246.html