Embedded developers celebrate small chips that punch above their weight. That is why I was excited to have both Google Coral and NVidia Jetson Nano on my bench last week.

Over the last few months, two ML inference development boards have surfaced, both aimed at low-power embedded applications. It was no secret that many players were putting serious development time into their upcoming chips. Now that we have them in house, we can say much more.

This post focuses on Google Coral; a post on the Jetson Nano will follow shortly.

Coral is Google's take on what IoT should be.

Google's TPUs were originally meant for internal cloud use. The company had very precise needs, and it was able to address them. Making the Edge TPU available to embedded developers is a statement of a bigger ambition, and the recent beta program for Google Coral is that ambition's debut.

Coral can cover a wide application domain with its peripherals:

  • MIPI camera/display interfaces,
  • 2 on-board MEMS microphones,
  • a 3 W speaker output,
  • an on-board MIMO WiFi connection,
  • a quad-core ARM Cortex-A53 application processor (NXP i.MX 8M, which hits the mark for its intended audience but remains slow compared to the competition),
  • and a Microchip ATECC608A cryptographic co-processor.

It has everything needed to showcase what Google thinks is required for an IoT device that executes machine-learning inference in audiovisual applications. Google states plainly that both machine-learning inference and IoT security are serious issues, and that both should be addressed with dedicated hardware. I also believe this is a complete enough set to span as many applications in the domain as possible.

Do we need yet another processing unit?

The story goes that Google looked at the candidates that might handle its upcoming wave of ML inference demand, and decided it could do better itself. That is only natural: Google's operational philosophy differs substantially from that of the other players, NVidia and Intel, and Google can dictate a sizeable chunk of the ecosystem.

Press photograph of the Edge TPU ASIC, which is not visible to us within its enclosure. 

The TPU architecture is thoroughly explained in the paper "A Domain-Specific Architecture for Deep Neural Networks" (with almost Alex Grey-like illustrations in its printed version). The TPU core challenges GPU-based inference accelerators by shedding the non-essentials meant for graphics processing and by quantizing the working datatype to 8-bit integers (rather than the 32-bit floats used in most other cases) within the hardware architecture. In the paper, the authors rule out comparisons with NVidia's Pascal P40 and Maxwell cores because those parts lack intrinsic memory error checking. This only makes me wonder what the results would have been had that comparison been made, and whether the cost of memory error checking could have been accounted for in it. The paper definitely deserves its own blog post, since it supplies so much to discuss.

The rooflines from the paper "A Domain-Specific Architecture for Deep Neural Networks", comparing the Google TPU to the K80 and Haswell. The plot omits other NVidia cores because they do not implement Google's datacenter-grade memory error checking requirements.

In short, Google demonstrated that an AI accelerator doesn't need to be as precise, but it does need to be much cheaper and faster. Artifacts of precision requirements are therefore eliminated from the design in favor of low latency and low unit cost.
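To make the precision trade-off concrete, here is a minimal sketch of affine 8-bit quantization, the general technique behind shrinking 32-bit floats to bytes. The function names and rounding details are my own illustration, not Google's implementation:

```python
def quantize_params(lo, hi, bits=8):
    """Derive the scale and zero-point that map the float range [lo, hi] to int8."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1  # -128 .. 127
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point):
    """Map a float to its nearest int8 representative, clamping to the range."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float."""
    return (q - zero_point) * scale

scale, zp = quantize_params(-1.0, 1.0)
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)  # close to 0.5, within one quantization step
```

The round trip loses at most half a quantization step per value; the bet is that neural-network inference tolerates that error while the hardware gains enormously from byte-wide arithmetic.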

So, back to the question. Do we need yet another processing unit?

My answer would be yes. I think the inclusion of Tensor Processing Units (TPUs) in the range of AI accelerators should be celebrated.

All right, it is fast. What is the compromise?

By enforcing quantization in hardware, Coral requires TensorFlow Lite (TFLite), which effectively limits the set of ML models one can run on it. Models must be trained for and compiled to TFLite. Since Google can dictate both the hardware and the whole environment around it, including the cloud infrastructure and the training and inference tool-chain, it opted to define the entire spectrum. We have yet to see whether we can port our current applications to it; for now, several Google-backed demo applications run on it. It is a temporary hit to the environment's development-friendliness that Google calculated to take anyway.
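For a feel of that workflow, here is a rough sketch of full-integer post-training quantization with the TFLite converter. The toy model and random calibration data are placeholders of my own; exact converter options vary by toolchain version, and the resulting model would still go through Google's edgetpu_compiler before deployment:

```python
import numpy as np
import tensorflow as tf

# A stand-in model; a real application would train this first.
inputs = tf.keras.Input(shape=(4,))
outputs = tf.keras.layers.Dense(2, activation="softmax")(inputs)
model = tf.keras.Model(inputs, outputs)

def representative_dataset():
    # Calibration samples let the converter estimate quantization ranges.
    for _ in range(100):
        yield [np.random.rand(1, 4).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()  # serialized flatbuffer, bytes
```

Any floating-point op without an integer TFLite counterpart breaks this pipeline, which is exactly the model-coverage limitation described above.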

We have further expectations from the Edge TPU.

The fact that Google's TPU is at our disposal is exciting. Still, we have a few expectations: we want the chip to be industrial-grade and fanless, and to come with assurance of extended availability.

In the embedded world, longevity has its own gravitas. If it is not evident that a component will still be supported after five years, that alone can be a reason not to adopt it. This is why suppliers like NXP, ST, and Micron maintain product lines with promised extended lifetimes. Google, by contrast, is known for readily discontinuing its consumer-facing products. Comparing hardware and software vendorship does not do full justice, but Google must give some assurance that this will not happen here, since many will rely on it. Many vendors adopt pin-to-pin and backwards-compatibility schemes that secure the future of upcoming applications. Google may well do this at the SoC level, but not at the ASIC level.

We also hope that the SoC will be available in industrial specs, that is, operable between -40°C and 85°C. Consumer applications can readily leverage a high-bandwidth connection to the cloud for their inference needs; we therefore think that many applications built on Google Coral will involve harsher environmental conditions without close proximity to a reliable connection. On a related note, we would like to investigate whether Coral-based designs can reasonably be made fanless.