Remember the films in which a very fuzzy CCTV image of the villain is automagically transformed by the hero’s tech advisor into a crystal-clear image? Well, that day is rapidly approaching (and may already be here), as can be seen by reading this pdf.
“We design a simple pipeline that combines the best of both worlds: the first stage uses a convolutional neural network (CNN) to map the input to a (overly-smoothed) image, and the second stage uses a pixel-wise nearest neighbor method to map the smoothed output to multiple high-quality, high-frequency outputs in a controllable manner.”
Very simplistically, the approach is a pipeline. The raw image is first broadly categorized, e.g. is it a dog’s head or a person? They then map the raw image to a ‘smoothed’ generic image with similar characteristics, e.g. a face in three-quarter profile, coupled with another of generic hairstyles etc. (a bit like the way the police build up a photofit). Next they run it through a neural network which looks at per-pixel features of the raw image (not just colour and light intensity, but also things like the ‘normal map’ output, edges etc.). Their system then matches these to elements drawn from their database of training images, e.g. a similar nose, and cobbles all this together into an output. Each stage of the operation can be ‘hand-tuned’ as it goes along, which is a plus for ‘photofit’ cases but could not really be used for CCTV.
The output from all these operations is something that bears a good resemblance to the actual subject.
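The pixel-wise nearest-neighbor step can be sketched in a few lines. This is a minimal illustrative toy, not the paper’s implementation: the function name, the feature dimensions, and the toy exemplar database are all my own assumptions. The idea is simply that each pixel of the smoothed CNN output carries a feature vector (colour, normals, edges and so on), and the closest matching feature in a database of exemplars donates its high-frequency pixel value.

```python
import numpy as np

def pixelwise_nearest_neighbor(query_feats, db_feats, db_pixels):
    """Hypothetical sketch of the paper's second stage.

    query_feats: (H, W, D) per-pixel features of the smoothed CNN output
    db_feats:    (N, D) feature vectors sampled from the exemplar database
    db_pixels:   (N, C) the corresponding high-frequency pixel values
    """
    h, w, d = query_feats.shape
    flat = query_feats.reshape(-1, d)                        # (H*W, D)
    # Squared Euclidean distance from every query pixel to every exemplar
    dists = ((flat[:, None, :] - db_feats[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)                           # closest exemplar per pixel
    return db_pixels[nearest].reshape(h, w, -1)              # copy the matched pixels

# Toy example: a 2x2 "smoothed" image with 2-D features, 3 exemplars
query = np.array([[[0.0, 0.0], [1.0, 1.0]],
                  [[0.9, 1.1], [0.1, 0.0]]])
db_f = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
db_p = np.array([[10], [20], [30]])
out = pixelwise_nearest_neighbor(query, db_f, db_p)
```

In the real system the exemplar database would be huge and the search would need to be approximate, but the copy-the-best-match logic is the same, and it is why the result stays controllable: you can see exactly which exemplar each region was borrowed from.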
While this will obviously be a very useful step up on the old photofit method, I hope that such images are never allowed as evidence in a court of law, as they are just rule-based generation. The kicker is that using a neural network means the ‘hidden layers’ can never be proved, merely interpreted.