This article is cross-posted from
Plenoxels convert 2D images into navigable, photorealistic 3D worlds in minutes
June 16 | UC Berkeley's College of Engineering
Imagine taking a few photos with your mobile phone and quickly converting them into a 3D scene that you could navigate. This may soon be possible with a new technology developed by UC Berkeley researchers that can reconstruct photorealistic 3D worlds in just minutes — without the aid of artificial intelligence.
This new technology, dubbed Plenoxels, evolved from NeRF, a state-of-the-art 3D rendering technology also developed by UC Berkeley researchers. Yet Plenoxels surpass NeRF in every way — from their speed to their image quality — which may broaden their potential for consumer, industry and scientific applications.
“NeRF is cool, but it takes a whole day to recover a 3D scene,” said Angjoo Kanazawa, professor of electrical engineering and computer sciences, who will present her team’s Plenoxels paper at the IEEE/CVF Computer Vision and Pattern Recognition (CVPR) Conference this month. “Plenoxels, however, make training really fast and practical by getting rid of neural networks.”
Building the next-generation 3D world
NeRF, short for neural radiance fields, revolutionized 3D rendering when it was developed two years ago. It used the power of neural networks — systems of computing nodes that act like neurons in the human brain to recognize patterns in data — to provide a photorealistic experience far superior to other technologies at that time.
Prior to NeRF, 3D rendering technologies would take multiple pictures of a scene and find parts of the images that were the same by matching image features. This matching process had to be done for many parts of the image in order to reconstruct the scene. But these technologies failed to reconstruct parts of the scene without good matches, which could occur when shiny or transparent objects were present.
“NeRF helped resolve this problem. Instead of using traditional methods to match stuff between images, we used neural networks to do these optimizations,” said Matthew Tancik, a Ph.D. student in Kanazawa’s lab and co-author of the original NeRF paper as well as the new Plenoxels study. “NeRF makes it more practical to reconstruct these complicated scenes and allows us to reconstruct a 3D scene that you can explore like a video game.”
With NeRF, the only input required to optimize the 3D representation is a set of images with known camera poses. Using classic volume rendering techniques, NeRF can render photorealistic views of complex scenes.
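To give a rough sense of what classic volume rendering involves, the sketch below blends color samples along a single camera ray into one pixel. It is a minimal illustration, not the researchers' code: the function name `composite_ray` and its arguments are hypothetical, and it assumes each sample along the ray already has a density and a color, which in NeRF would be supplied by the neural network.

```python
import math


def composite_ray(densities, colors, step):
    """Alpha-composite samples along one ray, front to back.

    densities: non-negative density at each sample point
    colors:    (r, g, b) tuple at each sample point
    step:      distance between consecutive samples (assumed uniform)
    Returns the pixel color and the light left over after the last sample.
    """
    transmittance = 1.0  # fraction of light that has not yet been absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        # Opacity of this short segment of the ray
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        for c in range(3):
            pixel[c] += weight * rgb[c]
        # Less light reaches the samples that lie farther along the ray
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance
```

In this sketch, a ray that passes through empty space (zero density) and then hits a very dense red sample produces a pure red pixel, because the empty sample contributes nothing and the dense one absorbs all remaining light.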
“NeRF works surprisingly well, but it is quite slow because the neural networks are complex and take a lot of time to optimize,” said Tancik. “This is where follow-up work like Plenoxels come in.”
Taking 3D rendering to the next level with Plenoxels
Before developing Plenoxels, Kanazawa and her team created PlenOctrees. Octrees are tree data structures that recursively divide 3D space into eight smaller cubes and, in this case, possess plenoptic (view-dependent, or color-shifting) qualities.
PlenOctrees used neural networks for training, or inferring the 3D scene, and then converted it to plenoptic octrees for rendering. This resulted in faster computation that enabled real-time renderings.
The researchers then wondered if both steps — training and rendering — could be performed without neural networks. They found this was possible with Plenoxels.
In the hierarchy of computer graphics, Plenoxels are at the apex of dimensionality: a pixel is a 2D picture element; a voxel is a 3D volume element; and a Plenoxel, or plenoptic voxel, is a volume element that changes color depending on the angle from which it is viewed.
A Plenoxel grid is made of tiny blocks, like those used to create a Minecraft world, except Plenoxels offer another level of dimensionality: view-dependent color. If you were to zoom out and look at these blocks all at once, you would see a high-resolution 3D world. But up close, at its core, you would see only little blocks that can change color.
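One way a block can "change color" with viewing angle is to store a few coefficients per block and combine them with simple functions of the viewing direction. The sketch below assumes a low-order spherical harmonic encoding and evaluates a single color channel. The function name, the coefficient layout and the choice to stop at degree 1 are illustrative assumptions, and sign conventions for these basis functions vary between implementations.

```python
import math

# Constants of the real spherical harmonic basis for degrees 0 and 1
SH_C0 = 0.5 * math.sqrt(1.0 / math.pi)
SH_C1 = math.sqrt(3.0 / (4.0 * math.pi))


def view_dependent_value(coeffs, direction):
    """Evaluate one color channel of a block for a given viewing direction.

    coeffs:    4 floats (hypothetical layout: 1 for degree 0, 3 for degree 1)
    direction: (x, y, z) viewing direction; it is normalized here
    """
    x, y, z = direction
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    # The degree-0 term is the same from every angle; the degree-1 terms
    # brighten or darken the value depending on where the viewer stands.
    return (SH_C0 * coeffs[0]
            + SH_C1 * (coeffs[1] * y + coeffs[2] * z + coeffs[3] * x))
```

With only the first coefficient set, the block looks the same from every angle, like an ordinary voxel. Setting one of the other coefficients makes the value rise when viewed along one axis and fall when viewed from the opposite side.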
Other members of the Berkeley research team are Alex Yu, Sara Fridovich-Keil, Qinhong Chen and Benjamin Recht, a professor of electrical engineering and computer sciences. In the Plenoxels study, the researchers looked at whether a neural network is necessary for rendering optimization.
“The question was: Could we keep everything that works with NeRF but change that underlying representation of this radiance field?” said Tancik. “Instead of having the radiance field be this black box neural network, we’re going to make that representation a grid of little Plenoxels.”
Initially, after several unsuccessful attempts, there was some doubt about whether Plenoxels would really work, but Yu and Fridovich-Keil pressed on. “We took a break over summer, and then one day they tried using trilinear interpolation,” said Kanazawa. “Suddenly, things started working.”
Rather than representing a given point in space with the single block, or voxel, that contains it, trilinear interpolation blends the values of the eight nearest voxels, weighting each by how close it is to the point. This smooths out the radiance field, improving the resolution of the resulting 3D rendering without the time lag of neural networks.
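The interpolation step described above can be sketched in a few lines. This is a minimal illustration rather than the Plenoxels implementation: the function name `trilinear_sample` is hypothetical, the grid is a plain nested list holding one number per grid point, and the query point is assumed to lie inside a grid with at least two points along each axis.

```python
import math


def trilinear_sample(grid, x, y, z):
    """Sample a 3D grid (indexed grid[i][j][k]) at a continuous point.

    Assumes 0 <= x <= nx - 1 (and likewise for y and z) and that the
    grid has at least two points along each axis.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    # Lower corner of the enclosing cell, clamped so that the upper
    # corner (index + 1) never falls outside the grid
    i = min(int(math.floor(x)), nx - 2)
    j = min(int(math.floor(y)), ny - 2)
    k = min(int(math.floor(z)), nz - 2)
    # Fractional position of the point inside the cell, each in [0, 1]
    fx, fy, fz = x - i, y - j, z - k
    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Corners closer to the point receive larger weights;
                # the eight weights always sum to 1
                weight = ((fx if di else 1.0 - fx)
                          * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                value += weight * grid[i + di][j + dj][k + dk]
    return value
```

At the exact center of a cell, all eight corners receive equal weight, so the result is their plain average. As the point moves toward one corner, that corner's value gradually takes over, which is what removes the blocky jumps between neighboring voxels.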
“By doing some tweaking, we were able to remove the neural network and really speed up the training procedure,” said Tancik. “I was not expecting these methods to be so quick. Instead of taking a full day, it can now take just a few minutes to create these very photorealistic renderings, which makes them more practical for a range of applications.”
Using Plenoxels in the real world
Plenoxels could potentially be used to create virtual and augmented reality displays. For example, unlike today’s virtual real estate tours, where the viewer is fixed in one place, a Plenoxels-created tour would enhance the experience by allowing viewers to walk around and fully explore the environment from their computer or AR/VR devices.
Consumers could also use Plenoxels to create and share personal memories, giving viewers a more immersive experience. “With this technology you can fully recover the environment you are in, so that you can re-explore it in the future,” said Tancik. “Being able to navigate the environment, or that memory, makes it feel more realistic or more tangible than a photograph or video.”
As imagined by Kanazawa, consumers would not need specialized equipment to capture these memories. “I think that this is going to be the new version of photographs or videos,” said Kanazawa. “What if you could just take those videos and explore and capture your memories in 3D, or even 4D, from your iPhone? In that sense, I think it’s very accessible.”
Because Plenoxels allow us to simulate the world, they also have potential applications within industry. Self-driving car companies could use this technology to simulate what their cars would see as they travel through the world. Similarly, robots could employ Plenoxels to extract the 3D geometry of their surroundings and avoid collisions with other objects.
According to Kanazawa, Plenoxels may even be used in scientific research, perhaps in concert with technologies like remote sensing. She envisions ecologists someday using Plenoxels to survey forests in order to analyze the density of trees and the overall health of the ecosystem.
Looking ahead to machine learning
Kanazawa noted that although this study showed that Plenoxels-based technology does not need neural networks to convert photographs into an explorable 3D world, AI might be needed if people wish to use the technology for tasks that require learning. She believes this is the next step for Plenoxels.
“I think the next interesting thing will be to incorporate learning into this process, so that you can do similar things with far fewer pictures, far fewer observations,” said Kanazawa. “We use our previous experience about the world to perceive new images. This is where the real machine learning comes in. And now that we’ve made the 3D rendering process more practical, we can start thinking about it.”
The Plenoxels project is supported in part by the CONIX Research Center, Google, Office of Naval Research, National Science Foundation and the Defense Advanced Research Projects Agency.