Conversation
Can you program GPUs and do you want to become a HERO? The #linuxphone
community needs your help.

We are trying to record video, and have most pieces working, but one is
missing: fast enough debayering. That means about 23MB/sec on #librem5.

Debayering is not hard; camera images have subpixels split across two
lines, which need to be recombined. They also use a different color
representation, but that's fixable with a table lookup and two matrix
multiplies.

The Librem 5 has a Vivante GPU, 4 in-order CPU cores and 3GB RAM. My
feeling is that it should be fast enough for this. If the task is for
some reason impossible, that would be good to know, too.

The image data looks like this:

RGRGRG...
xBxBxB...
.........
.........

The task is to turn that into the usual rgbrgb... format, where rgb =
RGB * color matrix, with table lookups for better quality. I can fix
that once I get an example.
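Roughly, in Rust (an illustrative sketch only; the identity color
matrix is a placeholder, a real one comes from sensor calibration):

```rust
// Naive debayer sketch: assumes an RGRG/xBxB mosaic with u8 samples and
// treats each 2x2 cell as one output pixel, taking only the first-row green.
// The color matrix is identity here, purely for illustration.
fn debayer_naive(bayer: &[u8], width: usize, height: usize) -> Vec<u8> {
    const MATRIX: [[f32; 3]; 3] = [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ];
    let (ow, oh) = (width / 2, height / 2);
    let mut rgb = Vec::with_capacity(ow * oh * 3);
    for y in 0..oh {
        for x in 0..ow {
            let top = 2 * y * width + 2 * x; // R sample in the even row
            let bot = top + width;           // same column, odd row
            let r = bayer[top] as f32;
            let g = bayer[top + 1] as f32;   // only the even-row green
            let b = bayer[bot + 1] as f32;
            for row in &MATRIX {
                let v = row[0] * r + row[1] * g + row[2] * b;
                rgb.push(v.clamp(0.0, 255.0) as u8);
            }
        }
    }
    rgb
}
```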

I'm looking for example code (#pinephone would work, too), reasons it
cannot be done... and boosts if you have friends who can program
GPUs. #gpu #opensource

@pavel
I cannot program GPUs and do not desire a mythological protagonistic role :D Take my boost tho


@pavel No, it's

RGRGRG
GBGBGB

You lose meaningful data if you ignore half of green pixels.

I see no reason why it couldn't be done. Just take care not to introduce needless copies in your processing path. dmabufs are your friends.
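As a sketch, averaging the two greens per 2x2 cell is a tiny change
(Rust; `cell_rgb` is a hypothetical helper name):

```rust
// Per 2x2 RGGB cell: average the two green samples instead of dropping one.
// Widen to u16 for the sum so u8 addition cannot overflow.
fn cell_rgb(r: u8, g1: u8, g2: u8, b: u8) -> [u8; 3] {
    let g = ((g1 as u16 + g2 as u16) / 2) as u8;
    [r, g, b]
}
```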


@pavel Since I assume you're going to want to pass the rendered image into some kind of video encoder, you may want to make sure that you match stride and alignment requirements with your target buffer so etnaviv will be able to perform linear rendering rather than de-tile it afterwards (though IIRC it's currently gated behind ETNA_MESA_DEBUG).


@dos @pavel
adding to that, what data type is the image data (float, int, ???) and what data type is expected to come out?

instead of trying to outsource to the GPU, have you considered SIMD? (I assume librem5 and pinephone support NEON)

if the GPU is better suited, another question is whether there's support for compute shaders on the respective GPUs (what is the supported OpenGL version, assuming there is no Vulkan support on these devices)


@tizilogic @pavel It's either 8-bit int, or 10-bit int stored as 16-bit.

GC7000L supports compute shaders, but etnaviv isn't there yet.

Naive debayering is easy, but for good picture quality you need much more than that.


@pavel do you have a single frame of raw pixel data? What is the target API (OpenGL, -ES, Vulkan)?

@dos Let's keep the example simple :-). Yes, g = (G1+G2)/2 is superior, and there are advanced debayer algorithms; I know them. Examples are at https://gitlab.com/tui/debayer-gpu/ . There's just one small problem: it takes a minute and I need it to take 10 seconds.
@dos That's a problem for future Pavel :-). Right now, I'm storing frames on a ramdisk, basically as "RGB3".
@tizilogic @dos I tried SIMD, https://gitlab.com/tui/tui/-/blob/master/ucam/bayer2rgb.rs?ref_type=heads , but it did not have good enough performance (I could not do 512x384 at 23fps).

GL versions are:
Vendor: etnaviv
Renderer: Vivante GC7000 rev 6214
OpenGL Version: OpenGL ES 2.0 Mesa 21.2.6
GLSL Version: OpenGL ES GLSL ES 1.0.16

Doing input in u8, with output in u8 and internal computation in u16 fixed point should be "good enough". Doing everything in u16 would be even better. Floats are okay, too.
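A sketch of what that fixed-point path could look like (Rust; the Q8.8
coefficients below are made up for illustration, not a real
calibration):

```rust
// Fixed-point color matrix: coefficients in Q8.8 (value * 256), signed so
// that the off-diagonal terms of a real calibration matrix can be negative.
// Each row sums to 256, so neutral gray is preserved.
const CM_Q88: [[i32; 3]; 3] = [
    [384, -64, -64], // ~[ 1.5, -0.25, -0.25]
    [-64, 384, -64],
    [-64, -64, 384],
];

fn apply_matrix_q88(r: u8, g: u8, b: u8) -> [u8; 3] {
    let input = [r as i32, g as i32, b as i32];
    let mut out = [0u8; 3];
    for (o, row) in out.iter_mut().zip(&CM_Q88) {
        // Accumulate in i32, then shift the 8 fractional bits back out.
        let v = (row[0] * input[0] + row[1] * input[1] + row[2] * input[2]) >> 8;
        *o = v.clamp(0, 255) as u8;
    }
    out
}
```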
@dos @tizilogic I know. And I have "the rest" prototyped here: https://gitlab.com/tui/debayer-gpu/-/blob/master/isp.frg?ref_type=heads But I feel I need fast-enough naive debayering first, so that I can improve that.
@datenwolf Example of frame is here: https://gitlab.com/tui/tui/-/blob/master/ucam/bayer2rgb.rs?ref_type=heads (I also have frame generator and real frames captured from libobscura).

Anything that works on the Librem 5 is fine, bonus points if I can understand it. Robot-generated code using -lEGL -lGLESv2 -lm ... that builds and does something would work, too. The Librem 5 reports:

Vendor: etnaviv
Renderer: Vivante GC7000 rev 6214
OpenGL Version: OpenGL ES 2.0 Mesa 21.2.6
GLSL Version: OpenGL ES GLSL ES 1.0.16

@pavel It would be great to have some actual frame data from the camera sensor, or some test data, that I can load into a texture and write a shader to do the conversion. With OpenGL-ES (which is what you have) the trick is to load the pixels into an RG texture that is twice as wide and half as high as the original frame, so that "upstairs"/"downstairs" neighbor pixels in consecutive rows are of the same primitive color; this avoids issues with arithmetic and texel addressing precision.
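One possible reading of that packing, sketched on the CPU side in Rust
(each two-channel texel holds a vertical pair of samples; this is an
assumption about the layout, not necessarily the exact one meant here):

```rust
// Pack a W x H single-channel bayer frame into a W x (H/2) two-channel
// (RG-style) buffer: texel (x, y) holds the samples at rows 2y and 2y+1
// of column x, so one shader fetch returns both vertical neighbors.
fn pack_rg(bayer: &[u8], width: usize, height: usize) -> Vec<u8> {
    let mut rg = Vec::with_capacity(width * height);
    for y in 0..height / 2 {
        for x in 0..width {
            rg.push(bayer[2 * y * width + x]);       // "R" channel: even row
            rg.push(bayer[(2 * y + 1) * width + x]); // "G" channel: odd row
        }
    }
    rg
}
```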

@datenwolf Sorry. An example frame is here: https://gitlab.com/tui/debayer-gpu/-/blob/master/test.png?ref_type=heads (You probably want to run it through pngtopnm, so that your code only has to deal with uncompressed data.)

Alternatively, I started a file format for this: https://gitlab.com/tui/tui/-/tree/master/4cc?ref_type=heads . dirgen.sh can generate example frames using gstreamer. You get raw data after a 128-byte header.
@dos As for copies... yes, I'm currently doing more copies than needed. I measured the Librem 5 at about 2GB/sec memory bandwidth, and the stream is about 30MB/sec. At 1Mpix/24fps, gstreamer should be able to encode it in real time.

There's a huge problem with v4l, which gives uncached memory buffers to userspace. That means one whole CPU core is dedicated to copying them to "normal" memory. If that is ever solved, yes, other optimizations are possible. Currently this means it is not even possible to copy anything bigger than 1Mpix out of v4l.

@pavel

This is an OpenGL ES 2.0 solution:
https://github.com/rasmus25/debayer-rpi

There's also support for a software ISP in libcamera. I think I've seen some mentions of GPU-backed debayering too.

@robertfoss I know about that one, see gitlab.com:tui/debayer-gpu.git . I could not get it anywhere near the required performance.
@dos It seems that ignoring half of the green pixels is the right thing to do at the moment: https://gitlab.com/tui/debayer-gpu/-/tree/master/bwtest?ref_type=heads . "Normal" debayer is 40% too slow. (That's better than 5 times too slow, but still not good enough.) If you can get it to 24 loops per second, you'll become a hero :-).

@pavel I'm confused. V4L lets you stream to a CMA dmabuf which should be importable as GL_TEXTURE_EXTERNAL_OES, right? Or am I missing something?

@dos If you have an example of that, it would be welcome :-). That's not how Megapixels works, at least.

@pavel On 9f076a5, I'm getting 88MB/s with one green channel, 82MB/s with two and 105MB/s with nothing but static gl_FragColor. The three copies it does could be eliminated and I believe texelFetch could make it slightly faster on the GPU side too.


@pavel Megapixels is not an example of how to do things in the most performant way :) OpenGL operates in a VRAM-centric model, it's very copy-heavy. We don't need to copy things around, as our GPUs operate on the exact same memory CPUs do.

See GL_OES_EGL_image_external and https://docs.kernel.org/userspace-api/media/v4l/dmabuf.html

@dos Sorry, hero, that's dark magic beyond my understanding. I see the words but don't understand the sentences. :-(

I'd need a working example here. I got surprisingly far vibecoding this, but even robots have their limits.
Yep, the recent bwtest shows that (extremely simple) debayering is feasible, and possibly more. So far I've integrated debayer-gpu + gstreamer, and I'm meeting the deadlines.