
721 points ralusek | 6 comments
1. cloudking No.41870249
This is the true power of generative AI: enabling new functionality for the user through a simple UX while the model does all the heavy lifting in the background. Prompting as a UX should be abstracted away from the user.
replies(1): >>41870324 #
2. jprete No.41870324
This probably isn't backed by an LLM but rather by some kind of geometric shape model.
replies(1): >>41870890 #
3. m3kw9 No.41870890
How do you explain a horse's 2 legs becoming 4 legs when it's rotated, assuming they only drew 2 legs in the side view?
replies(2): >>41871165 #>>41876393 #
4. atq2119 No.41871165
The second L in LLM stands for "language". Nothing of what you're describing has to do with language modeling.

They could be using transformers, sure. But plenty of transformer-based models are not LLMs.
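
To make that concrete: a plain vision transformer classifier is a transformer with no language side at all. A minimal sketch with the Hugging Face transformers library (checkpoint and file names are just illustrative):

    # A transformer that models pixels, not language: ViT image classification.
    from PIL import Image
    import torch
    from transformers import ViTForImageClassification, ViTImageProcessor

    processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224")
    model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

    image = Image.open("sketch.png").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits

    # Top predicted ImageNet class; no text prompt anywhere in the pipeline.
    print(model.config.id2label[logits.argmax(-1).item()])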

replies(1): >>41871435 #
5. kubrickslair No.41871435
They are probably looking for LGMs, Large Generative Models, which encompass vision and multi-modal models.
6. stevenhuang No.41876393
The model need only recognize from the shape that it is a horse, and it would know to extrapolate from there. It would presumably have some text encoding as a residual from training, but it doesn't need to be fed text from the text-encoder side to know that. Think of the CLIP encoder used in Stable Diffusion.
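
Roughly the kind of shape-level recognition I mean, as a minimal sketch using CLIP zero-shot matching via the Hugging Face transformers library (checkpoint, file name and labels are illustrative; this is not a claim about the app's actual pipeline):

    # Zero-shot "what is this sketch?" with CLIP: the image embedding alone
    # carries the "horse" signal; no user-written prompt is required.
    from PIL import Image
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("side_view_sketch.png").convert("RGB")  # the 2-legged side view
    labels = ["a horse", "a dog", "a chair", "a car"]          # candidate concepts

    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-text similarity scores; softmax over labels gives zero-shot probabilities.
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(labels, probs[0].tolist())))

Once the system knows "horse", it can extrapolate the hidden anatomy (four legs) from its learned prior rather than from anything the user typed.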