Multilingual, laughing, Pitfall-playing and streetwise AI • TechCrunch

Research in machine learning and AI, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent discoveries and papers, particularly in but not limited to artificial intelligence, and explain why they matter.

Over the past few weeks, researchers at Google have demonstrated an AI system, PaLI, that can perform multiple tasks in more than 100 languages. Elsewhere, a group based in Berlin has launched a project called Source+ that is designed as a way to allow artists, including visual artists, musicians and writers, to opt in to, or out of, allowing their work to be used as training data for AI.

AI systems like OpenAI’s GPT-3 can generate fairly coherent text, or summarize existing text from the web, e-books and other sources of information. But historically they have been limited to a single language, limiting both their usefulness and their accessibility.

Fortunately, in recent months research on multilingual systems has accelerated, thanks in part to community efforts like Hugging Face’s BLOOM. In an effort to capitalize on these advances in multilingualism, the Google team created PaLI, which is trained on both images and text to perform tasks like image captioning, object detection and optical character recognition.

Google PaLI

Image credits: Google

Google claims that PaLI can understand 109 languages and the relationships between words in those languages and images, allowing it to, for example, caption a picture of a postcard in French. Although the work remains firmly in the research stages, the creators say it illustrates an important interplay between language and images, and could set the stage for a commercial product.

Voice is another area where AI is constantly improving. A newly introduced text-to-speech model injects its output with a considerable amount of emotion and range. The clips posted last week sound great, though of course they were cherry-picked.

We created a clip of our own using the intro of this article, and the results are still solid:

Exactly what this kind of voice generation will be most useful for is still unclear. We’re not quite at the stage where these systems can narrate an entire book; or rather, they can, but it’s probably not anyone’s first choice yet. But as the quality rises, the applications multiply.

Mat Dryhurst and Holly Herndon, an academic and a musician, have teamed up with the organization Spawning to launch Source+, a standard they hope will draw attention to the problem of image-generating AI systems created using the artwork of artists who were not informed or asked for permission. Source+, which is free, is intended to allow artists to disallow the use of their work for AI training purposes if they so choose.

Image-generating systems like Stable Diffusion and DALL-E 2 have been trained on billions of images scraped from the web to “learn” how to translate text prompts into art. Some of these images come from public art communities like ArtStation and DeviantArt, not necessarily with the artists’ knowledge, and imbue the systems with the ability to mimic specific creators, including artists like Greg Rutkowski.

Stability AI Stable Diffusion

Samples from Stable Diffusion.

Because of the systems’ knack for imitating art styles, some creators fear that they could threaten livelihoods. Source+, although voluntary, could be a step toward giving artists greater say in how their art is used, Dryhurst and Herndon say, assuming it is adopted at a wide (if unlikely) scale.

At DeepMind, a research team is trying to address another long-standing problem with AI: its tendency to spew toxic and misleading information. Focusing on text, the team developed a chatbot called Sparrow that can answer common questions by searching the web using Google. Other cutting-edge systems like Google’s LaMDA can do the same, but DeepMind claims that Sparrow provides plausible, non-toxic answers to questions more often than its counterparts.

The trick was aligning the system with people’s expectations of it. DeepMind recruited people to use Sparrow and asked them to provide feedback to train a model of how useful its answers were, showing participants multiple answers to the same question and asking them which answer they liked most. The researchers also defined rules for Sparrow, such as “don’t make threatening statements” and “don’t make hateful or offensive comments,” which they instilled in the system by asking participants to try to trick it into breaking the rules.
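The feedback step described above, showing raters two answers and asking which they prefer, is the classic setup for learning a preference (reward) model from pairwise comparisons. Below is a minimal, self-contained sketch of that idea in the Bradley-Terry style; it is an illustration of the technique, not DeepMind’s actual code, and the feature names are invented for the example.

```python
# Illustrative sketch (not DeepMind's implementation): learning a scalar
# "usefulness" score from pairwise human preferences, Bradley-Terry style.
# Raters see two candidate answers to the same question and pick one; the
# model learns weights so that the preferred answer scores higher.
import math


def score(weights, features):
    # Linear usefulness score for a candidate answer's feature vector.
    return sum(w * f for w, f in zip(weights, features))


def train_preference_model(comparisons, dim, lr=0.1, epochs=200):
    # comparisons: list of (features_preferred, features_rejected) pairs.
    w = [0.0] * dim
    for _ in range(epochs):
        for fp, fr in comparisons:
            # P(preferred beats rejected) under a Bradley-Terry model.
            p = 1.0 / (1.0 + math.exp(score(w, fr) - score(w, fp)))
            # Gradient ascent on the log-likelihood of the rater's choice.
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (fp[i] - fr[i])
    return w


# Hypothetical toy features: [cites evidence, answers the question, is toxic]
comparisons = [
    ([1, 1, 0], [0, 1, 0]),  # rater preferred the evidence-backed answer
    ([1, 1, 0], [1, 0, 0]),  # ...and the one that actually answers
    ([0, 1, 0], [0, 1, 1]),  # ...and the non-toxic one
]
w = train_preference_model(comparisons, dim=3)
assert score(w, [1, 1, 0]) > score(w, [0, 1, 1])
```

In a real system like Sparrow, the linear scorer would be a large neural network and the features would be the full question-and-answer text, but the training signal, maximizing the likelihood of human choices over answer pairs, is the same shape.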

An example of Sparrow having a conversation.

DeepMind acknowledges that Sparrow has room for improvement. But in a study, the team found the chatbot provided a “reasonable” answer supported by evidence 78% of the time when asked a factual question, and broke the aforementioned rules only 8% of the time. That’s better than DeepMind’s original dialogue system, the researchers note, which broke the rules roughly three times more often when tricked into doing so.

Recently, a separate team at DeepMind tackled a very different domain: video games, which historically have been tough for AI to master quickly. Their system, called MEME, reportedly achieved “human-level” performance on 57 different Atari games 200 times faster than the previous best system.

According to DeepMind’s paper detailing MEME, the system can learn to play games by observing about 390 million frames; “frames” here refers to the still images that refresh very quickly to give the impression of motion. That might sound like a lot, but the previous state-of-the-art technique required 80 billion frames across the same number of Atari games.
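A quick back-of-the-envelope check shows how those two frame counts line up with the headline claim:

```python
# Sanity-checking the sample-efficiency claim: MEME reportedly needs about
# 390 million frames where the previous best system needed 80 billion.
meme_frames = 390_000_000
previous_frames = 80_000_000_000

speedup = previous_frames / meme_frames
print(f"{speedup:.0f}x fewer frames")  # roughly 205x, in line with the ~200x claim
```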

DeepMind MEME

Image credits: DeepMind

Playing Atari skillfully might not sound like a desirable skill. And indeed, some critics argue that games are a flawed AI benchmark because of their abstractness and relative simplicity. But research labs like DeepMind believe the approaches could be applied to other, more useful areas in the future, like robots that learn to perform tasks by watching videos, or self-improving, self-driving cars.

Nvidia had a field day on the 20th announcing dozens of products and services, among them several interesting AI efforts. Self-driving cars are one of the company’s focuses, both powering the AI and training it. For the latter, simulators are crucial, and it is likewise important that the virtual roads resemble real ones. The company described a new, improved content flow that accelerates the digitization of data collected by cameras and sensors on real cars.

A simulation environment built on real world data.

Things like real-world vehicles and irregularities in the road or tree cover can be accurately reproduced, so the self-driving AI doesn’t learn in a sanitized version of the street. And it makes it possible to create larger and more variable simulation settings in general, which aids robustness. (Another image of it is up top.)

Nvidia also showcased its IGX system for autonomous platforms in industrial situations: human-machine collaboration like you might find on a factory floor. There’s no shortage of these, of course, but as the complexity of tasks and operating environments increases, the old methods don’t cut it anymore, and companies looking to improve their automation are also looking at future-proofing.

A computer vision example classifying objects and people on a factory floor.

“Proactive” and “predictive” safety are what IGX is intended to help with, meaning catching safety issues before they cause problems or injuries. A robot might have its own emergency stop mechanism, but if a camera monitoring the area could tell it to change direction before a forklift gets in the way, everything goes a little more smoothly. Exactly which company or software accomplishes this (and on what hardware, and how it all gets paid for) is still a work in progress, with the likes of Nvidia and startups like Veo Robotics feeling their way through.

Another exciting step forward was made on Nvidia’s home turf of gaming. The company’s latest and greatest GPUs are built not just to push triangles and shaders, but to quickly accomplish AI-powered tasks like its own DLSS technology for upscaling and adding frames.

The problem they are trying to solve is that gaming engines are so demanding that generating more than 120 frames per second (to keep up with the latest monitors) while maintaining visual fidelity is a Herculean task even powerful GPUs can barely manage. But DLSS is a sort of intelligent frame blender that can increase the resolution of the source frame without aliasing or artifacts, so the game doesn’t have to push quite so many pixels.

In DLSS 3, Nvidia claims it can generate entire additional frames at a 1:1 ratio, so you could render 60 frames natively and the other 60 via AI. I can think of a number of reasons that might make things weird in a high-performance gaming environment, but Nvidia is probably well aware of those. At any rate, you’ll need to pay a fortune for the privilege of using the new system, since it will only run on RTX 40-series cards. But if graphical fidelity is your top priority, have at it.
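The arithmetic behind that 60-plus-60 claim is simple enough to spell out; this is an illustrative calculation, not anything from Nvidia’s pipeline:

```python
# Illustrative arithmetic for DLSS 3's claimed 1:1 frame generation: one
# AI-generated frame inserted after every natively rendered frame doubles
# the displayed frame rate.
native_fps = 60
generated_per_native = 1  # the claimed 1:1 ratio

displayed_fps = native_fps * (1 + generated_per_native)
frame_time_ms = 1000 / displayed_fps  # how long each displayed frame is on screen

print(displayed_fps)             # 120, enough for a 120 Hz monitor
print(round(frame_time_ms, 2))   # 8.33 ms per displayed frame
```

The catch, implied in the paragraph above, is that the generated frames are predictions rather than renders, which is exactly where visual oddities could creep in at high frame rates.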

Illustration of drones building a structure in a remote area.

The last item today is a drone-based 3D printing technique from Imperial College London that could be used for autonomous building processes sometime in the deep future. For now it’s definitely not practical for creating anything bigger than a trash can, but it’s still early days. Eventually the researchers hope to build at scale, like the structure illustrated above; watch the video below to get a sense of what to expect.
