If you understand how AI works, you will quickly grasp the fact that safety controls are completely meaningless.
AI experts have been talking about alignment and other nonsense for years now. It is meaningless. At this stage of the technology, whatever scheme you use to circumscribe or limit the AI will necessarily make it less performing.
I suspect all the experts understand this, well, most of them at least, and that all this discussion about security is mostly to reassure investors and authorities so that they can go on with their work quietly.
The key concept here is emergent properties which means that we really don't know exactly how it works. In practice, you create a transformer, run it a million times with huge database. Tweak it here and there to improve the results... and "things" happen! (Suddenly the computer creates a "mood neuron" for example because understanding your mood, positive or negative increases tremendously the accuracy of the language model!) We are walking ahead in the dark grasping around to try to understand AI. There is simply no control whatsoever!
Artificial intelligence (AI) systems that were trained to be secretly malicious resisted state-of-the-art safety methods designed to “purge” them of dishonesty, a disturbing new study found.
Researchers programmed various large language models (LLMs) — generative AI systems similar to ChatGPT — to behave maliciously. Then, they tried to remove this behavior by applying several safety training techniques designed to root out deception and ill intent.
They found that regardless of the training technique or size of the model, the LLMs continued to misbehave. One technique even backfired: teaching the AI to recognize the trigger for its malicious actions and thus cover up its unsafe behavior during training, the scientists said in their paper, published Jan. 17 to the preprint database arXiv.
No comments:
Post a Comment