You’re one of the founding members of the greater Seattle area polycule, aren’t you?
A base plate that’s got a spring under it, except for a little nub that pokes the power button.
Terrible if you live in earthquake-prone areas.
Wait. Are we describing a bump stock for your computer?
Shirley you’ve heard of absurdist humor?
You say “Not even close.” in response to the suggestion that Apple’s research can be used to improve benchmarks for AI performance, but then later say the article talks about how we might need different approaches to achieve reasoning.
Now, mind you - achieving reasoning can only happen if the model is accurate and works well. And to have a good model, you must have good benchmarks.
Not to belabor the point, but here’s what the article and study say:
The article talks at length about the reliance on a standardized set of questions - GSM8K - and how the questions themselves may have made their way into the training data. It notes that modifying the questions dynamically leads to decreased performance in the tested models, even when the complexity of the problem to be solved has not increased.
The third sentence of the paper (Abstract section) says this: “While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.” The rest of the abstract goes on to discuss (paraphrased in layman’s terms) that LLMs are ‘studying for the test’ and not genuinely achieving real reasoning capabilities.
By presenting their methodology - dynamically changing the evaluation questions to reduce data pollution and require that models eliminate red herrings - the Apple researchers are offering a possible way benchmarking can be improved.
Which is what the person you replied to stated.
The commenter is fairly close, it seems.
Well. That’s it. Get the flamethrowers. Time to burn down the Amazon.
No. Not the one that’s already burning. The other one.
This is morbid but one of my favorite “butterfly” effect news stories in the last year was around the death of Angela Chao after she backed her car into a pond while intoxicated.
Okay, so - here’s the setup:
The Chao family is a very wealthy family. In the 1960s the family patriarch got into the shipping business and did very well, garnering money and power. Wealth and power beget wealth and power. Mitch McConnell is even married to one of the daughters - Elaine Chao.
Well, Bush appointed E. Chao to Labor Secretary during his presidency. Mind you, she’s not just Mitch’s wife - she has been in government since the late 80s. One of the talking points in Republican circles during the Bush years was that there was a massive decrease in worker safety complaints. They attributed this to businesses behaving themselves and said it was evidence that self-regulation can work. What was learned later is that OSHA simply didn’t enforce many regulations or follow up on many complaints, instead choosing to focus on trying to find fraud within unions.
Cut to Trump. He appoints Elaine - still Mitch McConnell’s wife, and daughter of a transportation magnate - to be the Department of Transportation’s Secretary. The ethics concerns notwithstanding, the department hand-waved many things through, such as the Tesla doors mentioned in the article above, as well as the Tesla Model X’s confusing forward/reverse system, which was cited as a factor in the death of Angela Chao, Elaine’s sister.