Altszn.com

Researchers Replicated OpenAI’s Work Based on Proximal Policy Optimisation (PPO) in RLHF

Altszn.com by Altszn.com
October 27, 2023
in Metaverse, Web3

by Damir Yalalov

Published: October 27, 2023 at 8:56 am Updated: October 27, 2023 at 8:56 am

by Victor Dey

Edited and fact-checked:
October 27, 2023 12:00 am

Reinforcement Learning from Human Feedback (RLHF) is an integral part of training systems like ChatGPT, and it relies on specialized methods to achieve success. One of these methods, Proximal Policy Optimization (PPO), was initially conceived within the walls of OpenAI in 2017. At first glance, PPO stood out for its promise of simplicity in implementation and a relatively low number of hyperparameters required to fine-tune the model. However, as they say, the devil is in the details.

A 2022 blog post titled “The 37 Implementation Details of Proximal Policy Optimization,” prepared for the ICLR conference, shed light on the intricacies of PPO. The title alone hints at how hard this supposedly straightforward method is to implement. Astonishingly, gathering all the necessary information and reproducing the results took three years, by the author’s own account.

Have you struggled to read the tensorflow 1.x code in openai/baselines’ PPO?

Our blog post helps you understand *everything* in it with

1) 🎥 video tutorials
2) 📜 detailed references and explanations
3) ⌨️ really simple code

This work took me 3 years. 2/32 pic.twitter.com/w5jpQZkD6L

— Costa Huang (@vwxyzjn) April 25, 2022

The code in the OpenAI repository changed significantly between versions, some aspects were left unexplained, and some quirks that looked like bugs still somehow produced results. The complexity of PPO becomes evident once you delve into the details, and for those who want a deeper understanding, a highly recommended video summary is available.

https://www.youtube.com/watch?v=videoseries

But the story doesn’t end there. The same authors decided to revisit the openai/lm-human-preferences repository from 2019, which used PPO to fine-tune language models on human preferences. That repository represents early work on the path to ChatGPT. The recent blog post, “The N Implementation Details of RLHF with PPO,” closely replicates OpenAI’s work but uses PyTorch and modern libraries instead of the outdated TensorFlow. The transition came with its own challenges: for example, the Adam optimizer is implemented differently in the two frameworks, so the original training cannot be replicated without adjustments.

1. (most interesting one) TF and PT have different Adam optimizer impl and they impact performance. In particular, PT’s adam produces more aggressive updates early on in the training. pic.twitter.com/lJ99KTmD8M

— Costa Huang (@vwxyzjn) October 24, 2023
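
To make the Adam point concrete, here is a minimal sketch in plain Python of two common ways to write the Adam update. It assumes the difference lies in where epsilon enters the formula: one variant bias-corrects both moment estimates before adding epsilon (labelled `pytorch_style` here), the other folds the bias correction into the learning rate and adds epsilon to the uncorrected second moment (labelled `tf1_style`). The function name, the constant gradient, and the hyperparameter values are illustrative assumptions, not taken from either framework or from the blog post.

```python
import math


def adam_step_sizes(grad, steps, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-5):
    """Return per-step update sizes for two Adam formulations.

    Hypothetical helper for illustration: a single scalar parameter that
    receives the same gradient `grad` at every step.
    """
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    pytorch_style, tf1_style = [], []
    for t in range(1, steps + 1):
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad

        # Variant A: bias-correct m and v first, then add eps.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        pytorch_style.append(lr * m_hat / (math.sqrt(v_hat) + eps))

        # Variant B: fold bias correction into the learning rate and add
        # eps to the uncorrected sqrt(v). Algebraically this equals
        # variant A with eps replaced by eps / sqrt(1 - beta2**t).
        lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
        tf1_style.append(lr_t * m / (math.sqrt(v) + eps))
    return pytorch_style, tf1_style


if __name__ == "__main__":
    # A gradient comparable in size to eps makes the gap easy to see.
    pt, tf = adam_step_sizes(grad=1e-5, steps=5000)
    for step in (1, 10, 100, 1000, 5000):
        ratio = tf[step - 1] / pt[step - 1]
        print(f"step {step:5d}: tf1_style / pytorch_style = {ratio:.4f}")
```

Under these assumptions, variant B behaves as if it had a larger epsilon early in training, so its first updates are smaller. The two variants converge as the step count grows, which is consistent with the tweet’s observation that the PyTorch-style update is more aggressive early on.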

Perhaps the most intriguing aspect of this work is the effort to run experiments on specific GPU setups in order to reproduce the original metrics and learning curves. The challenges ranged from memory constraints on various GPU types to the migration of OpenAI datasets between storage facilities.

In conclusion, the exploration of Proximal Policy Optimization (PPO) in Reinforcement Learning from Human Feedback (RLHF) reveals a fascinating world of complexities.


Damir is the team leader, product manager, and editor at Metaverse Post, covering topics such as AI/ML, AGI, LLMs, Metaverse, and Web3-related fields. His articles attract a massive audience of over a million users every month. He appears to be an expert with 10 years of experience in SEO and digital marketing. Damir has been mentioned in Mashable, Wired, Cointelegraph, The New Yorker, Inside.com, Entrepreneur, BeInCrypto, and other publications. He travels between the UAE, Turkey, Russia, and the CIS as a digital nomad. Damir earned a bachelor’s degree in physics, which he believes has given him the critical thinking skills needed to be successful in the ever-changing landscape of the internet. 


Read More: mpost.io
