Monday, December 16, 2024

[GCP] Comparing AWS and GCP (1. Resource & Access Management)

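Coursera offers a Google Cloud course aimed at AWS professionals.
It explains the basic structure of GCP. I am summarizing its content here because it should be a useful reference for people who know AWS and are approaching GCP for the first time.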

1. Resource Hierarchy

1.1 AWS resource structure

The AWS resource structure is organized as follows.
- Organization root, Organizational Unit (optional), Account (Management Account or Member Account), Resource
- The Organization root sits at the top and contains every account shared within the organization.
- An Account is a container for resources. It is either a management account or a member account. The management account is responsible for all charges incurred under the Organization root; every other account acts as a member account.


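1.2 GCP resource structure

The GCP resource structure is as follows.
- Organization, Folder, Project, Resource

- In GCP, multiple folders and projects can be created under the top-level Organization. The resources you actually use are always created inside a specific project.
- A folder can represent a department, a team, or a work environment, and it can contain nested subfolders.
- A project can sit inside a folder, inside a subfolder, or directly under the organization.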
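
The hierarchy described above can be sketched as a small tree. This is only an illustrative model, not a Google API: the `Node` class and every name in it (`example.com`, `dept-a`, `team-1-dev`, and so on) are made up for the example.

```python
class Node:
    """One node in a simplified model of the GCP resource hierarchy."""

    def __init__(self, kind, name, parent=None):
        self.kind = kind      # "organization", "folder", "project" or "resource"
        self.name = name
        self.parent = parent

    def add(self, kind, name):
        """Create a child node directly under this node."""
        return Node(kind, name, parent=self)

    def ancestry(self):
        """Return the path from the organization down to this node."""
        path, node = [], self
        while node is not None:
            path.append(f"{node.kind}:{node.name}")
            node = node.parent
        return path[::-1]


# All names below are hypothetical.
org = Node("organization", "example.com")
dept = org.add("folder", "dept-a")
team = dept.add("folder", "team-1")                    # nested subfolder
project_in_folder = team.add("project", "team-1-dev")
project_at_org = org.add("project", "shared-infra")    # attached directly to the org
vm = project_in_folder.add("resource", "vm-1")         # resources live in a project

print(" > ".join(vm.ancestry()))
print(" > ".join(project_at_org.ancestry()))
```

Every resource has exactly one project as its parent, while a project may hang off a folder at any depth or off the organization itself.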


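1.3 Policy inheritance

In GCP, the policies and IAM settings of a parent resource are inherited from the top (Organization) down to the bottom (resources).
To keep the GCP environment under proper control you can set a "Policy", which can be defined at the project, folder, or organization level. For example, if some organization policies are enabled at the organization node, every resource in that organization is affected by them.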



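1.4 Organization

The GCP Organization Admin can control all cloud resources in the organization.
The Project Creator controls the creation of projects within the organization.

Creating and managing an organization

The GCP Organization resource is closely tied to Google Workspace (GWS) or Cloud Identity. When a user with a GWS or Cloud Identity account creates a GCP project, an organization resource is provisioned automatically. From then on, the Super Admin of GWS or Cloud Identity can control the organization created in GCP and the resources under it.

- Super Admin: the top-level administrator of GWS or Cloud Identity. Can grant the Organization Admin role to selected users and controls the lifecycle of the organization resource.
- Organization Admin: defines IAM policies in GCP, decides the resource hierarchy, and delegates responsibility for key components by assigning IAM roles. (Creating resources such as folders and projects requires separate IAM permissions.)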

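1.5 Folders

A GCP folder works much like an AWS Organizational Unit that groups AWS accounts. It sets a boundary that lets you build sub-organizations inside the organization.

- Administrative rights over resources can be delegated at each folder level
- Each folder can contain subfolders and one or more projects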

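1.6 Projects

A project is a separate entity under the organization (or a folder) that holds resources. Each resource belongs to exactly one project, chosen when it is created. (You select a project and then create resources inside it.)
Usage is billed and managed per project, according to the amount of resources each project consumes.

(Note) The project ID and project number assigned when a project is created are unique, while the project name itself can be changed.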

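1.7 Resource Manager

AWS provides a set of tools for managing the resource hierarchy, including AWS Organizations, AWS Control Tower, and AWS Resource Access Manager.
- AWS Organizations: lets administrators create, update, and delete organizational units, and apply service control policies (SCPs) that act as guardrails for accounts
- AWS Control Tower: lets administrators set up guardrails and automation around account provisioning, and also checks whether accounts follow the organization's best practices
- AWS Resource Access Manager: a tool for sharing resources between accounts in the organization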


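Q. In the AWS resource hierarchy, can a "resource" be moved from the account where it was created to another account?
==> A. A resource belongs to a specific account from creation, and its ownership cannot be changed. You can, however, share it so that other accounts can use it, copy it, or use IAM roles to let another account access it.
==> And in GCP? A resource belongs to a project and, likewise, its ownership cannot be changed. Copying a resource is not supported, but access from another project can be set up by connecting VPCs across projects (e.g. Shared VPC or VPC Network Peering). IAM role bindings can also let a service account that belongs to another project access the resource. (* Service accounts are covered later.)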

In GCP, Resource Manager is provided as an API (RPC and REST) that can list all projects associated with an account, create new projects, and update or delete existing ones.
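
As a rough sketch of the REST side of this API, the snippet below only builds the URL of a "list projects under a parent" request and does not send it. It assumes the v3 endpoint `https://cloudresourcemanager.googleapis.com/v3/projects` with `parent` and `pageSize` query parameters, as I recall them from the Resource Manager reference; the folder ID is hypothetical, and a real call would also need an OAuth 2.0 bearer token, which is not shown.

```python
from urllib.parse import urlencode

# Assumed base URL of the Resource Manager v3 REST API.
BASE_URL = "https://cloudresourcemanager.googleapis.com/v3"


def list_projects_url(parent, page_size=50):
    """Build the URL for listing the projects directly under `parent`.

    `parent` is a resource name such as "folders/123456789" or
    "organizations/123456789" (hypothetical IDs).
    """
    query = urlencode({"parent": parent, "pageSize": page_size}, safe="/")
    return f"{BASE_URL}/projects?{query}"


url = list_projects_url("folders/123456789")
print(url)
```

Keeping URL construction separate from the HTTP call makes it easy to check the request shape before wiring in credentials.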

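* AWS managed and custom policies vs. GCP predefined and custom roles
  In AWS, a Policy is a collection of permissions and the resources those permissions apply to.
  Unless restricted by a service control policy (SCP), the account root manages roles and policies in full. A policy can be attached to users, user groups, and roles.

  GCP provides predefined sets of permissions as roles. You then define the "location" where a role applies.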

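* GCP recommendations
- Mirror your organizational structure in the resource hierarchy; a startup can begin with a flat hierarchy and obtain an organization resource later
- Use a project to group resources that share the same trust boundary
- Grant roles to Google Groups instead of individual users, then manage group membership
- Grant IAM roles following the principle of least privilege (custom roles can be created in addition to the predefined ones)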

1.8 Differences in resource hierarchy and billing

GCP groups charges by project and lets you use multiple billing accounts. AWS, in comparison, bills per account.

- Billing account: In AWS there is one billing account per AWS account, or one per organization when consolidated billing is used. In GCP you can create multiple billing accounts within an organization, and a single billing account can be linked to several projects at once.

- Policy: An AWS policy can only be applied to an IAM principal and cannot be inherited. In GCP, various policies (e.g. organization policies, roles) can be applied at the organization, folder, project, and account level.

- Admin: AWS requires a root user to manage an account. In GCP, a Gmail user or a GWS Super Admin can manage the organization resource and accounts. By granting the Organization Admin role to specific users, you can separate account and organization resource administration (Super Admin) from administration of the resources inside the GCP organization (Organization Admin).
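
The billing account bullet above (many projects linked to one billing account, several billing accounts per organization) can be pictured with a few lines of Python. This is a toy aggregation, not a Cloud Billing API call; the project names, billing account names, and cost figures are all invented.

```python
from collections import defaultdict

# Hypothetical mapping: each project is linked to exactly one billing account,
# while one billing account may pay for several projects.
project_billing_account = {
    "web-prod": "billing-A",
    "web-dev": "billing-A",
    "research": "billing-B",
}

# Hypothetical monthly cost per project.
project_cost = {"web-prod": 120.0, "web-dev": 30.0, "research": 75.5}


def cost_per_billing_account(mapping, costs):
    """Sum project costs under the billing account each project is linked to."""
    totals = defaultdict(float)
    for project, account in mapping.items():
        totals[account] += costs.get(project, 0.0)
    return dict(totals)


totals = cost_per_billing_account(project_billing_account, project_cost)
print(totals)
```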

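2. IAM and Access Control

2.1 Cloud identity management

In GCP, user identities are managed outside GCP (GWS, Gmail). With Cloud Identity, an organization can define policies and manage users and groups from the Google Admin console. If you already use Active Directory (AD), federated user access management is possible.

In AWS, AWS Directory Service integrates with an existing Active Directory (AD) solution.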

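** Integrating with an existing LDAP solution? (See: Authentication and authorization in GCP)
Besides account management through Cloud Identity and Google Workspace, GCP also supports integration with an existing LDAP solution. If you use Active Directory, Microsoft Entra ID (Azure AD), or another identity provider (e.g. Okta), you federate the identities of the existing environment.
  - Option 1. Use Google Cloud Directory Sync (GCDS) from Cloud Identity to synchronize user identities from the existing IdP into Google Cloud identities
  - Option 2. Workforce Identity Federation: supports attribute-based single sign-on without synchronizing the existing IdP's user identities into Cloud Identity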


2.2 IAM principals and policies

GCP uses IAM to control "who" can do "what" on "which resource".
- "Who" = the principals (individual Google accounts, groups, service accounts, and Cloud Identity or GWS accounts)
- "Which resource" = cloud resources such as Cloud Storage, Compute Engine, and BigQuery
- "What they can do" = granted by choosing Roles, which are predefined bundles of Permissions such as Admin, Viewer, Editor, Creator, and User.


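An IAM policy:
  - consists of a list of bindings, each of which binds a list of principals to a role.
  - is created as an allow policy and attached to a resource to define who has which access to that resource.
  - can be set at every level of the resource hierarchy (organization, folder, project) and inherits the policies of its parent nodes.
     Inheritance flows organization policy --> folder policy --> project policy --> resource. If a parent node (e.g. the organization) grants a principal the editor role and a child node (e.g. a project) grants the same principal only the viewer role, the principal still has editor access, because the grant from the parent node is inherited.

  - When a project is moved from folder A to folder B, it inherits the permissions set at the folder level. (That is, the permissions set on folder B apply.)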
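
The two inheritance rules above (a parent grant cannot be narrowed by a child, and a moved project picks up the new folder's grants) can be checked with a small model. This is a simplified sketch, not the IAM API: it treats the effective access of a member as the union of all bindings found while walking up the hierarchy. The resource names, members, and the `policies`/`parents` dictionaries are hypothetical; `roles/editor` and `roles/viewer` are the basic role names.

```python
# Allow-policy bindings attached at each node: node -> {role: set of members}.
policies = {
    "organizations/example": {"roles/editor": {"user:alice@example.com"}},
    "folders/A": {"roles/viewer": {"group:team-a@example.com"}},
    "folders/B": {"roles/viewer": {"group:team-b@example.com"}},
    "projects/demo": {"roles/viewer": {"user:alice@example.com"}},
}

# Child -> parent links of the (hypothetical) hierarchy.
parents = {
    "folders/A": "organizations/example",
    "folders/B": "organizations/example",
    "projects/demo": "folders/A",
}


def effective_roles(node, member):
    """Union of the roles granted to `member` on `node` and all its ancestors."""
    roles = set()
    while node is not None:
        for role, members in policies.get(node, {}).items():
            if member in members:
                roles.add(role)
        node = parents.get(node)
    return roles


# Viewer on the project does not take away editor inherited from the organization.
alice_roles = effective_roles("projects/demo", "user:alice@example.com")

# Moving the project from folder A to folder B swaps the inherited folder grants.
team_a_before = effective_roles("projects/demo", "group:team-a@example.com")
parents["projects/demo"] = "folders/B"
team_a_after = effective_roles("projects/demo", "group:team-a@example.com")
team_b_after = effective_roles("projects/demo", "group:team-b@example.com")

print(alice_roles, team_a_before, team_a_after, team_b_after)
```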

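2.3 IAM roles and conditions

GCP IAM has basic roles, predefined roles, and custom roles.

AWS IAM provides three types of identity-based policies.
 - AWS managed, customer managed, and inline
 - A policy is a set of actions allowed on identified resources. Policies are attached to IAM identities, and organization administrators use them to limit the permissions of those identities. Inheritable service control policies (SCPs), which are distinct from IAM policies, can additionally be applied to the identities and resources of that group.
    - An IAM user is granted programmatic access and console access; this is not part of a policy
 - AWS IAM policies are managed through AWS IAM, which manages identities and permissions. They allow access 1) to specific resources through RBAC, or 2) to specific AWS resources based on attribute conditions of the identity and the resource through ABAC.

An IAM Condition:
 - applies conditional ABAC to cloud resources. An identity (member) is granted access to the resource only when the configured condition is met
 - is specified in GCP as part of a role binding in the IAM policy. When a condition is present, the access request is allowed only if the condition expression evaluates to true. (Reference blog: GCP IAM Condition)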
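
To make the "condition inside a role binding" idea concrete, here is what such an allow policy might look like when written out as a Python dictionary. The group address, title, and expiry date are invented for the example; the overall shape (a `condition` with `title`, `description`, and a CEL `expression` inside a binding, and policy `version` 3 for conditional bindings) follows the IAM policy format as I understand it.

```python
import json

# Hypothetical allow policy with one conditional role binding.
policy = {
    "version": 3,  # conditional bindings require policy version 3
    "bindings": [
        {
            "role": "roles/storage.objectViewer",
            "members": ["group:analysts@example.com"],
            "condition": {
                "title": "expires-end-of-2025",
                "description": "Temporary read access",
                # CEL expression: the binding applies only while this is true.
                "expression": 'request.time < timestamp("2026-01-01T00:00:00Z")',
            },
        }
    ],
}


def conditional_bindings(policy):
    """Return only the bindings that carry a condition."""
    return [b for b in policy["bindings"] if "condition" in b]


print(json.dumps(policy, indent=2))
print(len(conditional_bindings(policy)), "conditional binding(s)")
```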

2.4 Key IAM concept differences between the platforms


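2.5 IAM best practices and scenarios

- Principle of least privilege
- Grant roles to groups rather than individuals
- Take care when granting roles on a service account (serviceAccountUser): the grantee gains access to every resource the service account can access. (Audit keys with the serviceAccount.keys.list() method.)

- Use Identity-Aware Proxy (IAP)
  With IAP, the fine-grained access controls implemented by the product in use are applied without a VPN. Instead of relying on network-level firewalls, IAP uses an application-level access control model. (It establishes a central authorization layer for applications accessed over HTTPS.) Applications and resources protected by IAP can be reached through the proxy only by users and groups with the right IAM roles.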

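3. Service Account

3.1 Differences in service-to-service authentication


AWS instance profiles
 - An application's permissions are managed with an IAM role and an instance profile: a role with the appropriate permissions is created and configured in the instance profile associated with the service.
 - An instance profile is a container for an IAM role that can be attached to an EC2 instance, and it gives the applications running on that instance the permissions granted by the named role. In other words, the instance profile carries a role that can be used to access the specified resources without managing credentials yourself. (Temporary credentials and their rotation are handled for you.)

GCP service accounts
 - A Google Cloud IAM service account is assigned to a Google Cloud instance and is used to access GCP services and resources from application code
 - A service account is named by an email address, but it uses cryptographic keys instead of a password to access resources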

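3.2 Service accounts in Google Cloud

There are three types of service accounts.

- User-managed service accounts, created by users as needed
- Default service accounts: some GCP services, such as Compute Engine, create a "default service account" so that deployed workloads can access other GCP resources. By default, the default service account has the Editor role on the project.
- Google-managed service accounts: created by Google for a GCP service and used to act on the user's behalf. For example, when you use Cloud Run, the service runs under a Google service account that has permission to access every Pub/Sub topic that can trigger the container. ==> Google-managed service accounts do not appear in the list on the "Service accounts" page; their activity can be checked in the audit logs.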
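
The three types can often be told apart by the shape of the service account's email address. The classifier below is a heuristic sketch only: it assumes the commonly seen patterns `NAME@PROJECT_ID.iam.gserviceaccount.com` (user-managed), `PROJECT_NUMBER-compute@developer.gserviceaccount.com` and `PROJECT_ID@appspot.gserviceaccount.com` (defaults), and `service-PROJECT_NUMBER@....iam.gserviceaccount.com` (Google-managed), and it will not cover every service. The function name and sample addresses are made up.

```python
import re


def classify_service_account(email):
    """Guess the service account type from its email address (heuristic)."""
    local, _, domain = email.partition("@")
    # Compute Engine default service account.
    if domain == "developer.gserviceaccount.com" and re.fullmatch(r"\d+-compute", local):
        return "default"
    # App Engine default service account.
    if domain == "appspot.gserviceaccount.com":
        return "default"
    # Google-managed accounts typically start with "service-<project number>".
    if re.fullmatch(r"service-\d+", local) and domain.endswith(".iam.gserviceaccount.com"):
        return "google-managed"
    # Anything else under *.iam.gserviceaccount.com is treated as user-managed.
    if domain.endswith(".iam.gserviceaccount.com"):
        return "user-managed"
    return "unknown"


for address in (
    "my-app@my-project.iam.gserviceaccount.com",
    "123456789012-compute@developer.gserviceaccount.com",
    "service-123456789012@gcp-sa-pubsub.iam.gserviceaccount.com",
):
    print(address, "->", classify_service_account(address))
```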

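Service account keys
  - A service account looks like an email address, but it uses keys instead of a password
  - Google manages service account keys automatically. You can create and manage keys manually if you use the service account outside GCP, or if you want a rotation period different from Google's own key rotation
  - Each service account is associated with public/private RSA key pairs
  - The Service Account Credentials API uses an internal key pair to create short-lived service account credentials and to sign blobs and JSON Web Tokens (JWTs) = Google-managed key pair
  - In addition, users can create multiple public/private RSA key pairs and use the private key to authenticate to Google APIs = user-managed key pair. Google stores the public key, while the customer is responsible for keeping the private key secure.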



4. Cloud Shell





Monday, June 15, 2020

Installing WML-CE 1.6.2 on RHEL 7.6 (ppc64le, AC922 server)

IBM distributes Watson Machine Learning Community Edition (WML-CE, formerly PowerAI) free of charge. Older PowerAI releases had to be installed from downloaded rpm or deb packages, but since the change to WML-CE you can install it easily by downloading the Anaconda installer.

The WML-CE installation process is described at the link below.
https://www.ibm.com/support/knowledgecenter/en/SS5SF7_1.6.2/navigation/wmlce_systemsetup.html

The rest of this post walks through installing WML-CE 1.6.2 on a RHEL 7.6 system (ppc64le, AC922, POWER9).

[root@p1311-met1 ~]# yum -y install wget nano bzip2
[root@p1311-met1 ~]# wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

[root@p1311-met1 ~]# rpm -ihv epel-release-latest-7.noarch.rpm
[root@p1311-met1 ~]# yum install kernel-devel-$(uname -r) kernel-headers-$(uname -r)
[root@p1311-met1 ~]# yum update kernel-tools kernel-tools-libs kernel-bootwrapper

Copy the default udev rules file to /etc/udev/rules.d/ and comment out the memory hotadd block so that it reads as follows.

[root@p1311-met1 ~]# cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
[root@p1311-met1 ~]# vi /etc/udev/rules.d/40-redhat.rules
# Memory hotadd request
#SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"
#PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"

#ENV{.state}="online"
#PROGRAM="/bin/systemd-detect-virt", RESULT=="none", ENV{.state}="online_movable"
#ATTR{state}=="offline", ATTR{state}="$env{.state}"

#LABEL="memory_hotplug_end"
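
The manual vi edit above could also be scripted. The sketch below is one possible way to do it, assuming the block starts at the `# Memory hotadd request` comment and ends at the `LABEL="memory_hotplug_end"` line as in the listing; the function name and the trailing `KERNEL==...` line in the sample are made up. It works on a string, so writing the result back to /etc/udev/rules.d/40-redhat.rules is left to the reader.

```python
START = "# Memory hotadd request"
END = 'LABEL="memory_hotplug_end"'


def comment_out_memory_hotadd(rules_text):
    """Prefix '#' to every active line of the memory hotadd block."""
    out, in_block = [], False
    for line in rules_text.splitlines():
        stripped = line.strip()
        if stripped == START:
            in_block = True
        elif in_block:
            # Comment out active rule lines; leave blanks and comments alone.
            if stripped and not stripped.startswith("#"):
                line = "#" + line
            # The block ends at the LABEL line, commented or not.
            if stripped.lstrip("#").strip() == END:
                in_block = False
        out.append(line)
    return "\n".join(out) + "\n"


sample = '''\
# Memory hotadd request
SUBSYSTEM!="memory", ACTION!="add", GOTO="memory_hotplug_end"
PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"

ENV{.state}="online"
ATTR{state}=="offline", ATTR{state}="$env{.state}"

LABEL="memory_hotplug_end"
KERNEL=="example", MODE="0660"
'''

result = comment_out_memory_hotadd(sample)
print(result)
```

Running the function a second time on its own output changes nothing, so it is safe to re-apply.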

Check whether a previously installed nvidia driver exists. If any packages are listed, remove them.
[root@p1311-met1 ~]# rpm -qa | egrep 'cuda.*(9-2|10-0)'
[root@p1311-met1 ~]# rpm -qa | egrep '(cuda|nvidia).*(396|410)\.'
[root@p1311-met1 ~]# rpm -qa | egrep '(cuda|nvidia).*repo'
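
The three egrep patterns above can be tried out in Python before touching a real system. The sketch applies the same regular expressions to a list of package names; the package names in `installed` are invented for illustration and are not taken from this server.

```python
import re

# The same three patterns used with egrep above.
PATTERNS = [
    r"cuda.*(9-2|10-0)",
    r"(cuda|nvidia).*(396|410)\.",
    r"(cuda|nvidia).*repo",
]


def stale_packages(installed):
    """Return the package names matched by any of the patterns."""
    return [name for name in installed if any(re.search(p, name) for p in PATTERNS)]


# Hypothetical `rpm -qa` output.
installed = [
    "cuda-toolkit-10-0-10.0.130-1.ppc64le",
    "nvidia-driver-410.104-1.el7.ppc64le",
    "cuda-repo-rhel7-10-1-local-10.1.168-418.67-1.0-1.ppc64le",
    "bzip2-1.0.6-13.el7.ppc64le",
]

to_remove = stale_packages(installed)
print(to_remove)
```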

Download the GPU driver from the official NVIDIA site and upload it to the server. (https://www.nvidia.com/Download/index.aspx)

I downloaded driver version 418.126.02 for ppc64le. Use chmod to make the file executable, then run the .run file.
The installation fails if cc or gcc is not installed on the system or cannot be found in PATH. (yum install gcc)

[root@p1311-met1 ~]# ls -al
total 37304
dr-xr-x---  6 root root     4096 Jun 15 00:12 .
dr-xr-xr-x 21 root root     4096 Jun 12 03:16 ..
-rw-r--r--  1 root root       18 Dec 28  2013 .bash_logout
-rw-r--r--  1 root root      176 Dec 28  2013 .bash_profile
-rw-r--r--  1 root root      176 Dec 28  2013 .bashrc
-rw-r--r--  1 root root      100 Dec 28  2013 .cshrc
-rw-r--r--  1 root root    15264 Sep 18  2019 epel-release-latest-7.noarch.rpm
-rw-rw-r--  1 root root 38137801 Jun 15 00:10 NVIDIA-Linux-ppc64le-418.126.02.run
drwxr-----  3 root root     4096 Jun 14 23:59 .pki
drwxr-xr-x  2 root root     4096 Feb 25  2019 .rpmdb
drwx------  2 root root     4096 Jun 12 03:16 .ssh
drwxr-xr-x  2 root root     4096 Jun 14 03:36 support-scripts
-rw-r--r--  1 root root      129 Dec 28  2013 .tcshrc
[root@p1311-met1 ~]# chmod +x NVIDIA-Linux-ppc64le-418.126.02.run
[root@p1311-met1 ~]# ./NVIDIA-Linux-ppc64le-418.126.02.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-ppc64le 418.126.02.....................................................................................................................................................................................................................................................................................................................................................................................................................................................................................




Once the nvidia driver installation finishes, you can check the GPU status with the nvidia-smi command.

[root@p1311-met1 ~]# nvidia-smi
Mon Jun 15 00:23:02 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.126.02   Driver Version: 418.126.02   CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000004:04:00.0 Off |                    0 |
| N/A   32C    P0    54W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000004:05:00.0 Off |                    0 |
| N/A   35C    P0    55W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000035:03:00.0 Off |                    0 |
| N/A   27C    P0    52W / 300W |      0MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000035:04:00.0 Off |                    0 |
| N/A   29C    P0    54W / 300W |      0MiB / 32480MiB |      3%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
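
For a scripted check, nvidia-smi can also print machine-readable CSV through its `--query-gpu` and `--format` options. The sketch below parses that output and is exercised on a sample string rather than on a live system; the sample is made up (placeholder GPU names, with the memory size copied from the table above), and `query_gpus()` is only defined, not called, since it needs the driver to be installed.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse 'index, name, memory' lines into a list of dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        index, rest = line.split(",", 1)
        name, memory = rest.rsplit(",", 1)
        gpus.append(
            {"index": int(index), "name": name.strip(), "memory_mib": int(memory)}
        )
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its CSV output (requires the NVIDIA driver)."""
    done = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_gpu_csv(done.stdout)


# Made-up sample in the expected CSV shape.
sample = "0, Example GPU, 32480\n1, Example GPU, 32480\n2, Example GPU, 32480\n3, Example GPU, 32480\n"
gpus = parse_gpu_csv(sample)
print(len(gpus), "GPUs,", sum(g["memory_mib"] for g in gpus), "MiB in total")
```

On the server itself, `query_gpus()` would replace the sample string.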

Reboot the system.
[root@p1311-met1 ~]# shutdown -r now

Before installing WML-CE, install the Mellanox driver first if you need it. It is required for configurations such as InfiniBand; if you do not use InfiniBand, you can skip the steps below.

On the download site, the Current Versions (latest) and Archive Versions tabs let you download the driver that matches your OS distribution, exact version, and architecture.

[root@p1311-met1 ~]# ls -al
total 579612
dr-xr-x---  6 root root      4096 Jun 15 00:40 .
dr-xr-xr-x 21 root root      4096 Jun 15 00:28 ..
-rw-------  1 root root      1636 Jun 15 00:25 .bash_history
-rw-r--r--  1 root root        18 Dec 28  2013 .bash_logout
-rw-r--r--  1 root root       176 Dec 28  2013 .bash_profile
-rw-r--r--  1 root root       176 Dec 28  2013 .bashrc
-rw-r--r--  1 root root       100 Dec 28  2013 .cshrc
-rw-r--r--  1 root root     15264 Sep 18  2019 epel-release-latest-7.noarch.rpm
-rw-r--r--  1 root root 236506898 Jun 15 00:41 MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6alternate-ppc64le.tgz
-rw-r--r--  1 root root 318803553 Jun 15 00:41 MLNX_OFED_LINUX-5.0-2.1.8.0-rhel7.6alternate-ppc64le.tgz
-rwxrwxr-x  1 root root  38137801 Jun 15 00:10 NVIDIA-Linux-ppc64le-418.126.02.run
drwxr-----  3 root root      4096 Jun 14 23:59 .pki
drwxr-xr-x  2 root root      4096 Feb 25  2019 .rpmdb
drwx------  2 root root      4096 Jun 12 03:16 .ssh
drwxr-xr-x  2 root root      4096 Jun 15 00:29 support-scripts
-rw-r--r--  1 root root       129 Dec 28  2013 .tcshrc

Before installing the Mellanox driver, install the following packages.
[root@p1311-met1 ~]# yum install lsof tcl gcc-gfortran tcsh tk

[root@p1311-met1 ~]# tar -xf MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6alternate-ppc64le.tgz

[root@p1311-met1 ~]# cd MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6alternate-ppc64le
[root@p1311-met1 MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6alternate-ppc64le]# ls
common_installers.pl            distro            LICENSE                     RPM-GPG-KEY-Mellanox  src
common.pl                       docs              mlnx_add_kernel_support.sh  RPMS                  uninstall.sh
create_mlnx_ofed_installers.pl  is_kmp_compat.sh  mlnxofedinstall             RPMS_UPSTREAM_LIBS

[root@p1311-met1 MLNX_OFED_LINUX-4.5-1.0.1.0-rhel7.6alternate-ppc64le]# ./mlnxofedinstall
Detected rhel7u6alternate ppc64le. Disabling installing 32bit rpms...
Logs dir: /tmp/MLNX_OFED_LINUX.13901.logs
General log file: /tmp/MLNX_OFED_LINUX.13901.logs/general.log
Verifying KMP rpms compatibility with target kernel...
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.

Do you want to continue?[y/N]:y

----------------------------------------

Before installing WML-CE 1.6.2, download the Anaconda installer, version 2019.07.

[root@p1311-met1 ~]# wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-ppc64le.sh

[root@p1311-met1 ~]# chmod +x Anaconda3-2019.07-Linux-ppc64le.sh
[root@p1311-met1 ~]# ./Anaconda3-2019.07-Linux-ppc64le.sh
Welcome to Anaconda3 2019.07

In order to continue the installation process, please review the license
agreement.
Please, press ENTER to continue
>>> (press ENTER)

(omitted)
Do you accept the license terms? [yes|no]
[no] >>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /opt/anaconda3

(omitted)

Do you wish the installer to initialize Anaconda3
by running conda init? [yes|no]
[no] >>> yes
no change     /opt/anaconda3/condabin/conda
no change     /opt/anaconda3/bin/conda
no change     /opt/anaconda3/bin/conda-env
no change     /opt/anaconda3/bin/activate
no change     /opt/anaconda3/bin/deactivate
no change     /opt/anaconda3/etc/profile.d/conda.sh
no change     /opt/anaconda3/etc/fish/conf.d/conda.fish
no change     /opt/anaconda3/shell/condabin/Conda.psm1
no change     /opt/anaconda3/shell/condabin/conda-hook.ps1
no change     /opt/anaconda3/lib/python3.7/site-packages/xontrib/conda.xsh
no change     /opt/anaconda3/etc/profile.d/conda.csh
modified      /root/.bashrc

==> For changes to take effect, close and re-open your current shell. <==

If you'd prefer that conda's base environment not be activated on startup,
   set the auto_activate_base parameter to false:

conda config --set auto_activate_base false

Thank you for installing Anaconda3!

[root@p1311-met1 ~]# export PATH=/opt/anaconda3/bin:$PATH


Anaconda is now installed under /opt/anaconda3. Next, register the IBM repository channel needed to install WML-CE.

[root@p1311-met1 ~]# conda config --prepend channels \
> https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

[root@p1311-met1 ~]# conda create --name wmlce_env python=3.6
Proceed ([y]/n)? y

Downloading and Extracting Packages
libffi-3.3           | 56 KB     | ############################################################################################################### | 100%
ncurses-6.2          | 853 KB    | ############################################################################################################### | 100%
readline-8.0         | 478 KB    | ############################################################################################################### | 100%
sqlite-3.31.1        | 2.4 MB    | ############################################################################################################### | 100%
xz-5.2.5             | 495 KB    | ############################################################################################################### | 100%
ca-certificates-2020 | 125 KB    | ############################################################################################################### | 100%
ld_impl_linux-ppc64l | 797 KB    | ############################################################################################################### | 100%
libgcc-ng-8.2.0      | 4.8 MB    | ############################################################################################################### | 100%
certifi-2020.4.5.1   | 155 KB    | ############################################################################################################### | 100%
wheel-0.34.2         | 51 KB     | ############################################################################################################### | 100%
openssl-1.1.1g       | 2.5 MB    | ############################################################################################################### | 100%
pip-20.0.2           | 1.7 MB    | ############################################################################################################### | 100%
python-3.6.10        | 29.9 MB   | ############################################################################################################### | 100%
setuptools-47.1.1    | 514 KB    | ############################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate wmlce_env
#
# To deactivate an active environment, use
#
#     $ conda deactivate

[root@p1311-met1 ~]# conda init bash
no change     /opt/anaconda3/condabin/conda
no change     /opt/anaconda3/bin/conda
no change     /opt/anaconda3/bin/conda-env
no change     /opt/anaconda3/bin/activate
no change     /opt/anaconda3/bin/deactivate
no change     /opt/anaconda3/etc/profile.d/conda.sh
no change     /opt/anaconda3/etc/fish/conf.d/conda.fish
no change     /opt/anaconda3/shell/condabin/Conda.psm1
no change     /opt/anaconda3/shell/condabin/conda-hook.ps1
no change     /opt/anaconda3/lib/python3.7/site-packages/xontrib/conda.xsh
no change     /opt/anaconda3/etc/profile.d/conda.csh
no change     /root/.bashrc
No action taken.

Reconnect to the session, then run conda activate.
(base) [root@p1311-met1 ~]# conda activate wmlce_env
(wmlce_env) [root@p1311-met1 ~]#
(wmlce_env) [root@p1311-met1 ~]# export IBM_POWERAI_LICENSE_ACCEPT=yes
(wmlce_env) [root@p1311-met1 ~]# conda search powerai
Loading channels: done
# Name                       Version           Build  Channel
powerai                        1.6.0     424.9f195f4  ibmdl/export/pub/software/server/ibm-ai/conda
powerai                        1.6.1    511.g0f4acb3  ibmdl/export/pub/software/server/ibm-ai/conda
powerai                        1.6.2    615.g1dade79  ibmdl/export/pub/software/server/ibm-ai/conda
powerai                        1.7.0    679.g5b5a006  ibmdl/export/pub/software/server/ibm-ai/conda

(wmlce_env) [root@p1311-met1 ~]# conda install powerai==1.6.2
Proceed ([y]/n)? y

Downloading and Extracting Packages
dask-2.3.0           | 12 KB     | ############################################################################################################### | 100%
sortedcontainers-2.1 | 45 KB     | ############################################################################################################### | 100%
keras-applications-1 | 32 KB     | ############################################################################################################### | 100%
powerai-license-1.6. | 6.4 MB    | ############################################################################################################### | 100%
libvpx-1.7.0         | 2.6 MB    | ############################################################################################################### | 100%
opencv-3.4.7         | 4 KB      | ############################################################################################################### | 100%
gnutls-3.6.5         | 2.1 MB    | ############################################################################################################### | 100%
tensorflow-1.15.2    | 4 KB      | ############################################################################################################### | 100%
libopencv-3.4.7      | 17.1 MB   | ############################################################################################################### | 100%
gast-0.2.2           | 138 KB    | ############################################################################################################### | 100%
cudnn-7.6.3_10.1     | 472.5 MB  | ############################################################################################################### | 100%
fsspec-0.7.4         | 63 KB     | ############################################################################################################### | 100%
torchvision-base-0.4 | 3.0 MB    | ############################################################################################################### | 100%
termcolor-1.1.0      | 7 KB      | ############################################################################################################### | 100%
lame-3.100           | 384 KB    | ############################################################################################################### | 100%
typing-3.6.4         | 44 KB     | ############################################################################################################### | 100%
libxgboost-base-0.90 | 168.4 MB  | ############################################################################################################### | 100%
more-itertools-8.3.0 | 43 KB     | ############################################################################################################### | 100%
nomkl-3.0            | 6 KB      | ############################################################################################################### | 100%
pciutils-3.6.2       | 324 KB    | ############################################################################################################### | 100%
python-lmdb-0.94     | 142 KB    | ############################################################################################################### | 100%
_pytorch_select-2.0  | 3 KB      | ############################################################################################################### | 100%
powerai-1.6.2        | 3 KB      | ############################################################################################################### | 100%
tblib-1.6.0          | 16 KB     | ############################################################################################################### | 100%
powerai-tools-1.6.2  | 7 KB      | ############################################################################################################### | 100%
pytorch-base-1.2.0   | 517.1 MB  | ############################################################################################################### | 100%
psutil-5.5.0         | 317 KB    | ############################################################################################################### | 100%
heapdict-1.0.1       | 9 KB      | ############################################################################################################### | 100%
libopenblas-0.3.6    | 5.0 MB    | ############################################################################################################### | 100%
boost-1.67.0         | 11 KB     | ############################################################################################################### | 100%
snapml-spark-1.4.0   | 1.6 MB    | ############################################################################################################### | 100%
py-xgboost-gpu-0.90  | 3 KB      | ############################################################################################################### | 100%
future-0.17.1        | 698 KB    | ############################################################################################################### | 100%
libtiff-4.1.0        | 492 KB    | ############################################################################################################### | 100%
nettle-3.4.1         | 4.3 MB    | ############################################################################################################### | 100%
openh264-2.1.0       | 1.9 MB    | ############################################################################################################### | 100%
dask-cuda-0.9.1      | 25 KB     | ############################################################################################################### | 100%
_tflow_select-2.1.0  | 3 KB      | ############################################################################################################### | 100%
cryptography-2.9.2   | 540 KB    | ############################################################################################################### | 100%
scipy-1.3.1          | 14.1 MB   | ############################################################################################################### | 100%
ddl-1.5.0            | 763 KB    | ############################################################################################################### | 100%
chardet-3.0.4        | 197 KB    | ############################################################################################################### | 100%
wrapt-1.11.2         | 48 KB     | ############################################################################################################### | 100%
tensorflow-gpu-1.15. | 3 KB      | ############################################################################################################### | 100%
grpcio-1.16.1        | 1.1 MB    | ############################################################################################################### | 100%
glog-0.3.5           | 161 KB    | ############################################################################################################### | 100%
markdown-3.1.1       | 113 KB    | ############################################################################################################### | 100%
pai4sk-1.5.0         | 7.6 MB    | ############################################################################################################### | 100%
pandas-1.0.3         | 7.8 MB    | ############################################################################################################### | 100%
jinja2-2.11.2        | 103 KB    | ############################################################################################################### | 100%
bokeh-2.0.2          | 5.3 MB    | ############################################################################################################### | 100%
pytorch-1.2.0        | 3 KB      | ############################################################################################################### | 100%
spectrum-mpi-10.03   | 22.3 MB   | ############################################################################################################### | 100%
click-7.0            | 118 KB    | ############################################################################################################### | 100%
networkx-2.2         | 2.0 MB    | ############################################################################################################### | 100%
py-opencv-3.4.7      | 1.5 MB    | ############################################################################################################### | 100%
torchtext-0.4.0      | 1.1 MB    | ############################################################################################################### | 100%
decorator-4.4.2      | 14 KB     | ############################################################################################################### | 100%
harfbuzz-1.8.8       | 1001 KB   | ############################################################################################################### | 100%
keras-preprocessing- | 37 KB     | ############################################################################################################### | 100%
tensorflow-base-1.15 | 548.5 MB  | ############################################################################################################### | 100%
cytoolz-0.10.1       | 390 KB    | ############################################################################################################### | 100%
google-pasta-0.1.6   | 82 KB     | ############################################################################################################### | 100%
tensorflow-serving-a | 36 KB     | ############################################################################################################### | 100%
matplotlib-3.1.3     | 21 KB     | ############################################################################################################### | 100%
joblib-0.13.2        | 365 KB    | ############################################################################################################### | 100%
caffe-1.0_1.6.2      | 3 KB      | ############################################################################################################### | 100%
simsearch-1.1.0      | 26.0 MB   | ############################################################################################################### | 100%
cloudpickle-1.4.1    | 30 KB     | ############################################################################################################### | 100%
glib-2.63.1          | 1.9 MB    | ############################################################################################################### | 100%
hypothesis-3.59.1    | 352 KB    | ############################################################################################################### | 100%
absl-py-0.7.1        | 157 KB    | ############################################################################################################### | 100%
py-1.8.1             | 71 KB     | ############################################################################################################### | 100%
leveldb-1.20         | 394 KB    | ############################################################################################################### | 100%
protobuf-3.8.0       | 699 KB    | ############################################################################################################### | 100%
matplotlib-base-3.1. | 5.0 MB    | ############################################################################################################### | 100%
uff-0.6.5            | 79 KB     | ############################################################################################################### | 100%
dask-core-2.3.0      | 545 KB    | ############################################################################################################### | 100%
mock-2.0.0           | 104 KB    | ############################################################################################################### | 100%
libprotobuf-3.8.0    | 6.6 MB    | ############################################################################################################### | 100%
pycparser-2.20       | 92 KB     | ############################################################################################################### | 100%
gflags-2.2.2         | 238 KB    | ############################################################################################################### | 100%
ninja-1.9.0          | 1.9 MB    | ############################################################################################################### | 100%
powerai-release-1.6. | 3 KB      | ############################################################################################################### | 100%
atomicwrites-1.4.0   | 11 KB     | ############################################################################################################### | 100%
jasper-2.0.14        | 1.4 MB    | ############################################################################################################### | 100%
libopus-1.3.1        | 879 KB    | ############################################################################################################### | 100%
tensorflow-probabili | 2.1 MB    | ############################################################################################################### | 100%
urllib3-1.25.8       | 166 KB    | ############################################################################################################### | 100%
pyparsing-2.4.7      | 65 KB     | ############################################################################################################### | 100%
openblas-devel-0.3.6 | 74 KB     | ############################################################################################################### | 100%
numba-0.45.1         | 3.2 MB    | ############################################################################################################### | 100%
graphite2-1.3.13     | 109 KB    | ############################################################################################################### | 100%
x264-1!157.20191217  | 3.1 MB    | ############################################################################################################### | 100%
cycler-0.10.0        | 13 KB     | ############################################################################################################### | 100%
tensorflow-large-mod | 60 KB     | ############################################################################################################### | 100%
zict-2.0.0           | 13 KB     | ############################################################################################################### | 100%
numpy-base-1.16.6    | 3.5 MB    | ############################################################################################################### | 100%
dask-xgboost-0.1.7   | 23 KB     | ############################################################################################################### | 100%
numpy-1.16.6         | 47 KB     | ############################################################################################################### | 100%
pytz-2020.1          | 184 KB    | ############################################################################################################### | 100%
pyyaml-5.3.1         | 171 KB    | ############################################################################################################### | 100%
locket-0.2.0         | 8 KB      | ############################################################################################################### | 100%
packaging-20.3       | 36 KB     | ############################################################################################################### | 100%
numactl-2.0.12       | 139 KB    | ############################################################################################################### | 100%
pyopenssl-19.1.0     | 86 KB     | ############################################################################################################### | 100%
py-xgboost-base-0.90 | 84.7 MB   | ############################################################################################################### | 100%
nccl-2.4.8           | 137.4 MB  | ############################################################################################################### | 100%
requests-2.22.0      | 90 KB     | ############################################################################################################### | 100%
partd-1.1.0          | 20 KB     | ############################################################################################################### | 100%
llvmlite-0.29.0      | 16.6 MB   | ############################################################################################################### | 100%
_py-xgboost-mutex-1. | 7 KB      | ############################################################################################################### | 100%
openblas-0.3.6       | 19 KB     | ############################################################################################################### | 100%
c-ares-1.15.0        | 107 KB    | ############################################################################################################### | 100%
lmdb-0.9.22          | 506 KB    | ############################################################################################################### | 100%
python-dateutil-2.8. | 215 KB    | ############################################################################################################### | 100%
h5py-2.8.0           | 962 KB    | ############################################################################################################### | 100%
pysocks-1.7.1        | 30 KB     | ############################################################################################################### | 100%
idna-2.8             | 133 KB    | ############################################################################################################### | 100%
pywavelets-1.1.1     | 4.4 MB    | ############################################################################################################### | 100%
pluggy-0.13.1        | 32 KB     | ############################################################################################################### | 100%
six-1.12.0           | 22 KB     | ############################################################################################################### | 100%
freeglut-3.0.0       | 308 KB    | ############################################################################################################### | 100%
markupsafe-1.1.1     | 30 KB     | ############################################################################################################### | 100%
zipp-3.1.0           | 13 KB     | ############################################################################################################### | 100%
importlib_metadata-1 | 11 KB     | ############################################################################################################### | 100%
lz4-c-1.9.2          | 261 KB    | ############################################################################################################### | 100%
kiwisolver-1.2.0     | 93 KB     | ############################################################################################################### | 100%
apex-0.1.0_1.6.2     | 1.6 MB    | ############################################################################################################### | 100%
astor-0.7.1          | 43 KB     | ############################################################################################################### | 100%
tabulate-0.8.2       | 36 KB     | ############################################################################################################### | 100%
attrs-19.3.0         | 40 KB     | ############################################################################################################### | 100%
typing_extensions-3. | 41 KB     | ############################################################################################################### | 100%
libglu-9.0.0         | 635 KB    | ############################################################################################################### | 100%
tensorboard-1.15.0   | 3.8 MB    | ############################################################################################################### | 100%
graphsurgeon-0.4.1   | 27 KB     | ############################################################################################################### | 100%
pillow-7.0.0         | 649 KB    | ############################################################################################################### | 100%
ddl-tensorflow-1.5.0 | 2.4 MB    | ############################################################################################################### | 100%
scikit-learn-0.22.1  | 4.9 MB    | ############################################################################################################### | 100%
cudatoolkit-10.1.243 | 510.3 MB  | ############################################################################################################### | 100%
caffe-base-1.0_1.6.2 | 27.6 MB   | ############################################################################################################### | 100%
tornado-6.0.4        | 597 KB    | ############################################################################################################### | 100%
onnx-1.5.0           | 2.9 MB    | ############################################################################################################### | 100%
scikit-image-0.15.0  | 28.2 MB   | ############################################################################################################### | 100%
imageio-2.8.0        | 3.0 MB    | ############################################################################################################### | 100%
tensorrt-6.0.1.5     | 437.3 MB  | ############################################################################################################### | 100%
tensorflow-estimator | 666 KB    | ############################################################################################################### | 100%
libxml2-2.9.10       | 1.3 MB    | ############################################################################################################### | 100%
pbr-5.4.4            | 76 KB     | ############################################################################################################### | 100%
cffi-1.12.3          | 236 KB    | ############################################################################################################### | 100%
distributed-2.3.2    | 369 KB    | ############################################################################################################### | 100%
python-3.6.10        | 29.9 MB   | ############################################################################################################### | 100%
olefile-0.46         | 48 KB     | ############################################################################################################### | 100%
icu-58.2             | 10.9 MB   | ############################################################################################################### | 100%
ffmpeg-4.2.2         | 70.3 MB   | ############################################################################################################### | 100%
tf_cnn_benchmarks-1. | 189 KB    | ############################################################################################################### | 100%
zstd-1.4.4           | 1009 KB   | ############################################################################################################### | 100%
msgpack-python-1.0.0 | 93 KB     | ############################################################################################################### | 100%
libboost-1.67.0      | 22.1 MB   | ############################################################################################################### | 100%
py-boost-1.67.0      | 358 KB    | ############################################################################################################### | 100%
pytest-4.4.2         | 358 KB    | ############################################################################################################### | 100%
importlib-metadata-1 | 49 KB     | ############################################################################################################### | 100%
coverage-4.5.4       | 227 KB    | ############################################################################################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Once the steps above finish, every package bundled with PowerAI (WML-CE) 1.6.2 is installed. The main packages are included in the list below, and other packages are included as well.


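If you want packages from the IBM-provided channel to take precedence over the other channels registered in Anaconda, you can pin the channel priority with the commands below, or name the channel explicitly at the end of the package install command.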

conda config --show
conda config --set channel_priority strict
conda install pytorch powerai-release=1.6.2

Run one of the installed packages to check that it can use the GPUs.

(wmlce_env) [root@p1311-met1 ~]# python
Python 3.6.10 |Anaconda, Inc.| (default, Mar 26 2020, 00:22:27)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-06-15 03:32:52.174173: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
>>> tf.test.gpu_device_name()
2020-06-15 03:33:01.615986: I tensorflow/core/platform/profile_utils/cpu_utils.cc:101] CPU Frequency: 3783000000 Hz
2020-06-15 03:33:01.671666: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1222ac670 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-15 03:33:01.671693: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-15 03:33:01.675095: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-15 03:33:02.262105: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x12236ac60 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-15 03:33:02.262149: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-06-15 03:33:02.262158: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-06-15 03:33:02.262165: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-06-15 03:33:02.262173: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-06-15 03:33:02.267713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 0 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:04:00.0
2020-06-15 03:33:02.270363: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 1 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0004:05:00.0
2020-06-15 03:33:02.272889: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 2 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:03:00.0
2020-06-15 03:33:02.275410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Found device 3 with properties:
name: Tesla V100-SXM2-32GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0035:04:00.0
2020-06-15 03:33:02.275435: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-15 03:33:02.276784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-15 03:33:02.277831: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-15 03:33:02.278243: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-15 03:33:02.279466: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-15 03:33:02.280453: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-15 03:33:02.283459: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-15 03:33:02.303752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1767] Adding visible gpu devices: 0, 1, 2, 3
2020-06-15 03:33:02.303784: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-15 03:33:03.783369: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1180] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-15 03:33:03.783427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186]      0 1 2 3
2020-06-15 03:33:03.783443: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 0:   N Y Y Y
2020-06-15 03:33:03.783460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 1:   Y N Y Y
2020-06-15 03:33:03.783476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 2:   Y Y N Y
2020-06-15 03:33:03.783491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1199] 3:   Y Y Y N
2020-06-15 03:33:03.796899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:0 with 30459 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0004:04:00.0, compute capability: 7.0)
2020-06-15 03:33:03.801573: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:1 with 30459 MB memory) -> physical GPU (device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0004:05:00.0, compute capability: 7.0)
2020-06-15 03:33:03.806183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:2 with 30459 MB memory) -> physical GPU (device: 2, name: Tesla V100-SXM2-32GB, pci bus id: 0035:03:00.0, compute capability: 7.0)
2020-06-15 03:33:03.810723: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1325] Created TensorFlow device (/device:GPU:3 with 30459 MB memory) -> physical GPU (device: 3, name: Tesla V100-SXM2-32GB, pci bus id: 0035:04:00.0, compute capability: 7.0)
'/device:GPU:0'
>>>

TensorFlow detects four GPUs in total. You can now use each framework for whatever work you need.
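The interactive check above can also be run as a small script, which is handy when verifying several nodes. The following is only a minimal sketch: the function name `gpu_device_names` is made up for illustration, and it assumes that `tensorflow.python.client.device_lib` is importable, as it is in the TensorFlow 1.15 build installed here. If TensorFlow cannot be imported, it returns an empty list instead of failing.

```python
def gpu_device_names():
    """Return the names of the GPU devices TensorFlow can see.

    Returns an empty list when TensorFlow is not importable.
    (Hypothetical helper; not part of WML-CE or TensorFlow.)
    """
    try:
        # Assumption: this module path is available, as in TensorFlow 1.15.
        from tensorflow.python.client import device_lib
    except ImportError:
        return []
    # Keep only real GPU devices; entries such as XLA_GPU are skipped.
    return [d.name for d in device_lib.list_local_devices()
            if d.device_type == "GPU"]


if __name__ == "__main__":
    names = gpu_device_names()
    print("GPUs visible to TensorFlow: %d" % len(names))
    for name in names:
        print(name)
```

Run it inside the activated `wmlce_env` environment so that the TensorFlow build installed above is the one being imported.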

Any additional Python packages you need can be installed with conda or pip. Take the datetime package as an example: conda search finds no package with exactly that name, whereas pip search immediately returns DateTime (4.3). If a package shows up in conda search, install it with conda install <package name>; if it only shows up in pip search, install it with pip install <package name>.

(wmlce_env) [root@p1311-met1 ~]# conda search datetime
Loading channels: done
No match found for: datetime. Search: *datetime*
# Name                       Version           Build  Channel
parsedatetime                    2.4          py27_0  pkgs/main
parsedatetime                    2.4          py35_0  pkgs/main
parsedatetime                    2.4          py36_0  pkgs/main
parsedatetime                    2.4          py37_0  pkgs/main
parsedatetime                    2.4          py38_0  pkgs/main
r-assertive.datetimes           0.0_2   r36h6115d3f_0  pkgs/r
r-datetime                     0.1.4   r36h6115d3f_0  pkgs/r
r-datetimeutils                0.3_0   r36h6115d3f_0  pkgs/r

(wmlce_env) [root@p1311-met1 ~]# pip search datetime
DateTime (4.3)     

(wmlce_env) [root@p1311-met1 ~]# pip install datetime
Collecting datetime
  Downloading DateTime-4.3-py2.py3-none-any.whl (60 kB)
     |████████████████████████████████| 60 kB 4.6 MB/s
Collecting zope.interface
  Downloading zope.interface-5.1.0.tar.gz (225 kB)
     |████████████████████████████████| 225 kB 10.6 MB/s
Requirement already satisfied: pytz in /opt/anaconda3/envs/wmlce_env/lib/python3.6/site-packages (from datetime) (2020.1)
Requirement already satisfied: setuptools in /opt/anaconda3/envs/wmlce_env/lib/python3.6/site-packages (from zope.interface->datetime) (47.1.1.post20200604)
Building wheels for collected packages: zope.interface
  Building wheel for zope.interface (setup.py) ... done
  Created wheel for zope.interface: filename=zope.interface-5.1.0-cp36-cp36m-linux_ppc64le.whl size=232220 sha256=09c0d443c68581ba3773bc4097479cdc35276ba41ac2f9a5f75936113fa936df
  Stored in directory: /root/.cache/pip/wheels/4d/b1/d4/071ce5e5e22295b5c70b72f4df166ea45f8a60c9374c9feb3c
Successfully built zope.interface
Installing collected packages: zope.interface, datetime
Successfully installed datetime-4.3 zope.interface-5.1.0
(wmlce_env) [root@p1311-met1 ~]# conda list datetime
# packages in environment at /opt/anaconda3/envs/wmlce_env:
#
# Name                    Version                   Build  Channel
datetime                  4.3                      pypi_0    pypi