Daily Life of a Bioinformatician in Kyobashi of Osaka

In search of a daily life spent typing R programs under a beach parasol in the tropics

R / Web scraping: getting the NASDAQ-100 constituent list from a table on a Wikipedia page and looking at how the stock prices moved during 2021

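Introduction

Using R, we obtain ticker symbols from a Wikipedia page by web scraping, and then collect price data for each stock with the quantmod package.

In this article, I calculated the performance of the NASDAQ-100 constituents (as of January 3, 2022) from the start of 2021 through the end of December and turned the result into a GIF animation.

NASDAQ-100 performance in 2021. Green panels indicate positive performance and red panels indicate negative performance.

In 2021, LCID (Lucid Group), FTNT (Fortinet), NVDA (NVIDIA), and MRNA (Moderna) were the ones that really shone.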

The NASDAQ-100 Wikipedia page and preparation in R

We take the ticker symbols from the English Wikipedia page.

Wikipedia conveniently compiles the constituent list of each index into a table.

First, start R/RStudio. As preparation, store the URL in a variable.

NASDAQ100_url <- "https://en.wikipedia.org/wiki/NASDAQ-100"

# Check in the browser
#browseURL(NASDAQ100_url)

Preparing the packages

First, install and load the rvest, quantmod, and magrittr packages, along with tidyr.

# Install
install.packages(c("rvest", "quantmod", "magrittr", "tidyr"))

# Load
library(rvest)
library(quantmod)
library(magrittr)
library(tidyr)

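Getting the constituent list from the Wikipedia page

Wikipedia is freely editable, so this cannot be helped, but which table on a page holds the ticker symbols varies from page to page. The table layout and column names differ as well, so the extraction has to be adapted to each page.

The code below collects the tickers for the NASDAQ-100 index.

NASDAQ-100

# NASDAQ-100 index (the constituents were in the 4th table at the time of writing)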
NASDAQ100 <- NASDAQ100_url %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[4]] %>%
  html_table() %>%
  data.frame()

# Show the result
head(NASDAQ100)
#                 Company Ticker            GICS.Sector                  GICS.Sub.Industry
#1    Activision Blizzard   ATVI Communication Services     Interactive Home Entertainment
#2                  Adobe   ADBE Information Technology               Application Software
#3 Advanced Micro Devices    AMD Information Technology                     Semiconductors
#4                 Airbnb   ABNB Consumer Discretionary Internet & Direct Marketing Retail
#5       Align Technology   ALGN            Health Care               Health Care Supplies
#6     Alphabet (Class A)  GOOGL Communication Services       Interactive Media & Services

# Show the tickers
NASDAQ100$Ticker
# [1] "ATVI"  "ADBE"  "AMD"   "ABNB"  "ALGN"  "GOOGL" "GOOG"  "AMZN" 
# [9] "AEP"   "AMGN"  "ADI"   "ANSS"  "AAPL"  "AMAT"  "ASML"  "TEAM" 
#[17] "ADSK"  "ADP"   "BIDU"  "BIIB"  "BKNG"  "AVGO"  "CDNS"  "CHTR" 
#[25] "CTAS"  "CSCO"  "CTSH"  "CMCSA" "CPRT"  "COST"  "CRWD"  "CSX"  
#[33] "DDOG"  "DXCM"  "DOCU"  "DLTR"  "EBAY"  "EA"    "EXC"   "FAST" 
#[41] "FISV"  "FTNT"  "GILD"  "HON"   "IDXX"  "ILMN"  "INTC"  "INTU" 
#[49] "ISRG"  "JD"    "KDP"   "KLAC"  "KHC"   "LRCX"  "LCID"  "LULU" 
#[57] "MAR"   "MRVL"  "MTCH"  "MELI"  "FB"    "MCHP"  "MU"    "MSFT" 
#[65] "MRNA"  "MDLZ"  "MNST"  "NTES"  "NFLX"  "NVDA"  "NXPI"  "ORLY" 
#[73] "OKTA"  "PCAR"  "PANW"  "PAYX"  "PYPL"  "PTON"  "PEP"   "PDD"  
#[81] "QCOM"  "REGN"  "ROST"  "SGEN"  "SIRI"  "SWKS"  "SPLK"  "SBUX" 
#[89] "SNPS"  "TMUS"  "TSLA"  "TXN"   "VRSN"  "VRSK"  "VRTX"  "WBA"  
#[97] "WDAY"  "XEL"   "XLNX"  "ZM"    "ZS"  
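
Because the table position differs from page to page, the hard-coded `.[[4]]` above can stop working when the page is edited. The following is a minimal sketch of one possible workaround: pick the table by column name instead of by position. It assumes that the tables have already been turned into a list of data frames, which is what `html_table()` returns when it is given all the `table` nodes. The `tables` list here is made-up toy data, and `find_table_with_column` is a hypothetical helper, not part of rvest.

```r
# Toy stand-ins for the list of data frames that html_table() returns for
# all <table> nodes on a page (assumption: made-up data, not the real page).
tables <- list(
  data.frame(Year = c(2020, 2021), Note = c("a", "b"), stringsAsFactors = FALSE),
  data.frame(Company = c("Adobe", "Airbnb"), Ticker = c("ADBE", "ABNB"),
             stringsAsFactors = FALSE),
  data.frame(Item = "x", Value = 1, stringsAsFactors = FALSE)
)

# Hypothetical helper: return the first table that has the wanted column.
find_table_with_column <- function(tables, column) {
  has_column <- vapply(tables, function(tbl) column %in% names(tbl), logical(1))
  if (!any(has_column)) stop("No table with a column named ", column)
  tables[[which(has_column)[1]]]
}

constituents <- find_table_with_column(tables, "Ticker")
constituents$Ticker
```

With the real page, `tables` would presumably come from `NASDAQ100_url %>% read_html() %>% html_nodes("table") %>% html_table()`, and the column name would still need to be checked against the current page.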

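Animating the annual performance of the NASDAQ-100 constituents

We fetch the 2021 stock prices of the NASDAQ-100 constituents and turn them into an animation.

Let's get straight to the code.

First, we use the quantmod::getSymbols function to fetch the 2021 price movements of all NASDAQ-100 constituents.

# NASDAQ-100 ticker symbols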
nasdaq100.tic <- NASDAQ100$Ticker
nasdaq100.tic
#  [1] "ATVI"  "ADBE"  "AMD"   "ABNB"  "ALGN"  "GOOGL" "GOOG"  "AMZN" 
#  [9] "AEP"   "AMGN"  "ADI"   "ANSS"  "AAPL"  "AMAT"  "ASML"  "TEAM" 
# [17] "ADSK"  "ADP"   "BIDU"  "BIIB"  "BKNG"  "AVGO"  "CDNS"  "CHTR" 
# [25] "CTAS"  "CSCO"  "CTSH"  "CMCSA" "CPRT"  "COST"  "CRWD"  "CSX"  
# [33] "DDOG"  "DXCM"  "DOCU"  "DLTR"  "EBAY"  "EA"    "EXC"   "FAST" 
# [41] "FISV"  "FTNT"  "GILD"  "HON"   "IDXX"  "ILMN"  "INTC"  "INTU" 
# [49] "ISRG"  "JD"    "KDP"   "KLAC"  "KHC"   "LRCX"  "LCID"  "LULU" 
# [57] "MAR"   "MRVL"  "MTCH"  "MELI"  "FB"    "MCHP"  "MU"    "MSFT" 
# [65] "MRNA"  "MDLZ"  "MNST"  "NTES"  "NFLX"  "NVDA"  "NXPI"  "ORLY" 
# [73] "OKTA"  "PCAR"  "PANW"  "PAYX"  "PYPL"  "PTON"  "PEP"   "PDD"  
# [81] "QCOM"  "REGN"  "ROST"  "SGEN"  "SIRI"  "SWKS"  "SPLK"  "SBUX" 
# [89] "SNPS"  "TMUS"  "TSLA"  "TXN"   "VRSN"  "VRSK"  "VRTX"  "WBA"  
# [97] "WDAY"  "XEL"   "XLNX"  "ZM"    "ZS"   

# Fetch the 2021 stock prices
Date <- c("2021-01-01", "2021-12-31")
list <- as.character(unlist(nasdaq100.tic))
quantmod::getSymbols(list, src = "yahoo", verbose = TRUE, from = Date[1], to=Date[2])

# Create an empty data frame
stock <- data.frame(matrix(NA, 
                           nrow=dim(get(nasdaq100.tic[1]))[1],
                           ncol=length(list)))
# Assign the column names
colnames(stock) <- list

# Display
head(stock)
#  ATVI ADBE AMD ABNB ALGN GOOGL GOOG AMZN AEP AMGN ADI ANSS AAPL AMAT
#1   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
#2   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
#3   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
#4   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
#5   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
#6   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA

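# Fill in the data
# Build a string (e.g. "assign('a', ATVI[,4])") and
# run it as a command with eval(parse(text = "..."))
# Column 4 is the closing price
# Note: try() only suppresses the error, so if a lookup fails, 'a' keeps the previous ticker's values
for(n in seq_len(length(list))){
  try(eval(parse(text = paste("assign('a', ", list[n], "[,4])", sep=""))))
  stock[,n] <- a
}

# Use the dates as row names
rownames(stock) <- rownames(data.frame(a))

# Data retrieval complete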
head(stock)
#            ATVI   ADBE   AMD   ABNB   ALGN   GOOGL    GOOG    AMZN
#2021-01-04 89.90 485.34 92.30 139.15 526.46 1726.13 1728.24 3186.63
#2021-01-05 90.69 485.69 92.77 148.30 543.65 1740.05 1740.92 3218.51
#2021-01-06 88.00 466.31 90.33 142.77 540.39 1722.88 1735.29 3138.38
#2021-01-07 89.67 477.74 95.16 151.27 558.36 1774.34 1787.25 3162.16
#2021-01-08 91.30 485.10 94.58 149.77 570.53 1797.83 1807.21 3182.70
#2021-01-11 90.91 474.24 97.25 148.13 557.04 1756.29 1766.72 3114.21
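
The loop above builds a command string and runs it with eval(parse()). As a possible alternative, the sketch below looks each object up by name with get(), the same function the code above already uses when sizing the empty data frame. The tickers AAA and BBB and their prices are made up for illustration, and plain matrices stand in for the xts objects that getSymbols() creates, so treat this as a sketch rather than a drop-in replacement.

```r
# Toy stand-ins for the per-ticker objects created by getSymbols()
# (assumption: plain matrices with the closing price in column 4).
dates <- c("2021-01-04", "2021-01-05", "2021-01-06")
cols <- c("Open", "High", "Low", "Close")
AAA <- matrix(c(10, 11, 12,  12, 13, 14,  9, 10, 11,  11, 12, 13),
              nrow = 3, dimnames = list(dates, cols))
BBB <- matrix(c(20, 21, 22,  23, 24, 25,  19, 20, 21,  22, 21, 23),
              nrow = 3, dimnames = list(dates, cols))
tickers <- c("AAA", "BBB")

# Look each object up by its name and keep the 4th column (the close).
close_list <- lapply(tickers, function(tic) as.numeric(get(tic)[, 4]))

# Combine into one data frame with tickers as columns and dates as row names.
stock_demo <- as.data.frame(setNames(close_list, tickers))
rownames(stock_demo) <- dates
stock_demo
```

Unlike the try() version, a missing object makes get() stop with an error here, so a failed download cannot silently reuse the previous ticker's values.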

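Next, we rescale each stock so that its price at the start of the year (January 4, 2021) equals 100, and reshape the data for the animation.

# Rescale so that the price at the start of the year is 100
stock.c <- stock
for(n in 1:ncol(stock)){
  stock.c[,n] <- round(as.numeric(stock[,n])/as.numeric(stock[1,n])*100, 3)
}

# Intermediate output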
head(stock.c)
#              ATVI    ADBE     AMD    ABNB    ALGN   GOOGL    GOOG
#2021-01-04 100.000 100.000 100.000 100.000 100.000 100.000 100.000
#2021-01-05 100.879 100.072 100.509 106.576 103.265 100.806 100.734
#2021-01-06  97.887  96.079  97.866 102.602 102.646  99.812 100.408
#2021-01-07  99.744  98.434 103.099 108.710 106.059 102.793 103.414
#2021-01-08 101.557  99.951 102.470 107.632 108.371 104.154 104.569
#2021-01-11 101.123  97.713 105.363 106.453 105.809 101.747 102.227

# Transpose the data (swap rows and columns)
stock.t <- t(stock.c)

# Intermediate output
head(stock.t)
#      2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08
#ATVI         100    100.879     97.887     99.744    101.557
#ADBE         100    100.072     96.079     98.434     99.951
#AMD          100    100.509     97.866    103.099    102.470

# Add a sector column
stock01 <- data.frame(tic=rownames(stock.t), Sector=NASDAQ100$"GICS.Sector", stock.t)
rownames(stock01) <- 1:nrow(stock01)

# Intermediate output
head(stock01)
#   tic                 Sector X2021.01.04 X2021.01.05 X2021.01.06
#1 ATVI Communication Services         100     100.879      97.887
#2 ADBE Information Technology         100     100.072      96.079
#3  AMD Information Technology         100     100.509      97.866
#  X2021.01.07 X2021.01.08 X2021.01.11 X2021.01.12 X2021.01.13
#1      99.744     101.557     101.123      99.277      99.855
#2      98.434      99.951      97.713      97.179      97.262
#3     103.099     102.470     105.363     103.315      99.437

# Thin out the data a little (keep every 5th trading day)
stock02 <- stock01[,c(1:2, seq(3, ncol(stock01), by=5))]

# Reshape to long format
stock03 <- tidyr::gather(stock02, key="date", value="close", -c(tic, Sector)) 
stock03$date <- sub("X", "", stock03$date)
stock03$date <- gsub("\\.", "/", stock03$date)
stock03$date <- paste0(stock03$date, "-16-00-00")

# Show progress
head(stock03)
#    tic                 Sector                date close
#1  ATVI Communication Services 2021/01/04-16-00-00   100
#2  ADBE Information Technology 2021/01/04-16-00-00   100
#3   AMD Information Technology 2021/01/04-16-00-00   100
#4  ABNB Consumer Discretionary 2021/01/04-16-00-00   100
#5  ALGN            Health Care 2021/01/04-16-00-00   100
#6 GOOGL Communication Services 2021/01/04-16-00-00   100
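
The rescaling to a base of 100 divides every price in a column by that column's first value and multiplies by 100. As a small worked example, the sketch below does the same thing without a loop by using sweep(). The prices are the first three closing values of ATVI and ADBE shown earlier; the variable names `prices`, `base`, and `indexed` are chosen only for this illustration.

```r
# First three closing prices of two tickers, taken from the output above.
prices <- data.frame(ATVI = c(89.90, 90.69, 88.00),
                     ADBE = c(485.34, 485.69, 466.31),
                     row.names = c("2021-01-04", "2021-01-05", "2021-01-06"))

# The first row holds each ticker's price at the start of the year.
base <- as.numeric(prices[1, ])

# Divide each column by its own base value, then scale to 100.
indexed <- round(sweep(as.matrix(prices), 2, base, "/") * 100, 3)
indexed
```

The result is a matrix, so it can be passed to t() in the same way as stock.c above.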

Here, we prepare the packages needed for the animation.

# Install
install.packages(c("ggplot2", "treemapify", "gganimate", "gapminder", "gifski"))

# Load
library(ggplot2)
library(treemapify)
library(gganimate)
library(gapminder)
library(gifski)

Next, we create the animation from the stock03 data.

# Convert to a date column
stock03$date  <- as.Date(stock03$date)

# Show progress
head(stock03)
#    tic                 Sector       date close
#1  ATVI Communication Services 2021-01-04   100
#2  ADBE Information Technology 2021-01-04   100
#3   AMD Information Technology 2021-01-04   100
#4  ABNB Consumer Discretionary 2021-01-04   100
#5  ALGN            Health Care 2021-01-04   100
#6 GOOGL Communication Services 2021-01-04   100

# Choose the colors from the size of the price change
stock03$dclose <- stock03$close - 100
stock03$dclose2 <- NA
colfunc <- grDevices::colorRampPalette(c("brown3", "white", "darkgreen"))

# Split into color bins
a <- colfunc(17)
b1 <- seq(range(stock03$dclose)[1]-10, 0, length.out=9)
b2 <- seq(0, range(stock03$dclose)[2]+10, length.out=9)
b3 <- c(b1, b2[-1])
for(n in length(b3):1){stock03$dclose2[stock03$dclose < b3[n]] <- a[n]  }

# Show progress
head(stock03)
#    tic                 Sector       date close dclose dclose2
#1  ATVI Communication Services 2021-01-04   100      0 #DFEBDF
#2  ADBE Information Technology 2021-01-04   100      0 #DFEBDF
#3   AMD Information Technology 2021-01-04   100      0 #DFEBDF
#4  ABNB Consumer Discretionary 2021-01-04   100      0 #DFEBDF
#5  ALGN            Health Care 2021-01-04   100      0 #DFEBDF
#6 GOOGL Communication Services 2021-01-04   100      0 #DFEBDF
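
The color assignment above walks through the 17 break points from the largest to the smallest, so each value ends up with the color of the first break that is strictly greater than it. As a sketch of the same idea without a loop, findInterval() can compute that bin directly. The five `dclose` values below are a small sample (the starting value 0 plus a few of the year-end values reported in this article), and the names `pal`, `breaks`, and `fill` are used only for this illustration.

```r
# A few example changes from the base of 100.
dclose <- c(-74.520, -10.221, 0, 25.787, 285.956)

# Same palette and break points as above: 8 bins below zero, 8 above.
pal <- grDevices::colorRampPalette(c("brown3", "white", "darkgreen"))(17)
b1 <- seq(min(dclose) - 10, 0, length.out = 9)
b2 <- seq(0, max(dclose) + 10, length.out = 9)
breaks <- c(b1, b2[-1])

# findInterval() counts the breaks that are <= each value, so adding 1
# gives the index of the first break that is strictly greater.
fill <- pal[findInterval(dclose, breaks) + 1]
data.frame(dclose, fill)
```

A change of exactly 0 falls into the first bin on the green side, which matches the slightly green color shown for the starting date above.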

Full-year performance

# Show the change over the year
stock2021 <- stock03[stock03$date == "2021-12-30", c(1:3,5)]
stock2021[order(stock2021$dclose, decreasing = T),]

       tic                 Sector       date  dclose
5105  LCID Consumer Discretionary 2021-12-30 285.956
5092  FTNT Information Technology 2021-12-30 147.170
5120  NVDA Information Technology 2021-12-30 125.615
5115  MRNA            Health Care 2021-12-30 125.186
5083  DDOG Information Technology 2021-12-30  96.131
5108  MRVL Information Technology 2021-12-30  88.815
5064  AMAT Information Technology 2021-12-30  81.858
5098  INTU Information Technology 2021-12-30  73.086
5056 GOOGL Communication Services 2021-12-30  69.397
5057  GOOG Communication Services 2021-12-30  68.961
5066  TEAM Information Technology 2021-12-30  66.419
5151    ZS Information Technology 2021-12-30  65.531
5102  KLAC Information Technology 2021-12-30  64.053
5065  ASML Information Technology 2021-12-30  60.044
5125  PANW Information Technology 2021-12-30  59.649
5053   AMD Information Technology 2021-12-30  57.259
5072  AVGO Information Technology 2021-12-30  56.408
5114  MSFT Information Technology 2021-12-30  55.873
5122  ORLY Consumer Discretionary 2021-12-30  54.585
5084  DXCM            Health Care 2021-12-30  51.314
5126  PAYX Information Technology 2021-12-30  51.007
5104  LRCX Information Technology 2021-12-30  50.272
5149  XLNX Information Technology 2021-12-30  50.172
5080  COST       Consumer Staples 2021-12-30  48.339
5141  TSLA Consumer Discretionary 2021-12-30  46.668
5139  SNPS Information Technology 2021-12-30  45.587
5068   ADP Information Technology 2021-12-30  45.336
5076  CSCO Information Technology 2021-12-30  44.722
5121  NXPI Information Technology 2021-12-30  41.076
5089   EXC              Utilities 2021-12-30  39.300
5073  CDNS Information Technology 2021-12-30  38.664
5063  AAPL Information Technology 2021-12-30  37.702
5099  ISRG            Health Care 2021-12-30  36.238
5095  IDXX            Health Care 2021-12-30  34.478
5090  FAST            Industrials 2021-12-30  33.914
5132  REGN            Health Care 2021-12-30  33.063
5107   MAR Consumer Discretionary 2021-12-30  32.131
5086  DLTR Consumer Discretionary 2021-12-30  32.102
5087  EBAY Consumer Discretionary 2021-12-30  29.670
5111    FB Communication Services 2021-12-30  28.043
5075  CTAS            Industrials 2021-12-30  27.925
5082   CSX            Industrials 2021-12-30  27.242
5112  MCHP Information Technology 2021-12-30  26.957
5113    MU Information Technology 2021-12-30  26.793
5055  ALGN            Health Care 2021-12-30  25.787
5146   WBA       Consumer Staples 2021-12-30  25.580
5079  CPRT            Industrials 2021-12-30  25.222
5131  QCOM Information Technology 2021-12-30  23.051
5093  GILD            Health Care 2021-12-30  22.043
5054  ABNB Consumer Discretionary 2021-12-30  21.294
5147  WDAY Information Technology 2021-12-30  21.229
5061   ADI Information Technology 2021-12-30  19.696
5129   PEP       Consumer Staples 2021-12-30  19.685
5143  VRSN Information Technology 2021-12-30  19.070
5052  ADBE Information Technology 2021-12-30  17.553
5119  NFLX Communication Services 2021-12-30  17.066
5142   TXN Information Technology 2021-12-30  16.761
5101   KDP       Consumer Staples 2021-12-30  16.113
5116  MDLZ       Consumer Staples 2021-12-30  13.519
5062  ANSS Information Technology 2021-12-30  13.423
5138  SBUX Consumer Discretionary 2021-12-30  12.745
5144  VRSK            Industrials 2021-12-30  12.426
5077  CTSH Information Technology 2021-12-30  11.987
5106  LULU Consumer Discretionary 2021-12-30  11.866
5071  BKNG Consumer Discretionary 2021-12-30  10.713
5059   AEP              Utilities 2021-12-30   8.769
5118  NTES Communication Services 2021-12-30   7.246
5058  AMZN Consumer Discretionary 2021-12-30   5.845
5117  MNST       Consumer Staples 2021-12-30   5.287
5096  ILMN            Health Care 2021-12-30   4.481
5081  CRWD Information Technology 2021-12-30   4.220
5103   KHC       Consumer Staples 2021-12-30   4.178
5097  INTC Information Technology 2021-12-30   4.168
5135  SIRI Communication Services 2021-12-30   4.052
5148   XEL              Utilities 2021-12-30   3.914
5136  SWKS Information Technology 2021-12-30   3.370
5124  PCAR            Industrials 2021-12-30   3.187
5074  CHTR Communication Services 2021-12-30   1.576
5078 CMCSA Communication Services 2021-12-30   0.158
5060  AMGN            Health Care 2021-12-30  -0.084
5094   HON            Industrials 2021-12-30  -0.404
5070  BIIB            Health Care 2021-12-30  -1.214
5133  ROST Consumer Discretionary 2021-12-30  -2.392
5145  VRTX            Health Care 2021-12-30  -3.042
5088    EA Communication Services 2021-12-30  -3.620
5067  ADSK Information Technology 2021-12-30  -5.097
5091  FISV Information Technology 2021-12-30  -6.612
5134  SGEN            Health Care 2021-12-30  -6.673
5123  OKTA Information Technology 2021-12-30 -10.221
5109  MTCH Communication Services 2021-12-30 -11.153
5140  TMUS Communication Services 2021-12-30 -12.021
5127  PYPL Information Technology 2021-12-30 -17.265
5110  MELI Consumer Discretionary 2021-12-30 -17.289
5100    JD Consumer Discretionary 2021-12-30 -18.452
5051  ATVI Communication Services 2021-12-30 -24.928
5137  SPLK Information Technology 2021-12-30 -30.222
5085  DOCU Information Technology 2021-12-30 -30.237
5069  BIDU Communication Services 2021-12-30 -30.530
5150    ZM Information Technology 2021-12-30 -47.075
5130   PDD Consumer Discretionary 2021-12-30 -64.354
5128  PTON Consumer Discretionary 2021-12-30 -74.520

Creating the treemap animation

# Build the treemap
p <- ggplot(stock03, aes(label=tic, area = close, 
            fill = dclose2, subgroup = Sector )) +
  geom_treemap( layout = "squarified", colour="white", start="topleft") +
  scale_fill_identity() +
  geom_treemap_subgroup_border(layout = "squarified", colour = "white", size = 5, start="topleft") +
  geom_treemap_subgroup_text(layout = "squarified", place = "top",
                             grow = T, alpha = 1, colour = "#FAFAFA",
                             min.size = 0, start = "topleft") +
  geom_treemap_text(layout = "squarified", place = "centre", grow = TRUE, 
                    colour = "grey50", min.size = 8, reflow = T, start = "topleft") +
  transition_time(date) +
  labs(title = "NASDAQ-100, Date: {frame_time}") +
  ease_aes('linear')

# Render as an animation (takes about 2-3 minutes)
animate(p, duration = 50, width = 500, height = 500, renderer = gifski_renderer("NASDAQ100_animation.gif"))

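NASDAQ-100 performance in 2021.

Summary

As a look back at 2021, this article presented the R code and its results, from obtaining all the NASDAQ-100 ticker symbols to creating an animation of the price changes.

In 2021, once again, outperforming the index with my own self-styled portfolio was very, very hard (sob).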
