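- Introduction
- The NASDAQ-100 Wikipedia page and setup in R
- Preparing the packages
- Getting the ticker list from the Wikipedia page
- Animating the annual performance of the NASDAQ-100 constituents
- Summary
- Past related articles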
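Introduction
Using R, I scrape the ticker symbols from a Wikipedia page and then collect each stock's price data with the quantmod package.
In this article, I calculated the performance of the NASDAQ-100 constituents (as of January 3, 2022) from the start of 2021 through the end of December and turned the result into a GIF animation.
In 2021, LCID (Lucid Group), FTNT (Fortinet), NVDA (NVIDIA), and MRNA (Moderna) were the standouts.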
The NASDAQ-100 Wikipedia page and setup in R
The ticker symbols are taken from the English Wikipedia page.
Wikipedia conveniently lists the constituents of each index in a table.
NASDAQ-100
First, start R/RStudio. As a preliminary step, store the URL in a variable.
NASDAQ100_url <- "https://en.wikipedia.org/wiki/NASDAQ-100"
# Check the page in a browser
# browseURL(NASDAQ100_url)
Preparing the packages
First, install and load the rvest, quantmod, and magrittr packages, along with tidyr, which is used later.
# Install
install.packages(c("rvest", "quantmod", "magrittr", "tidyr"))
# Load
library(rvest)
library(quantmod)
library(magrittr)
library(tidyr)
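Getting the ticker list from the Wikipedia page
Because Wikipedia is freely editable, the position of the table that holds the tickers varies from page to page, and the table layout and column names differ as well. There is no way around this, so you have to extract the parts you need in a way that matches each page.
Below is the code that collects the NASDAQ-100 (NASDAQ-100 index) tickers.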
NASDAQ-100
# NASDAQ-100 index
# The constituents table was the 4th <table> on the page when this was written;
# its position can change whenever the page is edited
NASDAQ100 <- NASDAQ100_url %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[4]] %>%
  html_table() %>%
  data.frame()

# Show the result
head(NASDAQ100)
#                  Company Ticker            GICS.Sector                  GICS.Sub.Industry
# 1    Activision Blizzard   ATVI Communication Services     Interactive Home Entertainment
# 2                  Adobe   ADBE Information Technology               Application Software
# 3 Advanced Micro Devices    AMD Information Technology                     Semiconductors
# 4                 Airbnb   ABNB Consumer Discretionary Internet & Direct Marketing Retail
# 5       Align Technology   ALGN            Health Care               Health Care Supplies
# 6     Alphabet (Class A)  GOOGL Communication Services       Interactive Media & Services

# Show the tickers
NASDAQ100$Ticker
#  [1] "ATVI"  "ADBE"  "AMD"   "ABNB"  "ALGN"  "GOOGL" "GOOG"  "AMZN"
#  [9] "AEP"   "AMGN"  "ADI"   "ANSS"  "AAPL"  "AMAT"  "ASML"  "TEAM"
# [17] "ADSK"  "ADP"   "BIDU"  "BIIB"  "BKNG"  "AVGO"  "CDNS"  "CHTR"
# [25] "CTAS"  "CSCO"  "CTSH"  "CMCSA" "CPRT"  "COST"  "CRWD"  "CSX"
# [33] "DDOG"  "DXCM"  "DOCU"  "DLTR"  "EBAY"  "EA"    "EXC"   "FAST"
# [41] "FISV"  "FTNT"  "GILD"  "HON"   "IDXX"  "ILMN"  "INTC"  "INTU"
# [49] "ISRG"  "JD"    "KDP"   "KLAC"  "KHC"   "LRCX"  "LCID"  "LULU"
# [57] "MAR"   "MRVL"  "MTCH"  "MELI"  "FB"    "MCHP"  "MU"    "MSFT"
# [65] "MRNA"  "MDLZ"  "MNST"  "NTES"  "NFLX"  "NVDA"  "NXPI"  "ORLY"
# [73] "OKTA"  "PCAR"  "PANW"  "PAYX"  "PYPL"  "PTON"  "PEP"   "PDD"
# [81] "QCOM"  "REGN"  "ROST"  "SGEN"  "SIRI"  "SWKS"  "SPLK"  "SBUX"
# [89] "SNPS"  "TMUS"  "TSLA"  "TXN"   "VRSN"  "VRSK"  "VRTX"  "WBA"
# [97] "WDAY"  "XEL"   "XLNX"  "ZM"    "ZS"
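Because the table index (`.[[4]]`) can shift whenever the page is edited, one option is to pick the table by a column it must contain instead. The sketch below is a hypothetical base-R helper, `pick_table_by_column()`; it assumes its input is a list of data frames, which is what `html_table()` returns when applied to the whole set of `table` nodes (tibbles in recent rvest versions). The two toy tables stand in for the scraped ones.

```r
# Hypothetical helper: return the first table that has a column named `column`
pick_table_by_column <- function(tables, column) {
  hits <- which(vapply(tables, function(t) column %in% names(t), logical(1)))
  if (length(hits) == 0) stop("no table has a column named '", column, "'")
  tables[[hits[1]]]
}

# Toy stand-ins for the list returned by html_table()
tables <- list(
  data.frame(Year = 2020:2021, Value = c(1, 2)),
  data.frame(Company = c("Adobe", "Apple"), Ticker = c("ADBE", "AAPL"))
)

constituents <- pick_table_by_column(tables, "Ticker")
constituents$Ticker
```

With the live page this would take the place of `.[[4]]`, for example `NASDAQ100_url %>% read_html() %>% html_nodes("table") %>% html_table() %>% pick_table_by_column("Ticker")`, assuming the constituents table is the first one with a `Ticker` column.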
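Animating the annual performance of the NASDAQ-100 constituents
Next, fetch the 2021 prices of the NASDAQ-100 constituents and turn them into an animation.
Let's get straight to the code.
First, use quantmod::getSymbols to download the 2021 price history of every NASDAQ-100 constituent.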
# NASDAQ-100 tickers
nasdaq100.tic <- NASDAQ100$Ticker
nasdaq100.tic
# (prints the same 101 tickers as above)

# Download the 2021 prices
# getSymbols() stores one xts object per ticker (ATVI, ADBE, ...) in the global environment
Date <- c("2021-01-01", "2021-12-31")
tickers <- as.character(unlist(nasdaq100.tic))
quantmod::getSymbols(tickers, src = "yahoo", verbose = TRUE,
                     from = Date[1], to = Date[2])

# Create an empty data frame: one row per trading day, one column per ticker
stock <- data.frame(matrix(NA, nrow = nrow(get(tickers[1])), ncol = length(tickers)))
# Name the columns after the tickers
colnames(stock) <- tickers
# Show
head(stock)
#   ATVI ADBE AMD ABNB ALGN GOOGL GOOG AMZN AEP AMGN ADI ANSS AAPL AMAT
# 1   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
# 2   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
# 3   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
# 4   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
# 5   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA
# 6   NA   NA  NA   NA   NA    NA   NA   NA  NA   NA  NA   NA   NA   NA

# Fill in the closing prices
# get() fetches the xts object named after each ticker (no need for eval(parse(...)));
# column 4 of each object is the close.
# This assumes every ticker has the same trading days as the first one.
for (n in seq_along(tickers)) {
  a <- get(tickers[n])[, 4]
  stock[, n] <- as.numeric(a)
}
# Use the dates as row names
rownames(stock) <- as.character(index(a))
# Data retrieval complete
head(stock)
#             ATVI   ADBE   AMD   ABNB   ALGN   GOOGL    GOOG    AMZN
# 2021-01-04 89.90 485.34 92.30 139.15 526.46 1726.13 1728.24 3186.63
# 2021-01-05 90.69 485.69 92.77 148.30 543.65 1740.05 1740.92 3218.51
# 2021-01-06 88.00 466.31 90.33 142.77 540.39 1722.88 1735.29 3138.38
# 2021-01-07 89.67 477.74 95.16 151.27 558.36 1774.34 1787.25 3162.16
# 2021-01-08 91.30 485.10 94.58 149.77 570.53 1797.83 1807.21 3182.70
# 2021-01-11 90.91 474.24 97.25 148.13 557.04 1756.29 1766.72 3114.21
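The loop above assumes every ticker has exactly the same trading days as the first one; a series that started partway through the year would be shorter, and the column assignment would fail. A more tolerant sketch, assuming each ticker's closes are held as a data frame with `date` and `close` columns (the tickers `AAA`/`BBB` and their prices are made up), is to full-outer-join them on the date with base R `merge(..., all = TRUE)`, leaving `NA` where a ticker did not trade. On the xts objects that `getSymbols()` creates, `merge()` likewise performs an outer join on the date index by default.

```r
# Toy closes for two made-up tickers with different trading histories
closes <- list(
  AAA = data.frame(date = as.Date(c("2021-01-04", "2021-01-05", "2021-01-06")),
                   close = c(10, 11, 12)),
  BBB = data.frame(date = as.Date(c("2021-01-05", "2021-01-06")),
                   close = c(20, 19))
)

# Rename each close column to its ticker so the joined columns stay distinct
renamed <- Map(function(df, tic) setNames(df, c("date", tic)), closes, names(closes))

# Full outer join on date; dates missing for a ticker become NA
wide <- Reduce(function(x, y) merge(x, y, by = "date", all = TRUE), renamed)
wide
```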
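Next, rescale each stock so that its price at the start of the year (January 4, 2021, the first trading day) equals 100, and reshape the data for the animation.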
# Rescale so that the start-of-year price is 100
stock.c <- stock
for (n in 1:ncol(stock)) {
  stock.c[, n] <- round(as.numeric(stock[, n]) / as.numeric(stock[1, n]) * 100, 3)
}
# Intermediate output
head(stock.c)
#               ATVI    ADBE     AMD    ABNB    ALGN   GOOGL    GOOG
# 2021-01-04 100.000 100.000 100.000 100.000 100.000 100.000 100.000
# 2021-01-05 100.879 100.072 100.509 106.576 103.265 100.806 100.734
# 2021-01-06  97.887  96.079  97.866 102.602 102.646  99.812 100.408
# 2021-01-07  99.744  98.434 103.099 108.710 106.059 102.793 103.414
# 2021-01-08 101.557  99.951 102.470 107.632 108.371 104.154 104.569
# 2021-01-11 101.123  97.713 105.363 106.453 105.809 101.747 102.227

# Transpose the data (rows = tickers, columns = dates)
stock.t <- t(stock.c)
# Intermediate output
head(stock.t)
#      2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08
# ATVI        100    100.879     97.887     99.744    101.557
# ADBE        100    100.072     96.079     98.434     99.951
# AMD         100    100.509     97.866    103.099    102.470

# Add a sector column (relies on stock.t rows being in the same order as NASDAQ100)
stock01 <- data.frame(tic = rownames(stock.t), Sector = NASDAQ100$GICS.Sector, stock.t)
rownames(stock01) <- 1:nrow(stock01)
# Intermediate output: data.frame() turns the dates into column names such as X2021.01.04
head(stock01)
#    tic                 Sector X2021.01.04 X2021.01.05 X2021.01.06
# 1 ATVI Communication Services         100     100.879      97.887
# 2 ADBE Information Technology         100     100.072      96.079
# 3  AMD Information Technology         100     100.509      97.866
#   X2021.01.07 X2021.01.08 X2021.01.11 X2021.01.12 X2021.01.13
# 1      99.744     101.557     101.123      99.277      99.855
# 2      98.434      99.951      97.713      97.179      97.262
# 3     103.099     102.470     105.363     103.315      99.437

# Thin the data a little: keep every 5th trading day
stock02 <- stock01[, c(1:2, seq(3, ncol(stock01), by = 5))]
# Reshape to long format (gather() still works; newer tidyr recommends pivot_longer())
stock03 <- tidyr::gather(stock02, key = "date", value = "close", -c(tic, Sector))
# Turn names like X2021.01.04 back into date strings
stock03$date <- sub("X", "", stock03$date)
stock03$date <- gsub("\\.", "/", stock03$date)
stock03$date <- paste0(stock03$date, "-16-00-00")
# Show progress
head(stock03)
#     tic                 Sector                date close
# 1  ATVI Communication Services 2021/01/04-16-00-00   100
# 2  ADBE Information Technology 2021/01/04-16-00-00   100
# 3   AMD Information Technology 2021/01/04-16-00-00   100
# 4  ABNB Consumer Discretionary 2021/01/04-16-00-00   100
# 5  ALGN            Health Care 2021/01/04-16-00-00   100
# 6 GOOGL Communication Services 2021/01/04-16-00-00   100
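The rescaling loop and the date-name clean-up above can each be written as a single vectorized step. This is a sketch on a toy price matrix (the tickers `AAA`/`BBB` and the prices are made up): `sweep()` divides every column by its first-row value, and `as.Date()` with a format that contains the literal `X` and `.` characters parses names like `X2021.01.04` directly, without the `sub()`/`gsub()` round trip.

```r
# Toy closing prices: rows = trading days, columns = made-up tickers
prices <- matrix(c(50, 55, 60,
                   20, 18, 25),
                 ncol = 2,
                 dimnames = list(c("2021-01-04", "2021-01-05", "2021-01-06"),
                                 c("AAA", "BBB")))

# Divide each column by its first-row value and scale so that day 1 = 100
normalized <- round(sweep(prices, 2, prices[1, ], "/") * 100, 3)

# data.frame() mangles date column names into the form X2021.01.04 (via make.names);
# literal "X" and "." in the format string let as.Date() read them back directly
mangled <- make.names(rownames(prices))
dates <- as.Date(mangled, format = "X%Y.%m.%d")

normalized
dates
```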
Now prepare the packages needed for the animation.
# Install
install.packages(c("ggplot2", "treemapify", "gganimate", "gapminder", "gifski"))
# Load
library(ggplot2)
library(treemapify)
library(gganimate)
library(gapminder)  # not used by the code below
library(gifski)
Next, use the stock03 data to build the animation, starting with the date column and the tile colours.
# Convert the date column to Date
# ("2021/01/04-16-00-00" is read with the "%Y/%m/%d" format; the trailing part is ignored)
stock03$date <- as.Date(stock03$date)
# Show progress
head(stock03)
#     tic                 Sector       date close
# 1  ATVI Communication Services 2021-01-04   100
# 2  ADBE Information Technology 2021-01-04   100
# 3   AMD Information Technology 2021-01-04   100
# 4  ABNB Consumer Discretionary 2021-01-04   100
# 5  ALGN            Health Care 2021-01-04   100
# 6 GOOGL Communication Services 2021-01-04   100

# Choose a colour from the size of the price change
# dclose = change from the start-of-year value of 100, i.e. the return in %
stock03$dclose <- stock03$close - 100
stock03$dclose2 <- NA
colfunc <- grDevices::colorRampPalette(c("brown3", "white", "darkgreen"))
# Colour bins: 17 colours and 17 break points
# (9 from min - 10 up to 0, then 8 more up to max + 10)
a <- colfunc(17)
b1 <- seq(min(stock03$dclose) - 10, 0, length.out = 9)
b2 <- seq(0, max(stock03$dclose) + 10, length.out = 9)
b3 <- c(b1, b2[-1])
# Each value ends up with a[n] for the smallest n such that dclose < b3[n]
for (n in length(b3):1) {
  stock03$dclose2[stock03$dclose < b3[n]] <- a[n]
}
# Show progress
head(stock03)
#     tic                 Sector       date close dclose dclose2
# 1  ATVI Communication Services 2021-01-04   100      0 #DFEBDF
# 2  ADBE Information Technology 2021-01-04   100      0 #DFEBDF
# 3   AMD Information Technology 2021-01-04   100      0 #DFEBDF
# 4  ABNB Consumer Discretionary 2021-01-04   100      0 #DFEBDF
# 5  ALGN            Health Care 2021-01-04   100      0 #DFEBDF
# 6 GOOGL Communication Services 2021-01-04   100      0 #DFEBDF
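The colour loop above gives each value the colour `a[n]` for the smallest `n` with `dclose < b3[n]`. As a sketch, the same mapping can be computed without a loop using base R `findInterval()`, which counts how many break points are less than or equal to each value; adding 1 gives exactly that smallest `n`. The `dclose` values below are made up, and the last lines rerun the original loop on them for comparison.

```r
# Made-up returns in %, standing in for stock03$dclose
dclose <- c(-70, -10, 0, 3, 50, 280)

colfunc <- grDevices::colorRampPalette(c("brown3", "white", "darkgreen"))
a <- colfunc(17)
b1 <- seq(min(dclose) - 10, 0, length.out = 9)
b2 <- seq(0, max(dclose) + 10, length.out = 9)
b3 <- c(b1, b2[-1])  # sorted break points, as findInterval() requires

# Vectorized: (number of breaks <= x) + 1 is the index of the first break above x
col_vec <- a[findInterval(dclose, b3) + 1]

# Original loop, for comparison
col_loop <- rep(NA_character_, length(dclose))
for (n in length(b3):1) col_loop[dclose < b3[n]] <- a[n]

identical(col_vec, col_loop)
```

Note that a value of exactly 0 lands in the first bin above zero (`a[10]`), a slightly green tint rather than pure white, in both versions.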
Performance over the year
# Show the full-year result (2021-12-30 is the last trading day in the downloaded data)
stock2021 <- stock03[stock03$date == "2021-12-30", c("tic", "Sector", "date", "dclose")]
stock2021[order(stock2021$dclose, decreasing = TRUE), ]
#       tic                 Sector       date  dclose
#5105  LCID Consumer Discretionary 2021-12-30 285.956
#5092  FTNT Information Technology 2021-12-30 147.170
#5120  NVDA Information Technology 2021-12-30 125.615
#5115  MRNA            Health Care 2021-12-30 125.186
#5083  DDOG Information Technology 2021-12-30  96.131
#5108  MRVL Information Technology 2021-12-30  88.815
#5064  AMAT Information Technology 2021-12-30  81.858
#5098  INTU Information Technology 2021-12-30  73.086
#5056 GOOGL Communication Services 2021-12-30  69.397
#5057  GOOG Communication Services 2021-12-30  68.961
#5066  TEAM Information Technology 2021-12-30  66.419
#5151    ZS Information Technology 2021-12-30  65.531
#5102  KLAC Information Technology 2021-12-30  64.053
#5065  ASML Information Technology 2021-12-30  60.044
#5125  PANW Information Technology 2021-12-30  59.649
#5053   AMD Information Technology 2021-12-30  57.259
#5072  AVGO Information Technology 2021-12-30  56.408
#5114  MSFT Information Technology 2021-12-30  55.873
#5122  ORLY Consumer Discretionary 2021-12-30  54.585
#5084  DXCM            Health Care 2021-12-30  51.314
#5126  PAYX Information Technology 2021-12-30  51.007
#5104  LRCX Information Technology 2021-12-30  50.272
#5149  XLNX Information Technology 2021-12-30  50.172
#5080  COST       Consumer Staples 2021-12-30  48.339
#5141  TSLA Consumer Discretionary 2021-12-30  46.668
#5139  SNPS Information Technology 2021-12-30  45.587
#5068   ADP Information Technology 2021-12-30  45.336
#5076  CSCO Information Technology 2021-12-30  44.722
#5121  NXPI Information Technology 2021-12-30  41.076
#5089   EXC              Utilities 2021-12-30  39.300
#5073  CDNS Information Technology 2021-12-30  38.664
#5063  AAPL Information Technology 2021-12-30  37.702
#5099  ISRG            Health Care 2021-12-30  36.238
#5095  IDXX            Health Care 2021-12-30  34.478
#5090  FAST            Industrials 2021-12-30  33.914
#5132  REGN            Health Care 2021-12-30  33.063
#5107   MAR Consumer Discretionary 2021-12-30  32.131
#5086  DLTR Consumer Discretionary 2021-12-30  32.102
#5087  EBAY Consumer Discretionary 2021-12-30  29.670
#5111    FB Communication Services 2021-12-30  28.043
#5075  CTAS            Industrials 2021-12-30  27.925
#5082   CSX            Industrials 2021-12-30  27.242
#5112  MCHP Information Technology 2021-12-30  26.957
#5113    MU Information Technology 2021-12-30  26.793
#5055  ALGN            Health Care 2021-12-30  25.787
#5146   WBA       Consumer Staples 2021-12-30  25.580
#5079  CPRT            Industrials 2021-12-30  25.222
#5131  QCOM Information Technology 2021-12-30  23.051
#5093  GILD            Health Care 2021-12-30  22.043
#5054  ABNB Consumer Discretionary 2021-12-30  21.294
#5147  WDAY Information Technology 2021-12-30  21.229
#5061   ADI Information Technology 2021-12-30  19.696
#5129   PEP       Consumer Staples 2021-12-30  19.685
#5143  VRSN Information Technology 2021-12-30  19.070
#5052  ADBE Information Technology 2021-12-30  17.553
#5119  NFLX Communication Services 2021-12-30  17.066
#5142   TXN Information Technology 2021-12-30  16.761
#5101   KDP       Consumer Staples 2021-12-30  16.113
#5116  MDLZ       Consumer Staples 2021-12-30  13.519
#5062  ANSS Information Technology 2021-12-30  13.423
#5138  SBUX Consumer Discretionary 2021-12-30  12.745
#5144  VRSK            Industrials 2021-12-30  12.426
#5077  CTSH Information Technology 2021-12-30  11.987
#5106  LULU Consumer Discretionary 2021-12-30  11.866
#5071  BKNG Consumer Discretionary 2021-12-30  10.713
#5059   AEP              Utilities 2021-12-30   8.769
#5118  NTES Communication Services 2021-12-30   7.246
#5058  AMZN Consumer Discretionary 2021-12-30   5.845
#5117  MNST       Consumer Staples 2021-12-30   5.287
#5096  ILMN            Health Care 2021-12-30   4.481
#5081  CRWD Information Technology 2021-12-30   4.220
#5103   KHC       Consumer Staples 2021-12-30   4.178
#5097  INTC Information Technology 2021-12-30   4.168
#5135  SIRI Communication Services 2021-12-30   4.052
#5148   XEL              Utilities 2021-12-30   3.914
#5136  SWKS Information Technology 2021-12-30   3.370
#5124  PCAR            Industrials 2021-12-30   3.187
#5074  CHTR Communication Services 2021-12-30   1.576
#5078 CMCSA Communication Services 2021-12-30   0.158
#5060  AMGN            Health Care 2021-12-30  -0.084
#5094   HON            Industrials 2021-12-30  -0.404
#5070  BIIB            Health Care 2021-12-30  -1.214
#5133  ROST Consumer Discretionary 2021-12-30  -2.392
#5145  VRTX            Health Care 2021-12-30  -3.042
#5088    EA Communication Services 2021-12-30  -3.620
#5067  ADSK Information Technology 2021-12-30  -5.097
#5091  FISV Information Technology 2021-12-30  -6.612
#5134  SGEN            Health Care 2021-12-30  -6.673
#5123  OKTA Information Technology 2021-12-30 -10.221
#5109  MTCH Communication Services 2021-12-30 -11.153
#5140  TMUS Communication Services 2021-12-30 -12.021
#5127  PYPL Information Technology 2021-12-30 -17.265
#5110  MELI Consumer Discretionary 2021-12-30 -17.289
#5100    JD Consumer Discretionary 2021-12-30 -18.452
#5051  ATVI Communication Services 2021-12-30 -24.928
#5137  SPLK Information Technology 2021-12-30 -30.222
#5085  DOCU Information Technology 2021-12-30 -30.237
#5069  BIDU Communication Services 2021-12-30 -30.530
#5150    ZM Information Technology 2021-12-30 -47.075
#5130   PDD Consumer Discretionary 2021-12-30 -64.354
#5128  PTON Consumer Discretionary 2021-12-30 -74.520
Creating the treemap animation
# Build the treemap
# Tile area = rescaled price (close), fill = colour bin (dclose2), grouped by sector
p <- ggplot(stock03, aes(label = tic, area = close, fill = dclose2, subgroup = Sector)) +
  geom_treemap(layout = "squarified", colour = "white", start = "topleft") +
  scale_fill_identity() +
  geom_treemap_subgroup_border(layout = "squarified", colour = "white", size = 5,
                               start = "topleft") +
  geom_treemap_subgroup_text(layout = "squarified", place = "top", grow = TRUE,
                             alpha = 1, colour = "#FAFAFA", min.size = 0,
                             start = "topleft") +
  geom_treemap_text(layout = "squarified", place = "centre", grow = TRUE,
                    colour = "grey50", min.size = 8, reflow = TRUE,
                    start = "topleft") +
  transition_time(date) +
  labs(title = "NASDAQ-100, Date: {frame_time}") +
  ease_aes("linear")

# Render as a GIF (takes about 2-3 minutes)
animate(p, duration = 50, width = 500, height = 500,
        renderer = gifski_renderer("NASDAQ100_animation.gif"))
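Summary
As a look back at 2021, I presented the R code and its output, from retrieving all the NASDAQ-100 tickers to animating their price moves.
In 2021, too, beating the index with my own home-made portfolio was really, really hard (sob).
Related books on web scraping
Here is a list of books related to web scraping.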