- Article information
- Author: kaiwu

| resource | link |
| --- | --- |
| website | https://www.r-project.org/ |
| download | https://cran.r-project.org/ |
| wikipedia | |
| R Packages | https://cran.r-project.org/web/packages/index.html |
| Rtools | https://cran.r-project.org/bin/windows/Rtools/ |
| R Journal | https://journal.r-project.org/ |
| R Manuals | https://cran.r-project.org/manuals.html |
| RStudio | https://posit.co/ |
R is a programming language for statistical computing and graphics supported by the R Core Team and the R Foundation for Statistical Computing. Created by statisticians Ross Ihaka and Robert Gentleman, R is used among data miners, bioinformaticians and statisticians for data analysis and developing statistical software.[6] Users have created packages to augment the functions of the R language.
According to user surveys and studies of scholarly literature databases, R is one of the most commonly used programming languages in data mining.[7] As of March 2022, R ranks 11th in the TIOBE index, a measure of programming language popularity, in which the language peaked in 8th place in August 2020.[8][9]
The official R software environment is an open-source free software environment within the GNU package, available under the GNU General Public License. It is written primarily in C, Fortran, and R itself (partially self-hosting). Precompiled executables are provided for various operating systems. R has a command line interface.[10] Multiple third-party graphical user interfaces are also available, such as RStudio, an integrated development environment, and Jupyter, a notebook interface.
https://posit.co/download/rstudio-desktop/
R 4.2.2 for Windows
https://od.lk/d/178336078_aJAHZ/R-4.2.2-win%20%281%29.exe
RStudio for Windows
https://od.lk/d/178336079_xwuA8/RStudio-2022.07.2-576%20%281%29.exe
1. About the dataset (tourist satisfaction)
1.1 questionnaire
1.2 data for general purpose (without variable labels and value labels)
or
1.3 R markdown file
leaflet
Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps. Weighing just about 42 KB of JS, it has all the mapping features most developers ever need.
OpenStreetMap
Importance-Performance Analysis
Tourist2022EN
Kai Wu
2022/12/5
tourist satisfaction survey dataset (simulated data, 45 variables, 373 cases)
### (1) questionnaire
https://od.lk/d/178312136_WyH9N/tourist_satisfaction_questionnaire_en.docx
https://od.lk/d/178312135_ZW7sa/tourist_satisfaction_questionnaire_en.pdf
### (2) data
https://od.lk/d/178313264_Tr9Ti/data_tourist_satisfaction.csv
https://od.lk/d/178313263_MVBwn/data_tourist_satisfaction_en.xlsx
### (3) R markdown file (codes for data analysis)
1.data preparation
alternative terms for data preparation: data wrangling, data munging, data cleaning https://theappsolutions.com/blog/development/data-wrangling-guide-to-data-preparation/
Data wrangling in R:
(1) dplyr - essential data-munging R package and a supreme data-framing tool, especially useful for managing data by categories.
(2) purrr - good for list function operations and error-checking.
(3) splitstackshape - an oldie but goldie; good for shaping complex data sets and simplifying the visualization.
(4) JSOnline - nice and easy parsing tool.
(5) magrittr - good for wrangling scattered sets and putting them into a more coherent form.
1.1 load packages
if(!require("Hmisc")){install.packages("Hmisc");library("Hmisc")}#statistical analysis
## Loading required package: Hmisc
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
if(!require("psych")){install.packages("psych");library("psych")}#statistical analysis
## Loading required package: psych
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
if(!require("lavaan")){install.packages("lavaan");library("lavaan")} #The lavaan package is developed to provide useRs, researchers and teachers a free, open-source, commercial-quality package for latent variable modeling
## Loading required package: lavaan
## This is lavaan 0.6-12
## lavaan is FREE software! Please report any bugs.
##
## Attaching package: 'lavaan'
## The following object is masked from 'package:psych':
##
## cor2cov
if(!require("broom")){install.packages("broom");library("broom")}# tidy report
## Loading required package: broom
if(!require("ggplot2")){install.packages("ggplot2");library("ggplot2")}# data visualization
if(!require("ggrepel")){install.packages("ggrepel");library("ggrepel")} # provides geoms for ggplot2 to repel overlapping text labels
## Loading required package: ggrepel
if(!require("corrplot")){install.packages("corrplot");library("corrplot")} #A visualization of a correlation matrix
## Loading required package: corrplot
## corrplot 0.92 loaded
if(!require("semPlot")){install.packages("semPlot");library("semPlot")} #Path diagrams and visual analysis of various SEM packages' output
## Loading required package: semPlot
if(!require("leaflet")){install.packages("leaflet");library("leaflet")}# maps based on OpenStreetMap data
## Loading required package: leaflet
install packages
#install.packages("Hmisc")
#install.packages("psych")
#install.packages("lavaan")
#install.packages("broom")
#install.packages("ggplot2")
#install.packages("ggrepel")
#install.packages("corrplot")
#install.packages("semPlot")
#install.packages("leaflet")
1.2 import dataset
datafolder is your working folder; change it to match your own setup
#datafolder is your working folder, you can modify this by yourself
datafolder<-'D:/pdata/rdata/tourist_en/'
#tourist<-read.csv('https://od.lk/d/178313264_Tr9Ti/data_tourist_satisfaction.csv',header=TRUE,encoding = 'UTF-8')
tourist<-read.csv(paste(datafolder,'data_tourist_satisfaction.csv',sep=""),header=TRUE,encoding = 'UTF-8')
tail(tourist)
## sid gender byear region income expense type3 type2 thotel sat1 sat2 sat3
## 368 rec368 2 1962 1 1826 269 2 1 1 4 4 3
## 369 rec369 2 1982 6 1869 301 3 2 4 5 5 3
## 370 rec370 2 1959 5 2766 261 2 2 1 5 4 4
## 371 rec371 2 1989 7 2023 210 3 2 1 4 3 4
## 372 rec372 2 1996 3 2062 266 2 1 1 5 5 4
## 373 rec373 2 2000 6 3311 444 3 2 4 5 3 4
## sat4 sat5 sat6 ri1 ri2 ri3 ri4 ri5 rp1 rp2 rp3 rp4 rp5 te1 te2 te3 te4 te5
## 368 2 4 3 5 5 5 5 5 5 5 5 5 5 4 4 4 4 4
## 369 4 3 3 5 5 5 5 5 5 5 4 5 5 5 4 3 4 4
## 370 4 5 3 4 4 5 5 5 4 4 5 5 5 4 3 4 4 4
## 371 3 5 5 5 5 5 5 5 5 5 4 5 5 4 5 4 5 3
## 372 4 4 3 5 5 5 5 5 5 5 5 5 5 5 5 5 5 4
## 373 5 4 4 5 5 4 5 5 5 5 4 5 5 4 4 4 4 5
## te6 te7 te8 zh1 zh2 zh3 zh4 zh5 zh6 zh7 latitude longitude
## 368 4 4 4 4 4 5 4 2 4 4 31.24039 121.4264
## 369 5 4 4 5 5 5 5 5 5 5 31.17127 121.4191
## 370 4 3 4 3 4 3 4 4 3 4 31.14341 121.6599
## 371 5 3 4 1 1 2 1 1 1 1 31.20616 121.4589
## 372 5 5 3 1 1 1 1 1 1 1 31.23762 121.5243
## 373 5 5 4 4 4 3 1 2 1 1 31.23350 121.4776
##### practice 01 : download the data and import it into the workspace
1.3 value labels
add value labels to categorical variables.
#value labels
tourist$gender <- factor(tourist$gender,levels = c(1,2),labels = c("male", "female"))
tourist$thotel <- factor(tourist$thotel,levels = c(1,2,3,4),labels = c("budget hotel", "luxury hotel", "bed and breakfast", "apartment hotel"))
tourist$type3 <- factor(tourist$type3,levels = c(1,2,3),labels = c("natural scenery", "historical scenery", "mixed scenery"))
tourist$type2 <- factor(tourist$type2,levels = c(1,2),labels = c("sightseeing", "participation"))
head(tourist)
## sid gender byear region income expense type3 type2
## 1 rec001 female 1971 4 2708 432 historical scenery participation
## 2 rec002 female 1995 6 1884 238 historical scenery participation
## 3 rec003 male 1990 3 2458 399 mixed scenery sightseeing
## 4 rec004 male 1970 6 2726 245 natural scenery sightseeing
## 5 rec005 female 1964 7 3084 287 historical scenery participation
## 6 rec006 female 1965 2 2184 216 natural scenery sightseeing
## thotel sat1 sat2 sat3 sat4 sat5 sat6 ri1 ri2 ri3 ri4 ri5 rp1 rp2
## 1 bed and breakfast 4 2 2 3 4 4 4 4 4 5 4 4 5
## 2 budget hotel 5 5 4 5 3 5 4 3 4 4 5 4 3
## 3 budget hotel 3 4 5 3 3 2 5 5 5 5 5 5 5
## 4 luxury hotel 5 2 3 5 4 4 4 5 5 5 5 4 5
## 5 budget hotel 5 3 5 2 4 5 5 5 5 5 5 5 5
## 6 bed and breakfast 4 3 3 2 5 3 4 4 5 5 5 4 4
## rp3 rp4 rp5 te1 te2 te3 te4 te5 te6 te7 te8 zh1 zh2 zh3 zh4 zh5 zh6 zh7
## 1 4 5 4 4 4 4 4 4 4 4 4 3 1 1 2 3 2 2
## 2 4 4 5 5 5 4 4 4 5 3 4 4 4 5 3 5 5 5
## 3 4 5 5 3 4 4 4 4 4 4 4 1 1 2 2 1 3 1
## 4 5 5 5 4 4 4 4 5 4 2 4 3 3 3 3 2 2 2
## 5 5 5 5 4 5 4 4 5 5 4 5 1 1 1 4 2 1 1
## 6 5 5 5 1 3 3 3 1 1 3 2 4 3 3 4 2 4 4
## latitude longitude
## 1 31.14558 121.7142
## 2 31.14376 121.6605
## 3 31.27034 121.4915
## 4 31.18496 121.3085
## 5 31.17656 121.3151
## 6 31.19191 121.4547
##### practice 02 :
##### add value labels to variable of region
##### 1 Central China
##### 2 East China
##### 3 North China
##### 4 Northeast China
##### 5 Northwest China
##### 6 Southwest China
##### 7 West China
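A possible solution to practice 02 follows the same `factor()` pattern as the code above. Since the full tourist data frame is not reproduced here, this sketch uses a small stand-in data frame `df`; with the real data, apply the same call to `tourist$region`.

```r
# Hedged sketch for practice 02: value labels for region,
# demonstrated on a tiny stand-in data frame instead of the full dataset.
df <- data.frame(region = c(1, 4, 7, 2, 2))
df$region <- factor(df$region, levels = 1:7,
                    labels = c("Central China", "East China", "North China",
                               "Northeast China", "Northwest China",
                               "Southwest China", "West China"))
table(df$region)  # frequency of each region label
```

With the real data, `summary(tourist$region)` would then report the frequency of each region, as `summary(tourist$income3)` does in section 1.5.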
1.4 compute variables
compute variables
#compute the overall satisfaction score
tourist<-transform(tourist,sat=(sat1+sat2+sat3+sat4+sat5+sat6)/6)
summary(tourist$sat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.667 3.667 4.000 3.994 4.333 5.000
#compute variables
# practice 03 : compute the age
# age = 2022 - byear
tourist<-transform(tourist,age = 2022 - byear)
1.5 recode
# recode income3 (income range)
attach(tourist)
tourist$income3[income < 2000] <- 1
tourist$income3[income >= 2000 & income <3000] <- 2
tourist$income3[income >= 3000] <- 3
detach(tourist)
# add value labels
tourist$income3 <- factor(tourist$income3,levels = c(1,2,3),labels = c("below 2000", "2000-3000", "above 3000"))
summary(tourist$income3)
## below 2000 2000-3000 above 3000
## 76 191 106
# recode tourism expense
attach(tourist)
tourist$expense3[expense < 300] <- 1
tourist$expense3[expense >= 300 & expense < 400] <- 2
tourist$expense3[expense >= 400] <- 3 # >= so that an expense of exactly 400 is not left unclassified
detach(tourist)
# add value labels
tourist$expense3 <- factor(tourist$expense3,levels = c(1,2,3),labels = c("below 300", "300-399", "above 400"))
summary(tourist$expense3)
## below 300 300-399 above 400
## 152 144 77
# recode age range
# practice 04 : recode age
# recode age range
attach(tourist)
tourist$age4[age < 20] <- 1
tourist$age4[age >= 20 & age < 40] <- 2
tourist$age4[age >= 40 & age < 60] <- 3
tourist$age4[age >= 60] <- 4
detach(tourist)
# add value labels
tourist$age4 <- factor(tourist$age4,levels = c(1,2,3,4),labels = c("below 20", "20-39","40-59", "above 60"))
summary(tourist$age4)
## below 20 20-39 40-59 above 60
## 30 127 147 69
1.6 summary()
look through the dataset
summary(tourist)
## sid gender byear region income
## Length:373 male :214 Min. :1952 Min. :1.000 Min. :1133
## Class :character female:159 1st Qu.:1965 1st Qu.:2.000 1st Qu.:2121
## Mode :character Median :1978 Median :4.000 Median :2654
## Mean :1979 Mean :3.761 Mean :2590
## 3rd Qu.:1993 3rd Qu.:6.000 3rd Qu.:3084
## Max. :2005 Max. :7.000 Max. :3773
## expense type3 type2
## Min. :199.0 natural scenery : 58 sightseeing :178
## 1st Qu.:262.0 historical scenery:132 participation:195
## Median :326.0 mixed scenery :183
## Mean :330.6
## 3rd Qu.:391.0
## Max. :556.0
## thotel sat1 sat2 sat3
## budget hotel :163 Min. :3.000 Min. :2.000 Min. :1.000
## luxury hotel : 35 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.000
## bed and breakfast:121 Median :5.000 Median :4.000 Median :4.000
## apartment hotel : 54 Mean :4.349 Mean :3.743 Mean :3.676
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000
## sat4 sat5 sat6 ri1
## Min. :2.000 Min. :3.000 Min. :2.000 Min. :3.000
## 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:4.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :4.214 Mean :4.097 Mean :3.887 Mean :4.351
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## ri2 ri3 ri4 ri5
## Min. :3.000 Min. :3.000 Min. :3.000 Min. :3.000
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:5.000 1st Qu.:5.000
## Median :5.000 Median :5.000 Median :5.000 Median :5.000
## Mean :4.491 Mean :4.499 Mean :4.708 Mean :4.727
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## rp1 rp2 rp3 rp4
## Min. :3.000 Min. :3.000 Min. :2.000 Min. :3.000
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:5.000
## Median :4.000 Median :5.000 Median :5.000 Median :5.000
## Mean :4.316 Mean :4.515 Mean :4.383 Mean :4.721
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## rp5 te1 te2 te3
## Min. :3.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:4.000
## Median :5.000 Median :4.000 Median :4.000 Median :4.000
## Mean :4.643 Mean :3.912 Mean :3.791 Mean :4.131
## 3rd Qu.:5.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## te4 te5 te6 te7
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:4.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000 Median :4.000 Median :4.000
## Mean :4.078 Mean :3.756 Mean :3.898 Mean :3.729
## 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## te8 zh1 zh2 zh3
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
## Median :4.000 Median :3.000 Median :3.000 Median :3.000
## Mean :3.657 Mean :2.836 Mean :2.627 Mean :2.702
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## zh4 zh5 zh6 zh7
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:1.000
## Median :2.000 Median :3.000 Median :3.000 Median :3.000
## Mean :2.552 Mean :2.598 Mean :2.528 Mean :2.555
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
## latitude longitude sat age
## Min. :30.79 Min. :121.0 Min. :2.667 Min. :17.00
## 1st Qu.:31.15 1st Qu.:121.4 1st Qu.:3.667 1st Qu.:29.00
## Median :31.20 Median :121.5 Median :4.000 Median :44.00
## Mean :31.21 Mean :121.5 Mean :3.994 Mean :43.12
## 3rd Qu.:31.23 3rd Qu.:121.7 3rd Qu.:4.333 3rd Qu.:57.00
## Max. :31.72 Max. :121.9 Max. :5.000 Max. :70.00
## income3 expense3 age4
## below 2000: 76 below 300:152 below 20: 30
## 2000-3000 :191 300-399 :144 20-39 :127
## above 3000:106 above 400: 77 40-59 :147
## above 60: 69
##
##
1.7 distribution of the data
# density chart
plot(density(tourist$sat))
plot(density(tourist$sat),main="", xlab='overall satisfaction', ylab='density')
# histogram chart
hist(tourist$sat)
hist(tourist$sat,main="", xlab='overall satisfaction data', ylab='frequency')
# practice 05 : check the distribution of income
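A sketch for practice 05. The income values below are simulated (mean and spread roughly follow the `summary()` output in section 1.6); with the real data, pass `tourist$income` directly to `hist()` and `density()`.

```r
# Hedged sketch for practice 05: distribution of income (simulated values).
set.seed(1)
income <- round(rnorm(373, mean = 2590, sd = 550))
hist(income, main = "", xlab = "monthly income", ylab = "frequency")
plot(density(income), main = "", xlab = "monthly income")
```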
for loop: distribution of 31 variables
#for (i in 10:40){
# plot(density(tourist[,i]))
# plot(density(tourist[,i]),main="", xlab=names(tourist[i]))
#}
for loop: distribution of 31 variables (histogram charts)
#for (i in 10:40){
# hist(tourist[,i],main="", xlab=names(tourist[i]))
# h<-hist(tourist[,i],plot=FALSE)
# hist(tourist[,i],main="",xlab=names(tourist[i]),labels = TRUE,ylim=c(0,1.1*max(h$counts)))
# https://stackoverflow.com/questions/9317948/how-to-label-histogram-bars-with-data-values-or-percents-in-r
#}
1.8 save the dataset
save(tourist,file=paste(datafolder,'tourist_EN.rda',sep=""))
2. basic analysis
2.1 frequency analysis
#frequency table of gender
t1<-table(tourist$gender)
t1
##
## male female
## 214 159
#percentage of gender
prop.table(t1)
##
## male female
## 0.5737265 0.4262735
round(prop.table(t1)*100,2)
##
## male female
## 57.37 42.63
For a tidy report, we can convert the table into a data frame.
#t1b is on frequency
t1b<-as.data.frame(table(tourist$gender))
# t1c is on percentage
t1c<-as.data.frame(round(prop.table(t1)*100,2))
#combine t1b and t1c
t1d<-cbind(t1b,t1c$Freq)
#rename the variable names
colnames(t1d)<-c('gender','frequency','percent')
print(t1d)
## gender frequency percent
## 1 male 214 57.37
## 2 female 159 42.63
#export the result as a csv file.
write.csv(t1d,paste(datafolder,'tourist_gender.csv',sep=""))
#frequency of hotel type
t2<-table(tourist$thotel)
t2
##
## budget hotel luxury hotel bed and breakfast apartment hotel
## 163 35 121 54
round(prop.table(t2)*100,2)
##
## budget hotel luxury hotel bed and breakfast apartment hotel
## 43.70 9.38 32.44 14.48
##### practice 06 frequency analysis on region
hist(tourist$byear)
# with labels(counts)
h2<-hist(tourist$byear,plot=FALSE)
hist(tourist$byear,main="", xlab='birth year')
hist(tourist$byear,main="", xlab='birth year',labels = TRUE)
hist(tourist$byear,main="", xlab='birth year',labels = TRUE,ylim=c(0,50))
hist(tourist$byear,main="", xlab='birth year',labels = TRUE,ylim=c(0,50),col='lightblue')
hist(tourist$byear,main="", xlab='birth year',labels = TRUE,ylim=c(0,50),col='#ffb01f')
#html color codes
#https://html-color.codes/
# r colors names
# http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
# https://www.datamentor.io/r-programming/color/
##### practice 7 :
##### step1 : 1.4 compute age based on birth year
##### step2 : 1.7 check the distribution of age
##### step3 : 1.5 recode age into variable age6
##### (below 20,20-29,30-39,40-49,50-59,above 60)
##### step4 : 2.1 conduct frequency analysis of age6
# solutions to practice 07
age<-2022-tourist$byear
hist(age)
attach(tourist)
## The following object is masked _by_ .GlobalEnv:
##
## age
tourist$age6[age < 20] <- 1
tourist$age6[age >= 20 & age < 30] <- 2
tourist$age6[age >= 30 & age < 40] <- 3
tourist$age6[age >= 40 & age < 50] <- 4
tourist$age6[age >= 50 & age < 60] <- 5
tourist$age6[age >= 60] <- 6
detach(tourist)
tourist$age6 <- factor(tourist$age6,levels = c(1,2,3,4,5,6),labels = c("below 20", "20-29","30-39","40-49", "50-59","above 60"))
t3<-table(tourist$age6)
t3
##
## below 20 20-29 30-39 40-49 50-59 above 60
## 30 65 62 73 74 69
round(prop.table(t3)*100,2)
##
## below 20 20-29 30-39 40-49 50-59 above 60
## 8.04 17.43 16.62 19.57 19.84 18.50
2.2 crosstable and chi-square test
### crosstable
ct1<-table(tourist$thotel,tourist$gender)
prop.table(ct1)
##
## male female
## budget hotel 0.23592493 0.20107239
## luxury hotel 0.05898123 0.03485255
## bed and breakfast 0.20375335 0.12064343
## apartment hotel 0.07506702 0.06970509
prop.table(ct1,1)
##
## male female
## budget hotel 0.5398773 0.4601227
## luxury hotel 0.6285714 0.3714286
## bed and breakfast 0.6280992 0.3719008
## apartment hotel 0.5185185 0.4814815
prop.table(ct1,2)
##
## male female
## budget hotel 0.41121495 0.47169811
## luxury hotel 0.10280374 0.08176101
## bed and breakfast 0.35514019 0.28301887
## apartment hotel 0.13084112 0.16352201
round(prop.table(ct1,1)*100,2)
##
## male female
## budget hotel 53.99 46.01
## luxury hotel 62.86 37.14
## bed and breakfast 62.81 37.19
## apartment hotel 51.85 48.15
# summary of chi-square test
summary(ct1)
## Number of cases in table: 373
## Number of factors: 2
## Test for independence of all factors:
## Chisq = 3.33, df = 3, p-value = 0.3435
#
#transfer ct1 into data frame
ct1b<-as.data.frame(ct1)
#
ct1c<-as.data.frame(round(prop.table(ct1,1)*100,2))
#combine
ct1d<-cbind(ct1b,ct1c$Freq)
#rename
colnames(ct1d)<-c('hotel','gender','frequency','percent')
print(ct1d)
## hotel gender frequency percent
## 1 budget hotel male 88 53.99
## 2 luxury hotel male 22 62.86
## 3 bed and breakfast male 76 62.81
## 4 apartment hotel male 28 51.85
## 5 budget hotel female 75 46.01
## 6 luxury hotel female 13 37.14
## 7 bed and breakfast female 45 37.19
## 8 apartment hotel female 26 48.15
#export
write.csv(ct1d,paste(datafolder,'tourist_hotel.csv',sep=""))
##### practice 08 :
##### is there significant relationship between income3 and thotel?
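A sketch for practice 08, reusing the `table()` + `summary()` approach from section 2.2. The two factors are simulated stand-ins; with the real data, build the table from `tourist$income3` and `tourist$thotel`.

```r
# Hedged sketch for practice 08: crosstable and chi-square test between
# income range and hotel type (simulated factors stand in for the real columns).
set.seed(2)
income3 <- factor(sample(c("below 2000", "2000-3000", "above 3000"),
                         373, replace = TRUE))
thotel  <- factor(sample(c("budget hotel", "luxury hotel",
                           "bed and breakfast", "apartment hotel"),
                         373, replace = TRUE))
ct <- table(income3, thotel)
summary(ct)  # reports Chisq, df and p-value, as in section 2.2
```

If the p-value from `summary(ct)` is below 0.05, the relationship between income range and hotel type is significant.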
2.3 independent t-test
t.test(tourist$sat1~tourist$gender)
##
## Welch Two Sample t-test
##
## data: tourist$sat1 by tourist$gender
## t = 0.89559, df = 332.88, p-value = 0.3711
## alternative hypothesis: true difference in means between group male and group female is not equal to 0
## 95 percent confidence interval:
## -0.08414446 0.22480161
## sample estimates:
## mean in group male mean in group female
## 4.378505 4.308176
#tidy report
ttest1<-t.test(tourist$sat1~tourist$gender)
ttest12<-tidy(ttest1)
ttest12
## # A tibble: 1 × 10
## estim…¹ estim…² estim…³ stati…⁴ p.value param…⁵ conf.…⁶ conf.…⁷ method alter…⁸
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
## 1 0.0703 4.38 4.31 0.896 0.371 333. -0.0841 0.225 Welch… two.si…
## # … with abbreviated variable names ¹estimate, ²estimate1, ³estimate2,
## # ⁴statistic, ⁵parameter, ⁶conf.low, ⁷conf.high, ⁸alternative
#export
write.csv(ttest12,paste(datafolder,'sat1_gender.csv',sep=""))
##### practice 09 :
##### H3 : females are more satisfied with the scenery than males
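H3 is directional, so a one-sided `t.test()` fits it better than the two-sided default used above. This sketch uses simulated Likert scores; `score` is a hypothetical stand-in for the relevant satisfaction item in `tourist`, and the direction of `alternative` depends on the factor level order.

```r
# Hedged sketch for practice 09 (H3: females more satisfied than males).
# Scores are simulated; `score` stands in for the relevant satisfaction item.
set.seed(3)
gender <- factor(rep(c("male", "female"), each = 100),
                 levels = c("male", "female"))  # male is group 1
score  <- c(sample(3:5, 100, replace = TRUE),                       # male
            sample(3:5, 100, replace = TRUE, prob = c(.1, .3, .6))) # female
# alternative = "less" tests mean(male) - mean(female) < 0,
# i.e. the directional hypothesis that females score higher
t.test(score ~ gender, alternative = "less")
```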
2.4 ANOVA
fit<-aov(sat1~income3,tourist)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## income3 2 0.72 0.3594 0.646 0.525
## Residuals 370 205.97 0.5567
tidy(fit)
## # A tibble: 2 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 income3 2 0.719 0.359 0.646 0.525
## 2 Residuals 370 206. 0.557 NA NA
2.5 correlation
#scatter chart
plot(tourist$income,tourist$expense)
cor.test(tourist$income,tourist$expense)
##
## Pearson's product-moment correlation
##
## data: tourist$income and tourist$expense
## t = 9.5162, df = 371, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.3574822 0.5210526
## sample estimates:
## cor
## 0.4429459
cor(tourist$income,tourist$expense,method="spearman")
## [1] 0.4027449
tidy report
cor_income_expense<-cor.test(tourist$income,tourist$expense)
cor_income_expense<-tidy(cor_income_expense)
cor_income_expense
## # A tibble: 1 × 8
## estimate statistic p.value parameter conf.low conf.high method alter…¹
## <dbl> <dbl> <dbl> <int> <dbl> <dbl> <chr> <chr>
## 1 0.443 9.52 2.35e-19 371 0.357 0.521 Pearson's pr… two.si…
## # … with abbreviated variable name ¹alternative
#export
write.csv(cor_income_expense,paste(datafolder,'cor_income_expense.csv',sep=""))
correlation matrix
cor_data<-tourist[,c(5,6,10:15)]
mydata.cor <- cor(cor_data)
mydata.cor
## income expense sat1 sat2 sat3
## income 1.00000000 0.442945941 0.026740705 -0.01573717 -0.03478183
## expense 0.44294594 1.000000000 0.002191373 -0.02832285 -0.09644656
## sat1 0.02674071 0.002191373 1.000000000 0.19670856 0.10043646
## sat2 -0.01573717 -0.028322850 0.196708559 1.00000000 -0.04452174
## sat3 -0.03478183 -0.096446564 0.100436456 -0.04452174 1.00000000
## sat4 0.01931803 -0.088759307 0.044004353 0.06493726 0.04694267
## sat5 -0.03642428 -0.060271195 -0.039155116 0.09324654 0.09942254
## sat6 -0.04216386 0.073107112 0.054031745 -0.01636401 -0.11162753
## sat4 sat5 sat6
## income 0.01931803 -0.03642428 -0.04216386
## expense -0.08875931 -0.06027119 0.07310711
## sat1 0.04400435 -0.03915512 0.05403174
## sat2 0.06493726 0.09324654 -0.01636401
## sat3 0.04694267 0.09942254 -0.11162753
## sat4 1.00000000 -0.09264920 0.10743228
## sat5 -0.09264920 1.00000000 0.02634869
## sat6 0.10743228 0.02634869 1.00000000
#rcorr is from the Hmisc package
mydata.rcorr <- rcorr(as.matrix(cor_data))
mydata.coeff <- mydata.rcorr$r
mydata.coeff
## income expense sat1 sat2 sat3
## income 1.00000000 0.442945941 0.026740705 -0.01573717 -0.03478183
## expense 0.44294594 1.000000000 0.002191373 -0.02832285 -0.09644656
## sat1 0.02674071 0.002191373 1.000000000 0.19670856 0.10043646
## sat2 -0.01573717 -0.028322850 0.196708559 1.00000000 -0.04452174
## sat3 -0.03478183 -0.096446564 0.100436456 -0.04452174 1.00000000
## sat4 0.01931803 -0.088759307 0.044004353 0.06493726 0.04694267
## sat5 -0.03642428 -0.060271195 -0.039155116 0.09324654 0.09942254
## sat6 -0.04216386 0.073107112 0.054031745 -0.01636401 -0.11162753
## sat4 sat5 sat6
## income 0.01931803 -0.03642428 -0.04216386
## expense -0.08875931 -0.06027119 0.07310711
## sat1 0.04400435 -0.03915512 0.05403174
## sat2 0.06493726 0.09324654 -0.01636401
## sat3 0.04694267 0.09942254 -0.11162753
## sat4 1.00000000 -0.09264920 0.10743228
## sat5 -0.09264920 1.00000000 0.02634869
## sat6 0.10743228 0.02634869 1.00000000
mydata.p <- mydata.rcorr$P
mydata.p
## income expense sat1 sat2 sat3 sat4
## income NA 0.00000000 0.6066875674 0.7619402861 0.50305111 0.70998546
## expense 0.0000000 NA 0.9663548597 0.5855619683 0.06277516 0.08692367
## sat1 0.6066876 0.96635486 NA 0.0001314393 0.05260579 0.39675922
## sat2 0.7619403 0.58556197 0.0001314393 NA 0.39122559 0.21084024
## sat3 0.5030511 0.06277516 0.0526057891 0.3912255868 NA 0.36595875
## sat4 0.7099855 0.08692367 0.3967592185 0.2108402357 0.36595875 NA
## sat5 0.4830900 0.24556908 0.4508720781 0.0720556601 0.05505029 0.07390625
## sat6 0.4168218 0.15881152 0.2979756738 0.7527602339 0.03113131 0.03808825
## sat5 sat6
## income 0.48309000 0.41682182
## expense 0.24556908 0.15881152
## sat1 0.45087208 0.29797567
## sat2 0.07205566 0.75276023
## sat3 0.05505029 0.03113131
## sat4 0.07390625 0.03808825
## sat5 NA 0.61197396
## sat6 0.61197396 NA
tidy(mydata.rcorr)
## # A tibble: 28 × 5
## column1 column2 estimate n p.value
## <chr> <chr> <dbl> <int> <dbl>
## 1 expense income 0.443 373 0
## 2 sat1 income 0.0267 373 0.607
## 3 sat1 expense 0.00219 373 0.966
## 4 sat2 income -0.0157 373 0.762
## 5 sat2 expense -0.0283 373 0.586
## 6 sat2 sat1 0.197 373 0.000131
## 7 sat3 income -0.0348 373 0.503
## 8 sat3 expense -0.0964 373 0.0628
## 9 sat3 sat1 0.100 373 0.0526
## 10 sat3 sat2 -0.0445 373 0.391
## # … with 18 more rows
#export
write.csv(tidy(mydata.rcorr),paste(datafolder,'correlation_matrix.csv',sep=""))
chart of correlation matrix
corrplot(mydata.cor)
3.1 ggplot2
###plot
qplot(tourist$income,tourist$expense)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=1)
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=2,aes(colour=gender))
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=2,aes(colour=thotel))
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=2)+
facet_grid(type3 ~.)
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=2,aes(colour=type3))+
facet_grid(type3 ~.)
ggplot(tourist,aes(x=income,y=expense))+
geom_point(size=2,aes(colour=type3))+
facet_grid(type3 ~type2)
ggplot(tourist,aes(x=sat2,fill=gender))+
geom_histogram(binwidth=1,colour="white")+
facet_grid(gender ~.)
ggplot(tourist,aes(x=sat2,fill=type3))+
geom_histogram(binwidth=1,colour="white")+
facet_grid(type2 ~type3)
4 map
leaflet
Leaflet is the leading open-source JavaScript library for mobile-friendly interactive maps. Weighing just about 42 KB of JS, it has all the mapping features most developers ever need.
OpenStreetMap
https://www.openstreetmap.org/
center_lon = median(tourist$longitude,na.rm = TRUE)
center_lat = median(tourist$latitude,na.rm = TRUE)
leaflet(tourist) %>% addProviderTiles("Esri.NatGeoWorldMap") %>%
addCircles(lng = ~longitude, lat = ~latitude,radius = ~sqrt(income/100)) %>%
# center,zoom level
setView(lng=center_lon, lat=center_lat,zoom = 10)
5 IPA (importance-performance analysis)
Liu, X., & Zhang, N. (2020). Research on Customer Satisfaction of Budget Hotels Based on Revised IPA and Online Reviews. Science Journal of Business and Management, 8(2), 50–56. https://doi.org/10.11648/j.sjbm.20200802.11
5.1 data wrangling
review_label<-as.data.frame(c('amount', 'publishing date', 'relevance', 'positive', 'credibility'))
colnames(review_label)<-c("label")
tourist1<-tourist[,c("ri1","ri2","ri3","ri4","ri5","rp1","rp2","rp3","rp4","rp5")]
hdata<-colMeans(tourist1)
im<-tourist[,c("ri1","ri2","ri3","ri4","ri5")]
pe<-tourist[,c("rp1","rp2","rp3","rp4","rp5")]
imdata<-as.data.frame(colMeans(im))
colnames(imdata)<-c("importance")
pedata<-as.data.frame(colMeans(pe))
colnames(pedata)<-c("performance")
ipa<-cbind(imdata,pedata,review_label)
5.2 compute means for IPA
print("------------importance----------")
## [1] "------------importance----------"
print(paste("min :",round(min(ipa[1]),2)))
## [1] "min : 4.35"
print(paste("max :",round(max(ipa[1]),2)))
## [1] "max : 4.73"
print("------------performance----------")
## [1] "------------performance----------"
print(paste("min :",round(min(ipa[2]),2)))
## [1] "min : 4.32"
print(paste("max :",round(max(ipa[2]),2)))
## [1] "max : 4.72"
print(colMeans(ipa[,1:2]))
## importance performance
## 4.554960 4.515818
5.3 IPA chart
# basic IPA chart
ipa_chart<-ggplot(ipa,aes(x=performance,y=importance))+
geom_point(size=2)+
xlim(4.2, 4.8)+ylim(4.2, 4.8)+
labs(x = "performance",y="importance")+
geom_vline(xintercept=4.516,color = "red",linetype="dashed")+
geom_hline(yintercept=4.555,color = "red",linetype="dashed")+
geom_abline(intercept = 0, slope = 1, color="blue",size=0.5)+
geom_text_repel(aes(label=label),size=4,hjust=0.3,vjust=0.3)
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
ipa_chart
ggsave(paste(datafolder,'ipa_chart.png',sep=""),ipa_chart, width = 7)
## Saving 7 x 5 in image
6. EFA (exploratory factor analysis)
Chapman, C., & Feit, E. M. (2019). R For Marketing Research and Analytics (2nd ed.). Springer. https://link.springer.com/book/10.1007%2F978-3-030-14316-9
#load(url('https://od.lk/d/170336656_2dwPO/tourist_EN.rda'))
load(paste(datafolder,'tourist_EN.rda',sep=""))
6.1 datasets
satdata<-tourist[,c(10:15)]
idata<-tourist[,c(16:20)]
pdata<-tourist[,c(21:25)]
tmdata<-tourist[,c(26:33)]
zhdata<-tourist[,c(34:40)]
#zhdata<-tourist[,c('zh1','zh2','zh3','zh4','zh5','zh6','zh7')]
head(satdata)
## sat1 sat2 sat3 sat4 sat5 sat6
## 1 4 2 2 3 4 4
## 2 5 5 4 5 3 5
## 3 3 4 5 3 3 2
## 4 5 2 3 5 4 4
## 5 5 3 5 2 4 5
## 6 4 3 3 2 5 3
head(idata)
## ri1 ri2 ri3 ri4 ri5
## 1 4 4 4 5 4
## 2 4 3 4 4 5
## 3 5 5 5 5 5
## 4 4 5 5 5 5
## 5 5 5 5 5 5
## 6 4 4 5 5 5
head(pdata)
## rp1 rp2 rp3 rp4 rp5
## 1 4 5 4 5 4
## 2 4 3 4 4 5
## 3 5 5 4 5 5
## 4 4 5 5 5 5
## 5 5 5 5 5 5
## 6 4 4 5 5 5
head(tmdata)
## te1 te2 te3 te4 te5 te6 te7 te8
## 1 4 4 4 4 4 4 4 4
## 2 5 5 4 4 4 5 3 4
## 3 3 4 4 4 4 4 4 4
## 4 4 4 4 4 5 4 2 4
## 5 4 5 4 4 5 5 4 5
## 6 1 3 3 3 1 1 3 2
head(zhdata)
## zh1 zh2 zh3 zh4 zh5 zh6 zh7
## 1 3 1 1 2 3 2 2
## 2 4 4 5 3 5 5 5
## 3 1 1 2 2 1 3 1
## 4 3 3 3 3 2 2 2
## 5 1 1 1 4 2 1 1
## 6 4 3 3 4 2 4 4
6.2 KMO test
https://www.statisticshowto.com/kaiser-meyer-olkin/
The Kaiser-Meyer-Olkin (KMO) test measures how suited your data is for factor analysis: it reports sampling adequacy for each variable in the model and for the complete model.
KMO returns values between 0 and 1.
KMO values between 0.8 and 1 indicate the sampling is adequate.
kmo_test<-KMO(zhdata)
print(kmo_test, stats = c("both", "MSA", "KMO"), vars = "all", sort = FALSE, show = "all", digits = 3)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = zhdata)
## Overall MSA = 0.872
## MSA for each item =
## zh1 zh2 zh3 zh4 zh5 zh6 zh7
## 0.894 0.881 0.887 0.897 0.907 0.821 0.823
6.3 Bartlett’s test
https://www.statology.org/bartletts-test-of-sphericity/
Bartlett’s test of sphericity
Bartlett’s Test of Sphericity compares an observed correlation matrix to the identity matrix. Essentially it checks whether there is redundancy between the variables that can be summarized with a small number of factors.
If the p-value from Bartlett’s Test of Sphericity is lower than our chosen significance level (common choices are 0.10, 0.05, and 0.01), then our dataset is suitable for a data reduction technique.
cortest.bartlett(zhdata)
## R was not square, finding R from data
## $chisq
## [1] 1820.569
##
## $p.value
## [1] 0
##
## $df
## [1] 21
6.4 estimate the number of factors
Chapman, C., & Feit, E. M. (2019). R For Marketing Research and Analytics. Springer. chapter 8.p206-213 https://link.springer.com/book/10.1007%2F978-3-030-14316-9
parallel <- fa.parallel(zhdata, fm = 'minres', fa = 'both',n.iter=100)
## Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, :
## The estimated weights for the factor scores are probably incorrect. Try a
## different factor score estimation method.
## Parallel analysis suggests that the number of factors = 2 and the number of components = 1
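The idea behind fa.parallel can be sketched in base R: retain factors whose observed eigenvalues exceed the average eigenvalues of pure-noise data of the same size. A simplified illustration with two simulated factors (not the psych implementation):

```r
# parallel analysis, simplified: observed vs. random-data eigenvalues
set.seed(3)
n <- 373; p <- 7
f1 <- rnorm(n); f2 <- rnorm(n)
x  <- cbind(sapply(1:4, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
            sapply(1:3, function(i) 0.8 * f2 + rnorm(n, sd = 0.6)))

obs  <- eigen(cor(x))$values                     # observed eigenvalues
rand <- rowMeans(replicate(100,                  # mean eigenvalues of 100 noise datasets
          eigen(cor(matrix(rnorm(n * p), n, p)))$values))
sum(obs > rand)                                  # suggested number of factors: 2
```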
6.5 extract 2 factors
efa2 <- fa(zhdata,nfactors = 2,rotate = "oblimin",fm="minres")
## Loading required namespace: GPArotation
print(efa2)
## Factor Analysis using method = minres
## Call: fa(r = zhdata, nfactors = 2, rotate = "oblimin", fm = "minres")
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR2 h2 u2 com
## zh1 0.82 0.02 0.69 0.31 1
## zh2 0.91 -0.06 0.76 0.24 1
## zh3 0.88 0.02 0.78 0.22 1
## zh4 0.73 0.09 0.64 0.36 1
## zh5 0.09 0.74 0.64 0.36 1
## zh6 -0.03 0.87 0.72 0.28 1
## zh7 -0.01 0.91 0.81 0.19 1
##
## MR1 MR2
## SS loadings 2.86 2.18
## Proportion Var 0.41 0.31
## Cumulative Var 0.41 0.72
## Proportion Explained 0.57 0.43
## Cumulative Proportion 0.57 1.00
##
## With factor correlations of
## MR1 MR2
## MR1 1.00 0.66
## MR2 0.66 1.00
##
## Mean item complexity = 1
## Test of the hypothesis that 2 factors are sufficient.
##
## The degrees of freedom for the null model are 21 and the objective function was 4.94 with Chi Square of 1820.57
## The degrees of freedom for the model are 8 and the objective function was 0.11
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 373 with the empirical chi square 6.95 with prob < 0.54
## The total number of observations was 373 with Likelihood Chi Square = 39.38 with prob < 4.2e-06
##
## Tucker Lewis Index of factoring reliability = 0.954
## RMSEA index = 0.103 and the 90 % confidence intervals are 0.072 0.136
## BIC = -7.99
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## MR1 MR2
## Correlation of (regression) scores with factors 0.96 0.95
## Multiple R square of scores with factors 0.92 0.90
## Minimum correlation of possible factor scores 0.84 0.81
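Because the rotation is oblique, the communality column (h2) is not just the sum of squared pattern loadings; it also involves the factor correlation matrix Phi: h2 = diag(L Phi L'). Checking this against the printed values for zh1 and zh7:

```r
# reproduce h2 from the printed pattern loadings and the factor correlation 0.66
L   <- matrix(c( 0.82, 0.02,     # zh1 on MR1, MR2
                -0.01, 0.91),    # zh7 on MR1, MR2
              nrow = 2, byrow = TRUE)
Phi <- matrix(c(1, 0.66, 0.66, 1), 2, 2)
h2  <- diag(L %*% Phi %*% t(L))
round(h2, 2)   # c(0.69, 0.82): matches the printed 0.69 and 0.81 up to rounding
```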
print(efa2$loadings,cutoff = 0.5)
##
## Loadings:
## MR1 MR2
## zh1 0.821
## zh2 0.911
## zh3 0.875
## zh4 0.733
## zh5 0.737
## zh6 0.868
## zh7 0.910
##
## MR1 MR2
## SS loadings 2.816 2.136
## Proportion Var 0.402 0.305
## Cumulative Var 0.402 0.707
6.6 chart
factor.plot(efa2,labels=rownames(efa2$loadings))
#heatmap.2(efa2$loadings,
# col=brewer.pal(9, "Greens"), trace="none", key=FALSE, dend="none",
# Colv=FALSE, cexCol = 1.2)
fa.diagram(efa2)
#fa.diagram(efa4plog,simple=FALSE)
7.CFA, Confirmatory Factor Analysis
    MR1   MR2
zh1 0.821
zh2 0.911
zh3 0.875
zh4 0.733
zh5       0.737
zh6       0.868
zh7       0.910
7.1 model setting
cfa_model <- " D1 =~ zh1 + zh2 + zh3 + zh4
D2 =~ zh5 + zh6 + zh7 "
7.2 model estimation
cfa.fit <- cfa(cfa_model, data=zhdata)
summary(cfa.fit, fit.measures=TRUE)
## lavaan 0.6-12 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 15
##
## Number of observations 373
##
## Model Test User Model:
##
## Test statistic 48.083
## Degrees of freedom 13
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1841.136
## Degrees of freedom 21
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.981
## Tucker-Lewis Index (TLI) 0.969
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3488.689
## Loglikelihood unrestricted model (H1) -3464.647
##
## Akaike (AIC) 7007.377
## Bayesian (BIC) 7066.201
## Sample-size adjusted Bayesian (BIC) 7018.610
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.085
## 90 Percent confidence interval - lower 0.060
## 90 Percent confidence interval - upper 0.111
## P-value RMSEA <= 0.05 0.012
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.027
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## D1 =~
## zh1 1.000
## zh2 1.058 0.053 19.985 0.000
## zh3 1.097 0.053 20.885 0.000
## zh4 0.999 0.055 18.218 0.000
## D2 =~
## zh5 1.000
## zh6 1.108 0.061 18.060 0.000
## zh7 1.183 0.061 19.257 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## D1 ~~
## D2 0.714 0.080 8.978 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .zh1 0.512 0.047 10.851 0.000
## .zh2 0.452 0.045 10.068 0.000
## .zh3 0.374 0.042 9.008 0.000
## .zh4 0.609 0.054 11.319 0.000
## .zh5 0.534 0.050 10.684 0.000
## .zh6 0.505 0.052 9.656 0.000
## .zh7 0.316 0.047 6.751 0.000
## D1 1.137 0.118 9.613 0.000
## D2 0.988 0.109 9.074 0.000
7.3 chart
semPaths(cfa.fit, what="est", fade=FALSE, residuals=FALSE,edge.label.cex=0.75)
8.reliability analysis
zh_d1<-tourist[,c('zh1','zh2','zh3','zh4')]
alpha(zh_d1)
##
## Reliability analysis
## Call: alpha(x = zh_d1)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.91 0.91 0.89 0.71 10 0.0077 2.7 1.2 0.73
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.89 0.91 0.92
## Duhachek 0.89 0.91 0.92
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## zh1 0.89 0.89 0.84 0.72 7.8 0.0102 0.00207 0.74
## zh2 0.88 0.88 0.83 0.70 7.1 0.0111 0.00202 0.72
## zh3 0.87 0.87 0.82 0.69 6.7 0.0118 0.00240 0.67
## zh4 0.90 0.90 0.85 0.74 8.6 0.0093 0.00043 0.74
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## zh1 373 0.88 0.88 0.82 0.78 2.8 1.3
## zh2 373 0.90 0.90 0.85 0.81 2.6 1.3
## zh3 373 0.91 0.91 0.87 0.83 2.7 1.3
## zh4 373 0.86 0.86 0.79 0.75 2.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## zh1 0.24 0.14 0.25 0.30 0.07 0
## zh2 0.31 0.13 0.23 0.27 0.05 0
## zh3 0.28 0.14 0.26 0.24 0.08 0
## zh4 0.31 0.20 0.19 0.24 0.07 0
zh_d2<-tourist[,c('zh5','zh6','zh7')]
alpha(zh_d2)
##
## Reliability analysis
## Call: alpha(x = zh_d2)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.88 0.84 0.72 7.7 0.01 2.6 1.2 0.72
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.86 0.88 0.90
## Duhachek 0.86 0.88 0.91
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## zh5 0.87 0.87 0.76 0.76 6.5 0.014 NA 0.76
## zh6 0.84 0.84 0.72 0.72 5.2 0.017 NA 0.72
## zh7 0.80 0.80 0.67 0.67 4.0 0.021 NA 0.67
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## zh5 373 0.88 0.88 0.79 0.74 2.6 1.2
## zh6 373 0.90 0.90 0.83 0.77 2.5 1.3
## zh7 373 0.92 0.92 0.87 0.82 2.6 1.3
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## zh5 0.28 0.16 0.30 0.21 0.05 0
## zh6 0.34 0.15 0.19 0.28 0.04 0
## zh7 0.33 0.12 0.26 0.23 0.05 0
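The raw_alpha value follows directly from the definition alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score). A base-R sketch on simulated 4-item data (not the tourist items):

```r
# Cronbach's alpha from its definition
set.seed(5)
n <- 373
true_score <- rnorm(n)                            # common component shared by the items
items <- sapply(1:4, function(i) true_score + rnorm(n, sd = 0.7))

k         <- ncol(items)
sum_item  <- sum(apply(items, 2, var))            # sum of the item variances
total_var <- var(rowSums(items))                  # variance of the scale (sum) score
alpha_raw <- (k / (k - 1)) * (1 - sum_item / total_var)
round(alpha_raw, 2)                               # around 0.89 for these loadings
```

The more the items covary (relative to their individual variances), the larger the total variance of the sum score, and the closer alpha gets to 1.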
- Article info
- Author: kaiwu
- Hits: 185
1.Microsoft Office
2.google document
https://en.wikipedia.org/wiki/Google_Docs,_Sheets_and_Slides
3.Zoho office
4.libreoffice
LibreOffice is a free and powerful office suite, and a successor to OpenOffice.org (commonly known as OpenOffice).
Its clean interface and feature-rich tools help you unleash your creativity and enhance your productivity.
5.openoffice
Apache OpenOffice is the leading open-source office software suite for word processing, spreadsheets, presentations, graphics, databases and more. It is available in many languages and works on all common computers. It stores all your data in an international open standard format and can also read and write files from other common office software packages.
Online office platforms in mainland China:
6.WPS office
https://drive.wps.cn/ or https://docs.wps.cn
WPS Office is available on multiple platforms, including Windows, macOS, Linux, Android, and iOS. You can work anytime and anywhere on your mobile phone or computer. Work-study-life balance is no longer a far-fetched dream but at your fingertips.
7.Tencent Docs
8.DingTalk Docs
9.Feishu Docs
10. Pandoc (converts text between many markup formats)
About pandoc
If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert between the following formats:
(← = conversion from; → = conversion to; ↔︎ = conversion from and to)
- Lightweight markup formats
  - ↔︎ Markdown (including CommonMark and GitHub-flavored Markdown)
  - ↔︎ reStructuredText
  - → AsciiDoc
  - ↔︎ Emacs Org-Mode
  - ↔︎ Emacs Muse
  - ↔︎ Textile
  - → Markua
  - ← txt2tags
- HTML formats
  - ↔︎ (X)HTML 4
  - ↔︎ HTML5
- Ebooks
  - ↔︎ EPUB version 2 or 3
  - ↔︎ FictionBook2
- Documentation formats
- Roff formats
- TeX formats
- XML formats
  - ↔︎ DocBook version 4 or 5
  - ↔︎ JATS
  - → TEI Simple
  - → OpenDocument XML
- Outline formats
  - ↔︎ OPML
- Bibliography formats
  - ↔︎ BibTeX
  - ↔︎ BibLaTeX
  - ↔︎ CSL JSON
  - ↔︎ CSL YAML
  - ← RIS
  - ← EndNote XML
- Word processor formats
  - ↔︎ Microsoft Word docx
  - ↔︎ Rich Text Format RTF
  - ↔︎ OpenOffice/LibreOffice ODT
- Interactive notebook formats
  - ↔︎ Jupyter notebook (ipynb)
- Page layout formats
- Wiki markup formats
  - ↔︎ MediaWiki markup
  - ↔︎ DokuWiki markup
  - ← TikiWiki markup
  - ← TWiki markup
  - ← Vimwiki markup
  - → XWiki markup
  - → ZimWiki markup
  - ↔︎ Jira wiki markup
  - ← Creole
- Slide show formats
  - → LaTeX Beamer
  - → Microsoft PowerPoint
  - → Slidy
  - → reveal.js
  - → Slideous
  - → S5
  - → DZSlides
- Data formats
  - ← CSV tables
- Custom formats
  - ↔︎ custom readers and writers can be written in Lua
- PDF
  - → via pdflatex, lualatex, xelatex, latexmk, tectonic, wkhtmltopdf, weasyprint, prince, pagedjs-cli, context, or pdfroff
Pandoc understands a number of useful markdown syntax extensions, including document metadata (title, author, date); footnotes; tables; definition lists; superscript and subscript; strikeout; enhanced ordered lists (start number and numbering style are significant); running example lists; delimited code blocks with syntax highlighting; smart quotes, dashes, and ellipses; markdown inside HTML blocks; and inline LaTeX. If strict markdown compatibility is desired, all of these extensions can be turned off.
LaTeX math (and even macros) can be used in markdown documents. Several different methods of rendering math in HTML are provided, including MathJax and translation to MathML. LaTeX math is converted (as needed by the output format) to unicode, native Word equation objects, MathML, or roff eqn.
Pandoc includes a powerful system for automatic citations and bibliographies. This means that you can write a citation like
[see @doe99, pp. 33-35; also @smith04, ch. 1]
and pandoc will convert it into a properly formatted citation using any of hundreds of CSL styles (including footnote styles, numerical styles, and author-date styles), and add a properly formatted bibliography at the end of the document. The bibliographic data may be in BibTeX, BibLaTeX, CSL JSON, or CSL YAML format. Citations work in every output format.
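A few typical command lines (a sketch; pandoc must be installed, and the input file names here are illustrative):

```shell
# Markdown to Word
pandoc notes.md -o notes.docx
# standalone HTML5 with MathJax rendering for the LaTeX math
pandoc notes.md -s -t html5 --mathjax -o notes.html
# resolve [@doe99]-style citations against a BibTeX file using a CSL style
pandoc paper.md --citeproc --bibliography refs.bib --csl apa.csl -o paper.docx
```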
There are many ways to customize pandoc to fit your needs, including a template system and a powerful system for writing filters.
Pandoc includes a Haskell library and a standalone command-line program. The library includes separate modules for each input and output format, so adding a new input or output format just requires adding a new module.
Pandoc is free software, released under the GPL. Copyright 2006–2021 John MacFarlane.
- Article info
- Author: kaiwu
- Hits: 265
From the Dedao app: Zhuo Ke, Tech Reference 2, No. 274 | Aging: we suddenly grow old at 34, 60 and 78
https://www.dedao.cn/course/article?id=5Mr9mzb36pP4JL5d3PXkWqB2EYNegL
Aging does not progress evenly; it has three turning points: ages 34, 60, and 78.
These three points hold for most people: at these ages, measurable indicators show a sudden, sharp increase in aging, while in the intervening years the process is much slower.
After Tony's team transfused blood from young mice into old mice, the old mice's brain function improved markedly. This suggested that blood must contain many age-related substances, and that if their concentrations, types, and proportions could be measured precisely, they might be used to gauge the degree of aging. So, starting in 2015, the team focused on plasma proteins.
"Plasma protein" is a collective term: thousands of proteins can be measured, some maintaining osmotic pressure, some stabilizing pH, some serving as immune proteins, others acting as catalysts, transporters, or clotting agents. Each liter of blood carries roughly 70 grams of plasma protein.
Tony's team recruited 4,263 people aged 18 to 95, measured 2,925 plasma proteins in each, then screened the mass of data for proteins highly correlated with age, which yielded the conclusions above.
Blog posts on this study
https://med.stanford.edu/news/all-news/2019/12/stanford-scientists-reliably-predict-peoples-age-by-measuring-pr.html
https://surfaceyourrealself.com/2019/12/24/a-blood-plasma-protein-aging-clock/
https://joshmitteldorf.scienceblog.com/2019/12/23/new-aging-clock-based-on-proteins-in-the-blood/
- Article info
- Author: kaiwu
- Hits: 154
Lingo 20 for Windows
https://od.lk/d/178246929_n9plM/LINGO-WINDOWS-64x86-20.0.zip
Models in Excel format
1.Table-and-chair problem
https://od.lk/d/178246934_ReF81/Excel_solver1_table_and_chair.xlsx
https://od.lk/d/178246939_TqxcK/lingo1_table_chair.lg4
2.Beef-and-potato problem
https://od.lk/d/178246933_PDpqB/Excel_solver2_beef_and_potato.xlsx
https://od.lk/d/178246940_4g0US/lingo2_beef_and_potato.lg4
3.Food problem, extensible version
https://od.lk/d/178246942_rli7B/Excel_solver2food_extend.xlsx
https://od.lk/d/178246941_vWbH4/lingo2006example1-7-3Excel.lg4
4.Work-shift scheduling problem
https://od.lk/d/178246935_2zcYE/lingo4_workshift.lg4
Number of telephone operators
https://od.lk/d/178246932_tvNNJ/Excel_solver3_workshift1opetator.xlsx
Police patrols
https://od.lk/d/178246930_o2mgM/Excel_solver3_workshift2policeman.xlsx
Flight crews
https://od.lk/d/178246931_yKrLr/Excel_solver3_workshift3crew.xlsx
- Article info
- Author: kaiwu
- Hits: 758
Like other time-travel novels, 《赘婿》 has its "borrowed poems" episodes; what makes it impressive is how well the story's situations fit the poems being quoted.
https://book.qidian.com/info/1979049/
https://www.amazon.cn/dp/B07W53NJ18
《水调歌头·明月几时有》
苏轼
丙辰中秋,欢饮达旦,大醉,作此篇,兼怀子由。
明月几时有?把酒问青天。
不知天上宫阙,今夕是何年。
我欲乘风归去,又恐琼楼玉宇,高处不胜寒。
起舞弄清影,何似在人间。
转朱阁,低绮户,照无眠。
不应有恨,何事长向别时圆?
人有悲欢离合,月有阴晴圆缺,此事古难全。
但愿人长久,千里共婵娟。
《送别》
李叔同
长亭外,古道边,芳草碧连天。
晚风拂柳笛声残,夕阳山外山。
天之涯,地之角,知交半零落。
一瓢浊酒尽余欢,今宵别梦寒。
《青玉案·元夕》
辛弃疾
东风夜放花千树,更吹落、星如雨。
宝马雕车香满路。
凤箫声动,玉壶光转,一夜鱼龙舞。
蛾儿雪柳黄金缕,笑语盈盈暗香去。
众里寻他千百度,蓦然回首,那人却在,灯火阑珊处。
《定风波·莫听穿林打叶声》
苏轼
三月七日,沙湖道中遇雨,雨具先去,同行皆狼狈,余独不觉。已而遂晴,故作此(词)。
莫听穿林打叶声,何妨吟啸且徐行。
竹杖芒鞋轻胜马,谁怕?一蓑烟雨任平生。
料峭春风吹酒醒,微冷,山头斜照却相迎。
回首向来萧瑟处,归去,也无风雨也无晴。
《卜算子·黄州定慧院寓居作》
苏轼
缺月挂疏桐,漏断人初静。
谁见幽人独往来,缥缈孤鸿影。
惊起却回头,有恨无人省。
拣尽寒枝不肯栖,寂寞沙洲冷。
《鹊桥仙·纤云弄巧》
秦观
纤云弄巧,飞星传恨,银汉迢迢暗度。
金风玉露一相逢,便胜却人间无数。
柔情似水,佳期如梦,忍顾鹊桥归路!
两情若是久长时,又岂在朝朝暮暮。
《摸鱼儿·雁丘词》
元好问
乙丑岁赴试并州,道逢捕雁者云:“今旦获一雁,杀之矣。其脱网者悲鸣不能去,竟自投于地而死。”予因买得之,葬之汾水之上,垒石为识,号曰“雁丘”。同行者多为赋诗,予亦有《雁丘词》。旧所作无宫商,今改定之。
问世间,情是何物,直教生死相许?
天南地北双飞客,老翅几回寒暑。
欢乐趣,离别苦,就中更有痴儿女。
君应有语:渺万里层云,千山暮雪,只影向谁去?
横汾路,寂寞当年箫鼓,荒烟依旧平楚。
招魂楚些何嗟及,山鬼暗啼风雨。
天也妒,未信与,莺儿燕子俱黄土。
千秋万古,为留待骚人,狂歌痛饮,来访雁丘处。
《望海潮·东南形胜》
柳永
东南形胜,三吴都会,钱塘自古繁华。
烟柳画桥,风帘翠幕,参差十万人家。
云树绕堤沙,怒涛卷霜雪,天堑无涯。
市列珠玑,户盈罗绮,竞豪奢。
重湖叠巘清嘉,有三秋桂子,十里荷花。
羌管弄晴,菱歌泛夜,嬉嬉钓叟莲娃。
千骑拥高牙,乘醉听箫鼓,吟赏烟霞。
异日图将好景,归去凤池夸。
《人生·江湖》
黄沾
天下风云出我辈,一入江湖岁月催,
皇图霸业谈笑中,不胜人生一场醉。
提剑跨骑挥鬼雨,白骨如山鸟惊飞。
尘事如潮人如水,只叹江湖几人回。
英雄路远掌声近,莫问苍生问星辰。
天地有涯风有信,大海无量不见人!
《侠客行》
李白
赵客缦胡缨,吴钩霜雪明。
银鞍照白马,飒沓如流星。
十步杀一人,千里不留行。
事了拂衣去,深藏身与名。
闲过信陵饮,脱剑膝前横。
将炙啖朱亥,持觞劝侯嬴。
三杯吐然诺,五岳倒为轻。
眼花耳热后,意气素霓生。
救赵挥金槌,邯郸先震惊。
千秋二壮士,烜赫大梁城。
纵死侠骨香,不惭世上英。
谁能书阁下,白首太玄经。
《江上吟》
李白
木兰之枻沙棠舟,玉箫金管坐两头。
美酒尊中置千斛,载妓随波任去留。
仙人有待乘黄鹤,海客无心随白鸥。
屈平词赋悬日月,楚王台榭空山丘。
兴酣落笔摇五岳,诗成笑傲凌沧洲。
功名富贵若长在,汉水亦应西北流。
《如梦令·昨夜雨疏风骤》
李清照
昨夜雨疏风骤,浓睡不消残酒。
试问卷帘人,却道海棠依旧。
知否,知否?应是绿肥红瘦。
《临江仙·滚滚长江东逝水》
杨慎
滚滚长江东逝水,浪花淘尽英雄。
是非成败转头空。
青山依旧在,几度夕阳红。
白发渔樵江渚上,惯看秋月春风。
一壶浊酒喜相逢。
古今多少事,都付笑谈中。