Archive for May, 2017

Data alignment in block devices

CentOS 7

[root@]# cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)

Problem: mkfs.xfs warning: device is not properly aligned

[root@]# parted -a optimal /dev/mapper/mpathb mkpart primary 0% 100%
[root@]# parted /dev/mapper/mpathb align-check opt 1
1 aligned
[root@]# parted /dev/mapper/mpathb p
Model: Linux device-mapper (multipath) (dm)
Disk /dev/mapper/mpathb: 21.5GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: […]
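The idea behind an alignment check can be sketched in a few lines of Python: a partition counts as aligned when its starting byte offset is an exact multiple of the alignment boundary. This is only an illustration, not how parted is implemented. parted's `align-check opt` works from the alignment the device itself reports, while this sketch assumes a fixed 1 MiB boundary. The function names and the sample start sectors (34, 63, 2048) are hypothetical and do not come from the device shown above.

```python
# Minimal sketch of a partition alignment check.
# Assumptions: a fixed 1 MiB boundary (real tools query the device for its
# optimal I/O size), and hypothetical helper names and sample start sectors.

MIB = 1024 * 1024


def partition_offset_bytes(start_sector: int, sector_size: int = 512) -> int:
    """Return the byte offset at which a partition starts."""
    return start_sector * sector_size


def is_aligned(start_sector: int, sector_size: int = 512,
               alignment_bytes: int = MIB) -> bool:
    """Return True if the partition start falls on the alignment boundary."""
    if alignment_bytes <= 0:
        raise ValueError("alignment_bytes must be positive")
    return partition_offset_bytes(start_sector, sector_size) % alignment_bytes == 0


if __name__ == "__main__":
    # Illustrative start sectors only, with 512-byte logical sectors.
    for start in (34, 63, 2048):
        state = "aligned" if is_aligned(start) else "not aligned"
        print(f"start sector {start}: {state}")
```

With 512-byte sectors, a start sector of 2048 lands exactly on the 1 MiB boundary, which is why creating the partition with `0%` and `-a optimal` usually avoids the mkfs.xfs warning.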


Spark and CSV and SQL

SQL

data_frame = spark.read.csv("/db/nowe/APLUS_LOG_MINMAX_1183241001_LoggerData.csv", header=True).select("ID", "UTC").limit(200)
data_frame.createOrReplaceTempView("my_table")

# What happened?
# data_frame.printSchema()
spark.sql("desc my_table").show()

# Wow
# data_frame.first()
spark.sql("select * from my_table limit 1").show()

# data_frame.withColumnRenamed
spark.sql("select ID as SOME_ID from my_table limit 1").show()

# Casting…
data_frame.select(data_frame.ID.cast("float")).show(2)
spark.sql("select CAST(ID as FLOAT) as SOME_ID from my_table limit 1").show()

# Now, cast ID to float, then get […]


Spark and CSV for python language

It is now the second quarter of 2017. It seems that within a year the above-mentioned instructions may no longer be adequate. Simple instructions:

# Open the file and use the first row as the header
data_frame = spark.read.csv("/path/to/file.csv", header=True)

# You now have the basic Spark type: a DataFrame.
# See what its structure looks like
data_frame.printSchema()
root
 |-- […]
