九色国产,午夜在线视频,新黄色网址,九九色综合,天天做夜夜做久久做狠狠,天天躁夜夜躁狠狠躁2021a,久久不卡一区二区三区

打開APP
userphoto
未登錄

開通VIP,暢享免費電子書等14項超值服

開通VIP
java 讀寫Parquet格式的數(shù)據(jù) Parquet example
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.log4j.Logger;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.GroupFactory;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetReader.Builder;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ReadParquet {
    static Logger logger=Logger.getLogger(ReadParquet.class);
    public static void main(String[] args) throws Exception {
        
//        parquetWriter("test\\parquet-out2","input.txt");
        parquetReaderV2("test\\parquet-out2");
    }
    
    
    static void parquetReaderV2(String inPath) throws Exception{
        GroupReadSupport readSupport = new GroupReadSupport();
        Builder<Group> reader= ParquetReader.builder(readSupport, new Path(inPath));
        ParquetReader<Group> build=reader.build();
        Group line=null;
        while((line=build.read())!=null){
      Group time= line.getGroup("time", 0);
        //通過下標和字段名稱都可以獲取

        /*System.out.println(line.getString(0, 0)+"\t"+
        line.getString(1, 0)+"\t"+
        time.getInteger(0, 0)+"\t"+
        time.getString(1, 0)+"\t");*/

        System.out.println(line.getString("city", 0)+"\t"+
        line.getString("ip", 0)+"\t"+
        time.getInteger("ttl", 0)+"\t"+
        time.getString("ttl2", 0)+"\t");

        //System.out.println(line.toString());

        }
        System.out.println("讀取結(jié)束");
    } 
    //新版本中new ParquetReader()所有構造方法好像都棄用了,用上面的builder去構造對象
    static void parquetReader(String inPath) throws Exception{
        GroupReadSupport readSupport = new GroupReadSupport();
        ParquetReader<Group> reader = new ParquetReader<Group>(new Path(inPath),readSupport);
        Group line=null;
        while((line=reader.read())!=null){
  System.out.println(line.toString());
}
        System.out.println("讀取結(jié)束");
        
    }
    /**
     * 
     * @param outPath  輸出Parquet格式
     * @param inPath  輸入普通文本文件
     * @throws IOException
     */
    static void parquetWriter(String outPath,String inPath) throws IOException{
        MessageType schema = MessageTypeParser.parseMessageType("message Pair {\n" +
                " required binary city (UTF8);\n" +
                " required binary ip (UTF8);\n" +
                " repeated group time {\n"+
                  " required int32 ttl;\n"+
                  " required binary ttl2;\n"+
                "}\n"+
              "}");
        GroupFactory factory = new SimpleGroupFactory(schema);
        Path path = new Path(outPath);
       Configuration configuration = new Configuration();
       GroupWriteSupport writeSupport = new GroupWriteSupport();
       writeSupport.setSchema(schema,configuration);
       ParquetWriter<Group> writer = new ParquetWriter<Group>(path,configuration,writeSupport);
    //把本地文件讀取進去,用來生成parquet格式文件 BufferedReader br =new BufferedReader(new FileReader(new File(inPath))); String line=""; Random r=new Random(); while((line=br.readLine())!=null){ String[] strs=line.split("\\s+"); if(strs.length==2) { Group group = factory.newGroup() .append("city",strs[0]) .append("ip",strs[1]); Group tmpG =group.addGroup("time"); tmpG.append("ttl", r.nextInt(9)+1); tmpG.append("ttl2", r.nextInt(9)+"_a"); writer.write(group); } } System.out.println("write end"); writer.close(); } }
說下schema(寫Parquet格式數(shù)據(jù)需要schema,讀取的話"自動識別"了schema)
/*
 * 每一個字段有三個屬性:重復數(shù)、數(shù)據(jù)類型和字段名,重復數(shù)可以是以下三種:
 *         required(出現(xiàn)1次)
 *         repeated(出現(xiàn)0次或多次) 
 *         optional(出現(xiàn)0次或1次)
 * 每一個字段的數(shù)據(jù)類型可以分成兩種:
 *         group(復雜類型)
 *         primitive(基本類型)
* 數(shù)據(jù)類型有
* INT64, INT32, BOOLEAN, BINARY, FLOAT, DOUBLE, INT96, FIXED_LEN_BYTE_ARRAY
*/
這個repeated和required 不光是次數(shù)上的區(qū)別,序列化后生成的數(shù)據(jù)類型也不同,
比如repeqted修飾 ttl2 打印出來為 WrappedArray([7,7_a])
而 required修飾 ttl2 打印出來為 [7,7_a]  
除了用MessageTypeParser.parseMessageType類生成MessageType 還可以用下面方法
(注意這里有個坑--spark里會有這個問題--ttl2這里
as(OriginalType.UTF8) required binary city (UTF8)作用一樣,加上UTF8,在讀取的時候可以轉(zhuǎn)為StringType,不加的話會報錯 [B cannot be cast to java.lang.String
/*MessageType schema = MessageTypeParser.parseMessageType("message Pair {\n" +
                " required binary city (UTF8);\n" +
                " required binary ip (UTF8);\n" +
                "repeated group time {\n"+
                "required int32 ttl;\n"+
                "required binary ttl2;\n"+
                "}\n"+
                "}");*/
        
//import org.apache.parquet.schema.Types;
MessageType schema = Types.buildMessage() 
           .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("city") 
           .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("ip") 
           .repeatedGroup().required(PrimitiveTypeName.INT32).named("ttl")
                            .required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("ttl2")
                            .named("time")
          .named("Pair"); 
 
解決 [B cannot be cast to java.lang.String 異常:
1.要么生成parquet文件的時候加個UTF8
2.要么讀取的時候再提供一個同樣的schema類指定該字段類型,比如下面:



hadoop Mapreducer讀寫 Parquetexample
http://www.cnblogs.com/yanghaolie/p/7389543.html
maven依賴(我用的1.7)
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.7.0</version>
</dependency>

 

本站僅提供存儲服務,所有內(nèi)容均由用戶發(fā)布,如發(fā)現(xiàn)有害或侵權內(nèi)容,請點擊舉報。
打開APP,閱讀全文并永久保存 查看更多類似文章
猜你喜歡
類似文章
解析四種大數(shù)據(jù)文件格式
java調(diào)用db2存儲過程例子
用Lucene檢索數(shù)據(jù)庫
天氣預報-java代碼
轉(zhuǎn) axis 自定義對象序列化--阿甘正傳
Android學習系列(20)
更多類似文章 >>
生活服務
熱點新聞
分享 收藏 導長圖 關注 下載文章
綁定賬號成功
后續(xù)可登錄賬號暢享VIP特權!
如果VIP功能使用有故障,
可點擊這里聯(lián)系客服!

聯(lián)系客服