cmoscardi/SEM

Split, Embed and Merge: An accurate table structure recognizer

This repository contains the source code of our Pattern Recognition 2022 paper: Split, Embed and Merge: An accurate table structure recognizer.

Introduction

[Figure: SEM pipeline]

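Split, Embed and Merge (SEM) is a new framework for parsing tabular data into a structured format. It is mainly composed of three parts: a splitter, an embedder and a merger. The splitter predicts the row and column separators and outputs a fine grid over the table. The embedder extracts a feature representation for each grid cell. The merger then predicts which grid cells should be merged to recover cells that span several rows or columns. We won first place on complex tables and third place on all tables in Task B of the ICDAR 2021 Competition on Scientific Literature Parsing.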
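To make the split-then-merge idea concrete, here is a minimal, framework-free sketch. It is not code from this repository. It assumes the splitter's output has already been thresholded into one boolean separator flag per pixel row and per pixel column, and that the merger's decisions arrive as pairs of grid cells to join. The helper names `separators_to_intervals` and `merge_grid` are hypothetical, and union-find is our stand-in for turning pairwise merge decisions into spanning cells, not necessarily what SEM does internally.

```python
from typing import Dict, Iterable, List, Tuple

GridCell = Tuple[int, int]  # (row index, column index) in the fine grid


def separators_to_intervals(is_sep: List[bool]) -> List[Tuple[int, int]]:
    """Turn a per-pixel separator mask along one axis into half-open
    [start, end) intervals of content lying between separators."""
    intervals = []
    start = None
    for i, sep in enumerate(is_sep):
        if not sep and start is None:
            start = i  # content run begins
        elif sep and start is not None:
            intervals.append((start, i))  # content run ends at a separator
            start = None
    if start is not None:
        intervals.append((start, len(is_sep)))
    return intervals


def merge_grid(n_rows: int, n_cols: int,
               merge_pairs: Iterable[Tuple[GridCell, GridCell]]) -> List[Tuple[int, int, int, int]]:
    """Group grid cells into table cells with union-find and return each
    table cell as an inclusive (start_row, end_row, start_col, end_col) span."""
    parent: Dict[GridCell, GridCell] = {
        (r, c): (r, c) for r in range(n_rows) for c in range(n_cols)
    }

    def find(x: GridCell) -> GridCell:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in merge_pairs:
        parent[find(a)] = find(b)

    groups: Dict[GridCell, List[GridCell]] = {}
    for cell in parent:
        groups.setdefault(find(cell), []).append(cell)

    spans = []
    for members in groups.values():
        rows = [r for r, _ in members]
        cols = [c for _, c in members]
        spans.append((min(rows), max(rows), min(cols), max(cols)))
    return sorted(spans)


if __name__ == "__main__":
    rows = separators_to_intervals([True, False, False, True, False, False, True])
    cols = separators_to_intervals([True, False, True, False, False, True])
    print(rows)  # [(1, 3), (4, 6)]
    print(cols)  # [(1, 2), (3, 5)]
    # Merge the two top grid cells into one header cell spanning both columns.
    print(merge_grid(len(rows), len(cols), [((0, 0), (0, 1))]))
    # [(0, 0, 0, 1), (1, 1, 0, 0), (1, 1, 1, 1)]
```

The split step alone can only produce a regular grid; spanning cells appear only after merging, which is why the two stages are separated.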

Dataset

We provide scripts for processing the SciTSR dataset, which contains 15,000 tables in PDF format as well as their corresponding structure labels.
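As an illustration of how structure labels can be turned into merge targets, the sketch below builds an occupancy grid from cell spans and lists the adjacent grid cells that belong to the same table cell. It is not the repository's processing script. We assume each cell record carries inclusive `start_row`, `end_row`, `start_col` and `end_col` indices, as SciTSR's structure files do as far as we know; please check this against the actual data. The function name `structure_to_merge_pairs` is hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Cell = Dict[str, int]  # assumed keys: start_row, end_row, start_col, end_col (inclusive)


def structure_to_merge_pairs(cells: List[Cell]):
    """Return (n_rows, n_cols, pairs) where pairs lists horizontally or
    vertically adjacent grid cells that belong to the same table cell."""
    n_rows = max(c["end_row"] for c in cells) + 1
    n_cols = max(c["end_col"] for c in cells) + 1
    # owner[r][c] holds the index of the table cell covering grid position (r, c).
    owner: List[List[Optional[int]]] = [[None] * n_cols for _ in range(n_rows)]
    for idx, c in enumerate(cells):
        for r in range(c["start_row"], c["end_row"] + 1):
            for col in range(c["start_col"], c["end_col"] + 1):
                owner[r][col] = idx

    pairs: List[Tuple[Tuple[int, int], Tuple[int, int]]] = []
    for r in range(n_rows):
        for col in range(n_cols):
            if owner[r][col] is None:
                continue  # empty grid position, nothing to merge
            if col + 1 < n_cols and owner[r][col + 1] == owner[r][col]:
                pairs.append(((r, col), (r, col + 1)))
            if r + 1 < n_rows and owner[r + 1][col] == owner[r][col]:
                pairs.append(((r, col), (r + 1, col)))
    return n_rows, n_cols, pairs


if __name__ == "__main__":
    cells = [
        {"start_row": 0, "end_row": 0, "start_col": 0, "end_col": 1},  # header spans 2 columns
        {"start_row": 1, "end_row": 1, "start_col": 0, "end_col": 0},
        {"start_row": 1, "end_row": 1, "start_col": 1, "end_col": 1},
    ]
    print(structure_to_merge_pairs(cells))  # (2, 2, [((0, 0), (0, 1))])
```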

Note that the text information must be aligned with the table cells in order to generate the splitter labels.
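One simplified way to derive splitter labels once text boxes are aligned with cells is a projection: a pixel row (column) is a separator if no cell box covers it. The sketch below assumes boxes are `(x1, y1, x2, y2)` in inclusive pixel coordinates, each paired with its row and column span. Cells spanning several rows or columns are excluded on the corresponding axis, because they would otherwise cover the gap they straddle. The names `separator_mask` and `splitter_labels` are hypothetical, and the repository's scripts may define separator regions differently.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x1, y1, x2, y2), inclusive pixel coordinates
CellEntry = Tuple[Box, int, int, int, int]  # (box, start_row, end_row, start_col, end_col)


def separator_mask(length: int, spans: Sequence[Tuple[int, int]]) -> List[bool]:
    """Mark every pixel position along one axis that no content span covers."""
    covered = [False] * length
    for lo, hi in spans:
        for i in range(max(lo, 0), min(hi, length - 1) + 1):
            covered[i] = True
    return [not c for c in covered]


def splitter_labels(width: int, height: int,
                    cells: List[CellEntry]) -> Tuple[List[bool], List[bool]]:
    """Return (row_separator_mask, column_separator_mask) for one table image."""
    # Only single-row cells constrain row separators, and only
    # single-column cells constrain column separators.
    row_spans = [(b[1], b[3]) for b, sr, er, sc, ec in cells if sr == er]
    col_spans = [(b[0], b[2]) for b, sr, er, sc, ec in cells if sc == ec]
    return separator_mask(height, row_spans), separator_mask(width, col_spans)


if __name__ == "__main__":
    cells = [
        ((1, 0, 8, 1), 0, 0, 0, 1),  # header spanning both columns
        ((1, 3, 3, 4), 1, 1, 0, 0),
        ((6, 3, 8, 4), 1, 1, 1, 1),
    ]
    rows, cols = splitter_labels(width=10, height=6, cells=cells)
    print(rows)  # [False, False, True, False, False, True]
    print(cols)  # [True, False, False, False, True, True, False, False, False, True]
```

Without the single-span filter, the header box above would cover pixel columns 4 and 5 and erase the column separator beneath it.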

Requirements

  • torch==1.7.1

Training and Testing

python runner/train.py --cfg default

Citation

If you find SEM useful in your research, please consider citing:

@article{zhang2022split,
  title={Split, embed and merge: An accurate table structure recognizer},
  author={Zhang, Zhenrong and Zhang, Jianshu and Du, Jun and Wang, Fengren},
  journal={Pattern Recognition},
  volume={126},
  pages={108565},
  year={2022},
  publisher={Elsevier}
}
