arbeiten:extraktion_strukturierter_daten_aus_rechnungsdokumenten_mittels_maschinellen_lernens [07.06.2021 13:32] – Christian Wolff | arbeiten:extraktion_strukturierter_daten_aus_rechnungsdokumenten_mittels_maschinellen_lernens [14.06.2021 12:58] – Fix lists + second examiner added wef17307 |
---|---|
Author : Felix Wende | Author : Felix Wende |
First examiner_thesisprofessor : Christian Wolff | First examiner_thesisprofessor : Christian Wolff |
Second examiner_secondthesisprofessor : | Second examiner_secondthesisprofessor : Raphael Wimmer |
Status_thesisstate : in progress | Status_thesisstate : in progress |
Keywords_thesiskeywords : | Keywords_thesiskeywords : |
| |
=== Specific Tasks === | === Specific Tasks === |
1. Literature review | - Literature review |
2. Tool research (OCR, pdf2python, annotation) | - Tool research (OCR, pdf2python, annotation) |
3. Digitize documents (scanning, OCR, sorting) | - Digitize documents (scanning, OCR, sorting) |
4. Annotate documents | - Annotate documents |
5. Extract content (text, position, size, etc.) and label it using the annotations | - Extract content (text, position, size, etc.) and label it using the annotations |
6. Create the training dataset | - Create the training dataset |
7. Develop the ML model | - Develop the ML model |
- Data preparation | - Data preparation |
- Model selection | * Model selection |
- Feature Engineering | * Feature Engineering |
- Feature Selection | * Feature Selection |
- Parameter Optimization | * Parameter Optimization |
8. Evaluation and comparison with a DL approach | - Evaluation and comparison with a DL approach |
| |
=== Expected Prior Knowledge === | === Expected Prior Knowledge === |
- Machine Learning | * Machine Learning |
- Python | * Python |
| |
=== Further Reading === | === Further Reading === |
- R. B. Palm, F. Laws and O. Winther, "Attend, Copy, Parse End-to-end Information Extraction from Documents," 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019, pp. 329-336, doi: 10.1109/ICDAR.2019.00060. | * R. B. Palm, F. Laws and O. Winther, "Attend, Copy, Parse End-to-end Information Extraction from Documents," 2019 International Conference on Document Analysis and Recognition (ICDAR), Sydney, Australia, 2019, pp. 329-336, doi: 10.1109/ICDAR.2019.00060. |
- R. B. Palm, O. Winther and F. Laws, "CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks," 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017, pp. 406-413, doi: 10.1109/ICDAR.2017.74. | * R. B. Palm, O. Winther and F. Laws, "CloudScan - A Configuration-Free Invoice Analysis System Using Recurrent Neural Networks," 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, 2017, pp. 406-413, doi: 10.1109/ICDAR.2017.74. |
- D. Schuster et al., "Intellix -- End-User Trained Information Extraction for Document Archiving," 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, 2013, pp. 101-105, doi: 10.1109/ICDAR.2013.28. | * D. Schuster et al., "Intellix -- End-User Trained Information Extraction for Document Archiving," 2013 12th International Conference on Document Analysis and Recognition, Washington, DC, 2013, pp. 101-105, doi: 10.1109/ICDAR.2013.28. |
- F. Schulz, M. Ebbecke, M. Gillmann, B. Adrian, S. Agne and A. Dengel, "Seizing the Treasure: Transferring Knowledge in Invoice Analysis," 2009 10th International Conference on Document Analysis and Recognition, Barcelona, 2009, pp. 848-852, doi: 10.1109/ICDAR.2009.47. | * F. Schulz, M. Ebbecke, M. Gillmann, B. Adrian, S. Agne and A. Dengel, "Seizing the Treasure: Transferring Knowledge in Invoice Analysis," 2009 10th International Conference on Document Analysis and Recognition, Barcelona, 2009, pp. 848-852, doi: 10.1109/ICDAR.2009.47. |
- Holt, X., & Chisholm, A. (2018, December). Extracting structured data from invoices. In Proceedings of the Australasian Language Technology Association Workshop 2018 (pp. 53-59). | * Holt, X., & Chisholm, A. (2018, December). Extracting structured data from invoices. In Proceedings of the Australasian Language Technology Association Workshop 2018 (pp. 53-59). |
- Bardelli, C., Rondinelli, A., Vecchio, R., & Figini, S. (2020). Automatic electronic invoice classification using machine learning models. Machine Learning and Knowledge Extraction, 2(4), 617-629. | * Bardelli, C., Rondinelli, A., Vecchio, R., & Figini, S. (2020). Automatic electronic invoice classification using machine learning models. Machine Learning and Knowledge Extraction, 2(4), 617-629. |
| |